On 6/9/22 04:04, Matteo Bruni wrote:
The ugliness that we've run into is: how do we emit IR for the following variable load?
struct apple
{
    int a;
    struct
    {
        Texture2D b;
        int c;
    } s;
} a;

/* in some expression */
func(a.s);
Unlike the SM1 example above, the register numbers don't match up. Separately, it's kind of ugly that backend-specific details regarding register size and alignment are leaking into the frontend so much.
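To make that concrete, here's roughly how the fields from the example might end up allocated for SM4; the numbers are illustrative only, and the exact packing depends on the buffer layout:

/* Illustrative SM4-style allocation for the declaration above, assuming
 * 'a' is a uniform: the resource field goes into the texture register set,
 * while the numeric fields get packed into a constant buffer, e.g.
 *
 *     a.a   -> cb0[0].x   (numeric register set)
 *     a.s.b -> t0         (texture register set)
 *     a.s.c -> cb0[0].y   (numeric register set)
 *
 * so there is no single register offset that describes "a.s" as a whole. */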
I think most of that can be hidden or contained with some proper abstraction. And generous handwaving. But basically, that probably could be represented in the IR as copying around individual fields of the structure separately, rather than a single "struct deref". Clearly it can become more complex depending on the type of the variable but I think it should be doable.
Yeah, it could. Like I said it's not prohibitive. I'm just not sure it's the best option at this point.
It's worth pointing out that, at parse time, we want and need load instructions (and therefore probably also store instructions) to have larger-than-vector types; that is, load instructions can produce structs, and store instructions can consume them. But we don't want that for SMxIR, and I believe we don't want it for the "final form" of HLSL IR either. That's the way the code is currently arranged, and I see no reason not to keep it that way.
Similarly, the amount of code that has to deal with matrix majority is unfortunate.
Personally, that one seems more annoying. Although it's not clear to me that handling matrix majority at a later stage would necessarily be any better.
The main idea is that we could handle it something closer to once (well, once per backend), at HLSL -> SMx translation.
That doesn't necessarily mean requiring that all matrix loads and stores are done on a single scalar—after all, we could translate a single vector load to multiple MOV instructions if it can't actually be represented by one.
It does potentially mean doing vectorization passes on SMxIR, though. Hard to tell this far in advance, and it's also hard to tell if that's something we're going to need anyway.
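As a made-up illustration (not from the thread) of why a single HLSL-level vector load may not map to one MOV:

/* With column_major packing, each column of a matrix lands in its own
 * register (or cbuffer element), roughly:
 *
 *     column_major float4x4 m;    // column j -> register c[j]
 *     float4 v = m[1];            // row 1 = (c0.y, c1.y, c2.y, c3.y)
 *
 * The row load touches one component of four different registers, so it
 * can't be expressed as a single vector MOV and has to be emitted as
 * several one-component MOVs instead. */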
The former problem can potentially be solved by embedding multiple register offsets into hlsl_deref (one per register type). Neither this nor the latter problem is prohibitive, and I was at one point in favour of continuing to use register offsets everywhere, but at this point my feeling has changed, and I think using register offsets is looking uglier than the alternatives. I get the impression that Francisco disagrees, though, which is why we should probably hash this out now.
As I mention below, I currently see two options as the most appealing. This one (multiple register offsets) sits somewhat in the middle and it feels like it would be best to go to one of the extremes instead. It's also possible that this middle ground solution would end up being nicer in practice. At any rate, I certainly wouldn't flat out discount it.
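For concreteness, a minimal sketch of what "multiple register offsets" could look like; the names below are hypothetical stand-ins, not the actual vkd3d-shader definitions:

/* Hypothetical sketch only.  One (possibly non-constant) offset per
 * register set, so that e.g. a resource field and a numeric field of the
 * same struct can each be located in their own register space. */
struct hlsl_ir_node;
struct hlsl_ir_var;

enum hlsl_register_set
{
    HLSL_REGSET_NUMERIC,
    HLSL_REGSET_TEXTURES,
    HLSL_REGSET_SAMPLERS,
    HLSL_REGSET_COUNT,
};

struct hlsl_src
{
    struct hlsl_ir_node *node;  /* source of a dynamic offset, if any */
};

struct hlsl_deref
{
    struct hlsl_ir_var *var;
    struct hlsl_src offset[HLSL_REGSET_COUNT];  /* one offset per register set */
};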
Nor do I think we should use both register offsets and component offsets (either in the same node type, or in different node types). That just makes the IR way more complicated. Rather, I think we should be doing everything in *just* component offsets until translation from HLSL IR to SMx IR.
I touched on this earlier and I agree that the additional complexity is unlikely to be worth it. Admittedly we're in a limbo right now where SMxIR isn't quite there yet, which makes reasoning on some of these details a bit fuzzy.
In order to deal with the problem of translating dynamic offsets from components to registers, I see three options:
(a) emit code at runtime, or do some sophisticated lowering,
(b) use special offsetof and sizeof nodes,
(c) introduce a structured deref type, much like [1]. Francisco was actually proposing something like this, although with an array instead of a recursive structure, which strikes me as an improvement.
My guess is that (a) is very hard. I haven't really tried to reason it out, though.
Given a choice between (b) and (c), I'm more inclined to pick (c). It makes the IR structure more restrictive, and those restrictions fundamentally match the structured nature of the language we're working with; both of those are things I tend to like.
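For illustration, a minimal sketch of the array variant of (c); again, hypothetical names rather than the real definitions:

/* Hypothetical sketch only.  Instead of a flat register offset, the deref
 * stores a path: one index per nesting level (a field index for structs,
 * an element index for arrays), each of which may be a constant node or a
 * dynamically computed one. */
struct hlsl_ir_var;
struct hlsl_src;

struct hlsl_deref
{
    struct hlsl_ir_var *var;
    unsigned int path_len;
    struct hlsl_src *path;  /* path[i] selects the field/element at level i */
};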
After giving it some thought I think that's certainly fine *for the higher level IR*. At the same time it seems to me that, if we go that route, eventually we also want to have real SMxIR with register offsets, and make sure that we can optimize constant offsets (thus expressions) at that level.
As I see it (as of current time and date, can't guarantee that I won't change my mind again...) we either push the backend-specific info up (register offsets all the way) or down (component offsets with structured deref / type info in the generic IR, transformation into register offsets in the SMxIR). I think either option works and it's mostly a matter of preference and which one fits / feels better with the rest of the compiler.
Yeah, that general approach makes sense to me. And yes, of course the SMxIR should deal entirely in register offsets.
My current vision of SMxIR is that it should be a one-to-one representation of actual instructions, writable without any lowering passes (and hence any passes done on it should be optimization-only, with the *possible* exception of RA, i.e. register allocation). In a sense, it's what we already have with sm4_instruction and such, except that we'd be storing it and doing passes on it rather than just writing it out directly.
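Something shaped roughly like this, purely hypothetical, just to illustrate the "store it and run passes over it" part:

/* Hypothetical shape only: essentially the existing sm4_instruction, but
 * kept in an array so that optimization passes can run over the whole
 * program before the bytecode is written out, instead of each instruction
 * being written immediately. */
struct sm4_instruction;

struct sm4_program
{
    struct sm4_instruction *instructions;
    unsigned int count, capacity;
};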
Between those two extremes: well, what we currently have basically *is* the first extreme, with register offsets pushed all the way up to parse time. It's just causing enough friction to make me think the latter extreme is probably going to be prettier.
Note that either way we're going to need specialized functions to resolve deref offsets in one step. I also think that should depend on the domain—e.g. for copy-prop we'll actually want to do everything in component counts, but when translating to SMxIR we'll evaluate given the register alignment constraints of the shader model. In the case of (b) it's not going to be as simple as running the existing constant folding pass, because we can't actually fold the sizeof/offsetof constants (unless we dup the node list, evaluate, and then fold, which seems very hairy and more work than the alternative).
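To illustrate the "resolve in one step, depending on the domain" idea, here's a sketch under the structured-deref assumption; the helpers and types are made up for the example, not real vkd3d-shader functions:

/* Hypothetical sketch only.  The same constant path can be flattened either
 * into a component offset (what copy-prop would want) or into a register
 * offset honouring the shader model's alignment rules (what SMxIR
 * translation would want); only the per-step helper differs. */
#include <stdbool.h>

struct hlsl_type;

/* Made-up helpers: offset of child 'idx' within 'type', in each domain,
 * and the type of that child. */
unsigned int type_component_offset(const struct hlsl_type *type, unsigned int idx);
unsigned int type_register_offset(const struct hlsl_type *type, unsigned int idx);
const struct hlsl_type *type_get_child(const struct hlsl_type *type, unsigned int idx);

static unsigned int path_to_offset(const struct hlsl_type *type,
        const unsigned int *path, unsigned int path_len, bool use_registers)
{
    unsigned int i, offset = 0;

    for (i = 0; i < path_len; ++i)
    {
        offset += use_registers ? type_register_offset(type, path[i])
                : type_component_offset(type, path[i]);
        type = type_get_child(type, path[i]);
    }
    return offset;
}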
Right, each option will have different tradeoffs WRT optimization passes. But e.g. copy-prop should be doable even with register offsets, we "just" need to make sure to always map the component offsets to their respective register offsets.
Quite, in fact we're already doing it that way. But it's probably better to work with components, since we (a) don't waste space tracking padding [not very important], and (b) don't have to deal with multiple register sets [more important].
I invite thoughts—especially from Matteo, since we discussed this sort of problem ages ago.
Yep, hope that my comments make sense. I want to hear from the others too.
ἔρρωσθε, Zeb
[1] https://www.winehq.org/pipermail/wine-devel/2020-April/164399.html
[2] https://www.winehq.org/pipermail/wine-devel/2020-April/165493.html