The following thread is based partly on, and makes reference to, private conversation, but for the sake of openness I've elected to post it to wine-devel.
A long time ago, HLSL_IR_LOAD—then called HLSL_IR_DEREF—was this:
    enum hlsl_ir_deref_type
    {
        HLSL_IR_DEREF_VAR,
        HLSL_IR_DEREF_ARRAY,
        HLSL_IR_DEREF_RECORD,
    };

    struct hlsl_deref
    {
        enum hlsl_ir_deref_type type;
        union
        {
            struct hlsl_ir_var *var;
            struct
            {
                struct hlsl_ir_node *array;
                struct hlsl_ir_node *index;
            } array;
            struct
            {
                struct hlsl_ir_node *record;
                struct hlsl_struct_field *field;
            } record;
        } v;
    };

    struct hlsl_ir_deref
    {
        struct hlsl_ir_node node;
        struct hlsl_deref src;
    };
Now, one problem with this is that it was kind of mean to register allocation (RA) and liveness analysis. For example, a line of HLSL like
var.a.b = 2.0;
produced the following IR:
    2: 2.0
    3: deref(var)
    4: @3.a
    5: @4.b = @2
This is annoying because:
* to discover that "var" is written, @5 needs to reach upwards through a deref chain;
* reaching through the deref chain requires lots of assert() statements;
* @3 implies that "var" is read, which it isn't (and, if we reach upwards through the deref chain, @4 implies the same thing).
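To make the first two points concrete, here is a minimal sketch of what "reaching upwards through a deref chain" looks like. It is not the code that was in the tree: the node layout and every name in it (struct node, get_deref_root_var(), and so on) are made up for illustration. The point is only that, with generic node pointers, nothing guarantees the chain consists of derefs, so each step has to be checked at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified node types; not the real HLSL IR. */
enum node_type { NODE_CONSTANT, NODE_DEREF, NODE_ASSIGNMENT };
enum deref_type { DEREF_VAR, DEREF_ARRAY, DEREF_RECORD };

struct var
{
    const char *name;
};

struct node
{
    enum node_type type;
    /* The fields below are only meaningful when type == NODE_DEREF. */
    enum deref_type deref_type;
    struct var *var;    /* DEREF_VAR: the variable being referenced */
    struct node *base;  /* DEREF_ARRAY / DEREF_RECORD: the node being indexed */
};

/* Follow a chain of derefs upwards until the variable at its root is found. */
struct var *get_deref_root_var(const struct node *node)
{
    for (;;)
    {
        /* Any node pointer type-checks here, so this can only be asserted. */
        assert(node->type == NODE_DEREF);
        if (node->deref_type == DEREF_VAR)
            return node->var;
        assert(node->base);
        node = node->base;
    }
}
```

Liveness analysis would have to do something like this for every store just to learn which variable is written.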
I proposed that instead of using generic node pointers, we could have arbitrarily long deref chains encoded in the hlsl_deref structure itself. [1]
There was some discussion on that—which is mostly concentrated in that thread, and also IRC. Most of the concern is about being nicer to liveness analysis and RA.
What ultimately ended up happening is that Matteo proposed numeric (register) offsets calculated at parse time, which is fundamentally similar to my idea except that it's a lot simpler to work with.
Interestingly, the problem of multiple register sets was brought up [2]:
From my testing it essentially does, yes, i.e. if you have
    struct
    {
        int unused;
        float f;
        bool b;
    } s;

    float4 main(float4 pos : POSITION) : POSITION
    {
        if (s.b)
            return s.f;
        return pos;
    }
then "s" gets allocated to registers b0-b2 and c0-c1, but only b2 and c1 are ever used.
So yeah, it makes things pretty simple. I can see how it would have been a lot uglier otherwise.
I guess we've finally run into that ugliness now :-(
The ultimate conclusions to draw from this historical exercise are:
- what I said about "we used to have derefs handled like that" is mostly correct, although not quite. We did use to have richer type information, and we did decide that offsets calculated at parse time were preferable to that type information, although I thought we at one point had something like [1] in the tree, which we didn't. Anyway, the decision to use offsets calculated at parse time seems to have been motivated only by simplicity. To be fair, at the time, it *was* simpler.
- [1] and the later patch that replaced it were mostly motivated by RA. We will probably end up doing RA after SMxIR translation, but we may well do RA *before* it as well (tracking e.g. SMx instructions with register numbers instead of having def-use chains). A more salient concern is that I still don't like the idea of having instructions in the tree that aren't actually translated (or translatable) to SMxIR, which means that we shouldn't have instructions that yield e.g. structs.
The ugliness that we've run into is: how do we emit IR for the following variable load?
    struct apple
    {
        int a;
        struct
        {
            Texture2D b;
            int c;
        } s;
    } a;

    /* in some expression */
    func(a.s);
Unlike the SM1 example above, the register numbers don't match up. Separately, it's kind of ugly that backend-specific details regarding register size and alignment are leaking into the frontend so much. Similarly, the amount of code that has to deal with matrix majority is unfortunate.
The former problem can potentially be solved by embedding multiple register offsets into hlsl_deref (one per register type). Neither this nor the latter problem is prohibitive, and I was at one point in favour of continuing to use register offsets everywhere, but at this point my feeling has changed, and I think using register offsets is looking uglier than the alternatives. I get the impression that Francisco disagrees, though, which is why we should probably hash this out now.
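For reference, a rough sketch of what "one offset per register type" could look like. Everything here is hypothetical: the two register sets, the names (enum regset, deref_enter_field()), and the sizes, which are counted in arbitrary illustrative units and ignore the real packing and alignment rules.

```c
#include <stddef.h>

/* Hypothetical register sets; a real backend distinguishes more of them. */
enum regset { REGSET_NUMERIC, REGSET_TEXTURE, REGSET_COUNT };

struct type
{
    /* Size of the type in each register set, in illustrative units. */
    unsigned int size[REGSET_COUNT];
    /* Struct members, or NULL and 0 for a non-struct type. */
    const struct type *const *fields;
    size_t field_count;
};

/* A deref carrying one offset per register set. */
struct deref
{
    const char *var_name;
    unsigned int offset[REGSET_COUNT];
};

/* Advance "deref" to the field with the given index inside "type": the
 * field's offset in each register set is the summed size of the fields
 * that precede it in that set. */
void deref_enter_field(struct deref *deref, const struct type *type, size_t index)
{
    size_t i;
    unsigned int k;

    for (i = 0; i < index; ++i)
        for (k = 0; k < REGSET_COUNT; ++k)
            deref->offset[k] += type->fields[i]->size[k];
}
```

Under these assumptions, "a.s" from the struct apple example resolves to numeric offset 1 (past "a") but texture offset 0, which is exactly the mismatch a single offset can't express.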
Nor do I think we should use both register offsets and component offsets (either in the same node type, or in different node types). That just makes the IR way more complicated. Rather, I think we should be doing everything in *just* component offsets until translation from HLSL IR to SMx IR.
In order to deal with the problem of translating dynamic offsets from components to registers, I see three options:
(a) emit code that computes the register offset at runtime, or do some sophisticated lowering,
(b) use special offsetof and sizeof nodes,
(c) introduce a structured deref type, much like [1]. Francisco was actually proposing something like this, although with an array instead of a recursive structure, which strikes me as an improvement.
My guess is that (a) is very hard. I haven't really tried to reason it out, though.
Given a choice between (b) and (c), I'm more inclined to pick (c). It makes the IR structure more restrictive, and those restrictions fundamentally match the structured nature of the language we're working with, both things I tend to like.
Note that either way we're going to need specialized functions to resolve deref offsets in one step. I also think that should depend on the domain—e.g. for copy-prop we'll actually want to do everything in component counts, but when translating to SMxIR we'll evaluate given the register alignment constraints of the shader model. In the case of (b) it's not going to be as simple as running the existing constant folding pass, because we can't actually fold the sizeof/offsetof constants (unless we dup the node list, evaluate, and then fold, which seems very hairy and more work than the alternative).
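As a sketch of what I mean for option (c), assuming the array-style path Francisco suggested: one resolver walks the path, and the "domain" decides how big each type is. All names are hypothetical, only constant indices are handled, and the register rule used here (every vector takes one whole register, with no packing) is a deliberate oversimplification rather than the actual rule of any shader model.

```c
#include <assert.h>
#include <stddef.h>

enum type_kind { KIND_VECTOR, KIND_ARRAY, KIND_STRUCT };

struct type
{
    enum type_kind kind;
    unsigned int dimx;                 /* KIND_VECTOR: component count, 1 to 4 */
    const struct type *element;        /* KIND_ARRAY: element type */
    unsigned int count;                /* KIND_ARRAY: elements; KIND_STRUCT: fields */
    const struct type *const *fields;  /* KIND_STRUCT: field types */
};

/* Structured deref: the variable's type plus a flat path of indices
 * (array element or struct field numbers), outermost first. */
struct deref
{
    const struct type *var_type;
    unsigned int path[8];
    unsigned int path_len;
};

enum domain { DOMAIN_COMPONENTS, DOMAIN_REGISTERS };

static unsigned int type_size(const struct type *t, enum domain domain)
{
    unsigned int i, size = 0;

    switch (t->kind)
    {
        case KIND_VECTOR:
            /* Assumption: a vector always occupies one whole register. */
            return domain == DOMAIN_COMPONENTS ? t->dimx : 1;
        case KIND_ARRAY:
            return t->count * type_size(t->element, domain);
        case KIND_STRUCT:
            for (i = 0; i < t->count; ++i)
                size += type_size(t->fields[i], domain);
            return size;
    }
    return 0;
}

/* Resolve the whole path to a single offset, in the units of "domain". */
unsigned int deref_resolve_offset(const struct deref *deref, enum domain domain)
{
    const struct type *t = deref->var_type;
    unsigned int i, j, offset = 0;

    for (i = 0; i < deref->path_len; ++i)
    {
        unsigned int index = deref->path[i];

        if (t->kind == KIND_ARRAY)
        {
            assert(index < t->count);
            offset += index * type_size(t->element, domain);
            t = t->element;
        }
        else
        {
            assert(t->kind == KIND_STRUCT && index < t->count);
            for (j = 0; j < index; ++j)
                offset += type_size(t->fields[j], domain);
            t = t->fields[index];
        }
    }
    return offset;
}
```

For a struct { float3 a; float b[2]; float c; }, the path {1, 1} (i.e. "b[1]") resolves to component offset 4 but register offset 2 under these rules. A dynamic index would replace the multiplication by an emitted instruction, but the per-domain element size is still known at compile time.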
I invite thoughts—especially from Matteo, since we discussed this sort of problem ages ago.
ἔρρωσθε, Zeb
[1] https://www.winehq.org/pipermail/wine-devel/2020-April/164399.html
[2] https://www.winehq.org/pipermail/wine-devel/2020-April/165493.html