On Thu, Nov 11, 2021 at 12:50 PM Giovanni Mascellani gmascellani@codeweavers.com wrote:
Hi,
On 11/11/21 12:14, Matteo Bruni wrote:
Notice that variables can have more than four components. Matrices can have up to 16 and arrays even more.
Right, but we probably don't want or need to do copy propagation on those, i.e. copy propagation should probably happen after matrix / struct / array splitting.
Mmh, then there is something about splitting that I'm not understanding.
My understanding so far was that variables themselves are not split: they are just there, and do not appear in the code as themselves. What gets split are the temporaries that appear when some piece of code actually does something with (say) a matrix. So, for example, if you have this fragment of code:
    float4x4 a;
    float4x4 b;
    float4x4 c;
    c = a + b;
the compiler first naively represents it as:
    float4x4 a
    float4x4 b
    float4x4 c
    @1 = load(a) of type float4x4
    @2 = load(b) of type float4x4
    @3 = + (@1 @2) of type float4x4
    store(c, @3)
and then this gets split as:
    float4x4 a
    float4x4 b
    float4x4 c
    @1 = load(a, 0) of type float4
    @2 = load(b, 0) of type float4
    @3 = + (@1 @2) of type float4
    store(c, 0, @3)
    @5 = load(a, 4) of type float4
    @6 = load(b, 4) of type float4
    @7 = + (@5 @6) of type float4
    store(c, 4, @7)
    ...
That is, the variables keep their type, even though the accesses (loads and stores) to the variables have a smaller type. That's my understanding of what we want. My code mirrors this and therefore allows a variable to have more than four registers.
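The access-splitting model described here can be illustrated with a small Python sketch. It is not the compiler's actual code: `Var`, `split_matrix_add` and the tuple-based instruction encoding are hypothetical, and it assumes offsets are counted in scalar components with four components per row, as in the listing above.

```python
from dataclasses import dataclass

ROW_SIZE = 4  # components per float4 row (assumption matching the example)


@dataclass(frozen=True)
class Var:
    """A variable that keeps its full type; only accesses get narrower."""
    name: str
    components: int  # 16 for a float4x4


def split_matrix_add(dst, lhs, rhs):
    """Emit row-wise instructions for `dst = lhs + rhs`.

    Instructions are plain tuples; operands refer to earlier instructions
    by their index in the returned list.
    """
    assert dst.components == lhs.components == rhs.components
    out = []
    for offset in range(0, dst.components, ROW_SIZE):
        out.append(("load", lhs.name, offset))
        lhs_idx = len(out) - 1
        out.append(("load", rhs.name, offset))
        rhs_idx = len(out) - 1
        out.append(("add", lhs_idx, rhs_idx))
        sum_idx = len(out) - 1
        out.append(("store", dst.name, offset, sum_idx))
    return out
```

Note that `a`, `b` and `c` are still 16-component variables after this step; only the loads and stores have become vector-sized.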
What is the advantage of splitting variables themselves?
Continuing from your example: assuming a, b and c are temporaries, you split them into 4 vectors each and update the LOAD and STORE instructions to point to the specific vector. Once that is done, it becomes explicit that those groups of 4 instructions (LOAD x2, ADD, STORE) are in fact entirely independent from each other. That alone might help further transformations down the road. It's also pretty nice for register allocation, as it's easier to allocate 4 groups of 4 registers rather than a single contiguous group of 16. Sometimes you can even find out that whole rows / columns are unused and drop them altogether.
The same applies to all the complex types of course, not just matrices. There is a complication with the above in that sometimes it can be impossible to split the vars. That is, when the load / store offset is not always known at compile time. That's a bit unfortunate but it should also be pretty rare in HLSL shaders. I think it's worthwhile to optimize for the common case and accept that we won't necessarily have the best code when things align badly for us.
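As a rough illustration of splitting the variables themselves, here is a minimal Python sketch. It is not taken from the actual compiler: `split_variables`, the `_row<N>` naming and the tuple encoding are hypothetical. It assumes every access is at most one row wide, that offsets are counted in scalar components, and that an offset of `None` stands for "not known at compile time".

```python
ROW_SIZE = 4  # components per row vector (assumption)


def split_variables(instrs):
    """Redirect constant-offset accesses to per-row variables.

    `instrs` is a list of (kind, var, offset) tuples, where kind is
    "load" or "store". A variable with at least one access at an
    unknown offset is left alone, since we cannot tell which row
    that access touches.
    """
    unsplittable = {var for _, var, off in instrs if off is None}
    out = []
    for kind, var, off in instrs:
        if var in unsplittable:
            # The "bad" case: keep the original, unsplit variable.
            out.append((kind, var, off))
        else:
            row, col = divmod(off, ROW_SIZE)
            out.append((kind, f"{var}_row{row}", col))
    return out
```

After this rewrite each `<var>_row<N>` is an independent four-component variable, so a row that is never loaded can be dropped without looking at the others.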
With all that said: WRT copy propagation and this patch specifically, I think it's a good idea to only handle vector variables if it makes things easier (as it should). Notice that you don't have to bail out entirely even in the "bad" case, as a non-vector is perfectly fine as a "value". It's only when the complex variable is the destination of a store that we're in trouble.
In the specific case of my copy propagation pass, this would make things more complicated. For example, if right now I cannot reconstruct the offset of a store, I can just invalidate the whole variable. In your model, as I understand it, I'd also have to invalidate other variables that are unrelated by that point.
I don't think that's the case? A STORE is always directed to a specific temporary variable and will affect that one alone. I guess you were thinking of a model where you always split variables into vectors no matter what, in which case you're right, it quickly becomes a mess...
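The invalidation rule under discussion can be sketched in Python as follows. This is not the pass from the patch: `CopyPropState` and its methods are hypothetical, and an offset of `None` is assumed to mean that the store offset could not be reconstructed.

```python
class CopyPropState:
    """Tracks, per variable component, which value was last stored there."""

    def __init__(self):
        # (variable name, component index) -> (value id, value component)
        self.values = {}

    def invalidate(self, var):
        """Forget everything known about `var`, and only about `var`."""
        for key in [k for k in self.values if k[0] == var]:
            del self.values[key]

    def store(self, var, offset, value, width):
        if offset is None:
            # Unknown offset: any component of `var` may have changed.
            self.invalidate(var)
            return
        for i in range(width):
            self.values[(var, offset + i)] = (value, i)

    def load(self, var, offset, width):
        """Return the known (value, component) pairs, or None if any is unknown."""
        parts = [self.values.get((var, offset + i)) for i in range(width)]
        return None if None in parts else parts
```

A store with an unknown offset wipes the state of its destination variable only; whatever is recorded for other variables stays valid.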