Re: [PATCH vkd3d 2/5] vkd3d-shader/hlsl: Perform a copy propagation pass.

11 Nov 2021

On Thu, Nov 11, 2021 at 12:50 PM Giovanni Mascellani
<gmascellani@codeweavers.com> wrote:
...
Hi,
On 11/11/21 12:14, Matteo Bruni wrote:
...
...
Notice that variables can have more than four components. Matrices can
have up to 16 and arrays even more.
Right, but we probably don't want or need to do copy propagation on
those i.e. copy propagation should probably happen after matrix /
struct / array splitting.
Mmh, then there is something about splitting that I'm not understanding.
My understanding so far was that variables themselves are not split:
they are just there, and do not appear in the code as themselves.  What
gets split are the temporaries that appear when some piece of code
actually does something with (say) a matrix. So, for example, if you
have this fragment of code:

float4x4 a;
float4x4 b;
float4x4 c;
c = a + b;

the compiler first naively represents it as:

float4x4 a
float4x4 b
float4x4 c
@1 = load(a) of type float4x4
@2 = load(b) of type float4x4
@3 = + (@1 @2) of type float4x4
store(c, @3)

and then this gets split as:

float4x4 a
float4x4 b
float4x4 c
@1 = load(a, 0) of type float4
@2 = load(b, 0) of type float4
@3 = + (@1 @2) of type float4
store(c, 0, @3)
@5 = load(a, 4) of type float4
@6 = load(b, 4) of type float4
@7 = + (@5 @6) of type float4
store(c, 4, @7)
...

That is, the variables keep their type, even though the accesses (loads
and stores) to the variables have a smaller type. That's my
understanding of what we want. My code mirrors this, therefore allows a
variable to have more than four registers.
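
For reference, the access splitting described above can be sketched roughly
as follows. This is Python rather than the actual vkd3d C code, and the
tuple-based IR, the function name and the "@1.0"-style temporary names are
all made up for illustration; only the offsets (0, 4, 8, 12) follow the
listing above.

```python
# Hypothetical mini-IR (not the vkd3d one). Before splitting:
#   ("load",  dst, var)        whole-matrix load
#   ("add",   dst, lhs, rhs)
#   ("store", var, src)        whole-matrix store
# After splitting, loads and stores carry a register offset:
#   ("load",  dst, var, offset)
#   ("store", var, offset, src)

ROWS = 4      # assumption: a float4x4 is handled as 4 rows...
ROW_SIZE = 4  # ...of 4 components each


def split_matrix_ops(instrs):
    """Turn float4x4 loads, adds and stores into per-row float4 ones.

    The variables themselves keep their float4x4 type; only the
    accesses become smaller.
    """
    out = []
    for row in range(ROWS):
        offset = row * ROW_SIZE
        rename = {}  # matrix temporary -> its float4 temporary for this row
        for ins in instrs:
            op = ins[0]
            if op == "load":
                _, dst, var = ins
                rename[dst] = f"{dst}.{row}"
                out.append(("load", rename[dst], var, offset))
            elif op == "add":
                _, dst, lhs, rhs = ins
                rename[dst] = f"{dst}.{row}"
                out.append(("add", rename[dst], rename[lhs], rename[rhs]))
            elif op == "store":
                _, var, src = ins
                out.append(("store", var, offset, rename[src]))
            else:
                raise ValueError(f"unsupported instruction: {ins!r}")
    return out
```

Feeding it the four naive instructions for c = a + b gives sixteen
instructions, four per row, all still addressing a, b and c.
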
What is the advantage of splitting variables themselves?
Continuing from your example: assuming a, b and c are temporaries, you
split them into 4 vectors each and update the LOAD and STORE
instructions to point to the specific vector. Once that is done, it
becomes explicit that those groups of 4 instructions (LOAD x2, ADD,
STORE) are in fact entirely independent from each other. That alone
might help further transformations down the road.
It's also pretty nice for register allocation, as it's easier to
allocate 4 groups of 4 registers rather than a single contiguous group
of 16. Sometimes you can even find out that whole rows / columns are
unused and drop them altogether.
The same applies to all the complex types of course, not just
matrices. There is a complication with the above in that sometimes it
can be impossible to split the vars. That is, when the load / store
offset is not always known at compile time. That's a bit unfortunate
but it should also be pretty rare in HLSL shaders. I think it's
worthwhile to optimize for the common case and accept that we won't
necessarily have the best code when things align badly for us.
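
To make the idea concrete, here is a minimal sketch of splitting the
variables themselves, again in Python and on a made-up tuple IR rather than
the real vkd3d structures. It assumes a non-constant offset is represented
by a temporary name (a string) instead of an integer, and the "a_0", "a_1"
variable names are hypothetical. Variables with any non-constant access are
left alone, which is the "bad" case mentioned above.

```python
# Hypothetical mini-IR (not the vkd3d one):
#   ("load",  dst, var, offset)
#   ("store", var, offset, src)
# offset is an int when known at compile time, otherwise the name of a
# temporary holding it (e.g. "@i").


def split_variables(instrs, row_size=4):
    """Replace each splittable variable with one vector variable per row."""
    # A variable cannot be split if any access to it has an unknown offset.
    unsplittable = set()
    for ins in instrs:
        if ins[0] == "load" and not isinstance(ins[3], int):
            unsplittable.add(ins[2])
        elif ins[0] == "store" and not isinstance(ins[2], int):
            unsplittable.add(ins[1])

    out = []
    for ins in instrs:
        if ins[0] == "load" and ins[2] not in unsplittable:
            _, dst, var, offset = ins
            row, rest = divmod(offset, row_size)
            out.append(("load", dst, f"{var}_{row}", rest))
        elif ins[0] == "store" and ins[1] not in unsplittable:
            _, var, offset, src = ins
            row, rest = divmod(offset, row_size)
            out.append(("store", f"{var}_{row}", rest, src))
        else:
            # Other instructions and accesses to unsplittable variables
            # are passed through unchanged.
            out.append(ins)
    return out
```

After this, each row is a separate variable, so the independence of the
per-row instruction groups is visible without looking at offsets, and a row
that is never loaded can be dropped on its own.
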
With all that said: WRT copy propagation and this patch specifically,
I think it's a good idea to only handle vector variables if it makes
things easier (as it should). Notice that you don't have to bail out
entirely even in the "bad" case, as a non-vector is perfectly fine as
a "value". It's only when the complex variable is the destination of a
store that we're in trouble.
...
In the specific case of my copy propagation pass, this would make things
more complicated. For example, if right now I cannot reconstruct the
offset of a store, I can just invalidate the whole variable. In your
model, as I get it, I'd also have to invalidate other variables that
are unrelated by that point.
I don't think that's the case? A STORE is always directed to a
specific temporary variable and will affect that one alone. I guess
you were thinking of a model where you always split variables into
vectors no matter what, in which case you're right, it quickly becomes
a mess...
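
A minimal sketch of the invalidation behaviour being discussed, under
several assumptions: it is Python on a made-up tuple IR, not the pass from
the patch; it handles straight-line code only; it tracks whole vectors per
(variable, offset) rather than individual components; and the function name
is hypothetical. The point it illustrates is only that a store with an
unknown offset invalidates the variable it targets and nothing else.

```python
# Hypothetical mini-IR (not the vkd3d one):
#   ("load",  dst, var, offset)
#   ("store", var, offset, src)
#   (op, dst, operand, ...)      any other instruction, e.g. "add"
# offset is an int when known at compile time, otherwise a temporary name.


def copy_propagate(instrs):
    """Forward stored values to later loads in straight-line code."""
    known = {}     # (var, offset) -> temporary last stored there
    replaced = {}  # removed load's temporary -> temporary it copies
    out = []
    for ins in instrs:
        op = ins[0]
        if op == "store":
            _, var, offset, src = ins
            src = replaced.get(src, src)
            if isinstance(offset, int):
                known[(var, offset)] = src
            else:
                offset = replaced.get(offset, offset)
                # Unknown offset: forget what we know about this
                # variable, and only this variable.
                for key in [k for k in known if k[0] == var]:
                    del known[key]
            out.append(("store", var, offset, src))
        elif op == "load":
            _, dst, var, offset = ins
            if isinstance(offset, int) and (var, offset) in known:
                # The load is a copy of a known value: drop it.
                replaced[dst] = known[(var, offset)]
            else:
                out.append(ins)
        else:
            operands = tuple(replaced.get(a, a) for a in ins[2:])
            out.append((op, ins[1]) + operands)
    return out
```

With stores to both a and b followed by a store to a at an unknown offset,
a later load from b is still replaced by the stored value, while a load
from a has to stay.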
