On 11/11/21 10:40, Matteo Bruni wrote:
On Thu, Nov 11, 2021 at 4:41 PM Giovanni Mascellani <gmascellani@codeweavers.com> wrote:
Hi,
On 11/11/21 14:43, Matteo Bruni wrote:
Continuing from your example: assuming a, b and c are temporaries, you split them into 4 vectors each and update the LOAD and STORE instructions to point to the specific vector. Once that is done, it becomes explicit that those groups of 4 instructions (LOAD x2, ADD, STORE) are in fact entirely independent from each other. That alone might help further transformations down the road. It's also pretty nice for register allocation, as it's easier to allocate 4 groups of 4 registers rather than a single contiguous group of 16. Sometimes you can even find out that whole rows / columns are unused and drop them altogether.
The same applies to all the complex types of course, not just matrices. There is a complication with the above: sometimes it is impossible to split the vars, namely when a load / store offset is not known at compile time. That's a bit unfortunate but it should also be pretty rare in HLSL shaders. I think it's worthwhile to optimize for the common case and accept that we won't necessarily have the best code when things align badly for us.
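To make the splitting concrete, here is a rough sketch in Python. It is not the compiler's actual IR or pass: the `Access` record, the `VEC_SIZE` constant and both function names are made up for illustration. It assumes every load / store carries a component offset, with `None` standing for an offset that is not known at compile time. A variable is split only if all of its accesses have constant offsets; anything else is left untouched.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

VEC_SIZE = 4  # components per vector (assumption for this sketch)


@dataclass
class Access:
    """Hypothetical load/store record, not a real compiler structure."""
    kind: str               # "load" or "store"
    var: str                # name of the variable being accessed
    offset: Optional[int]   # component offset; None if unknown at compile time


def splittable_vars(accesses: List[Access]) -> Set[str]:
    """Return the variables whose every access has a constant offset."""
    seen, unsplittable = set(), set()
    for a in accesses:
        seen.add(a.var)
        if a.offset is None:
            unsplittable.add(a.var)
    return seen - unsplittable


def split_accesses(accesses: List[Access]) -> List[Access]:
    """Redirect accesses to per-vector variables where that is safe."""
    ok = splittable_vars(accesses)
    out = []
    for a in accesses:
        if a.var in ok:
            # Offset 0..3 lands in vector 0, 4..7 in vector 1, and so on.
            row, comp = divmod(a.offset, VEC_SIZE)
            out.append(Access(a.kind, f"{a.var}_{row}", comp))
        else:
            # At least one access is dynamic: keep the whole variable as-is.
            out.append(a)
    return out
```

After this step, any `a_N` that never shows up in the output is an unused row and could simply be dropped, while a variable with even one dynamic offset stays in one piece, so the result is still correct in that case.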
Ok, that is the part I was missing: it's possible that a variable cannot be split, and we want to handle that case as well (meaning that it's sensible to optimize for the common case, but we also want to be correct in _all_ cases).
Exactly. At least that's how I see it.
I think I'm missing something, or have misread part of this, but assuming that we want to go right ahead and split types into vectors where possible, I think it only makes sense to handle vectors in copyprop? If a type is bigger than that and can't be split, I think that means it necessarily has non-uniform access, which means we can't really perform copyprop on it anyway.
Or maybe you're already saying exactly that.
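A rough sketch of what I mean, in Python rather than real compiler code. Everything here is hypothetical: the `(op, dst, src)` tuple format, the `vector_vars` set and the function name are made up for illustration, and it assumes each value name is defined only once, so a remembered copy never goes stale. Stores to vector-sized variables are tracked and later loads are replaced with the stored value; variables that could not be split are never tracked, so their loads are left alone.

```python
def copy_propagate(instrs, vector_vars):
    """Replace loads of vector-sized variables with the last stored value.

    instrs: list of (op, dst, src) tuples, where
        ("store", var, value) writes value into var, and
        ("load", value, var) reads var into a fresh value.
    vector_vars: names of variables that are at most one vector wide.
    """
    known = {}  # var -> value most recently stored to it
    out = []
    for op, dst, src in instrs:
        if op == "store":
            if dst in vector_vars:
                known[dst] = src
            out.append((op, dst, src))
        elif op == "load" and src in known:
            # The load is redundant: forward the stored value instead.
            out.append(("mov", dst, known[src]))
        else:
            # Unsplit (possibly dynamically indexed) variables end up here.
            out.append((op, dst, src))
    return out
```

With `a_0` in `vector_vars` and `m` left out because it could not be split, a load of `a_0` after a store becomes a plain `mov`, while the load of `m` is emitted unchanged.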