Giovanni actually sent a very similar patch to 1/4 [1] several months ago. It wasn't accepted, and I'm not sure what discussion there was about it if any (it may have been off-list).
I think that this is, broadly speaking, not a *wrong* solution—and if we care about getting texel offsets working I don't mind accepting it—but I don't think it's quite the right one either. I think there are two other passes that we can do instead that would do everything this pass does and more:
(1) generic vectorization of stores
(2) propagation of load+store sequences into multiple store instructions
Consider specifically the following shader:
```hlsl
uniform float f;

float4 main() : sv_target
{
    return float4(f, 1, 2, 3);
}
```
which generates:
```
  2:      float | f
  3:       uint | 0
  4:            | = (<constructor-2>[@3].x @2)
  5:      float | 1.00000000e+00
  6:       uint | 1
  7:            | = (<constructor-2>[@6].x @5)
  8:      float | 2.00000000e+00
  9:       uint | 2
 10:            | = (<constructor-2>[@9].x @8)
 11:      float | 3.00000000e+00
 12:       uint | 3
 13:            | = (<constructor-2>[@12].x @11)
 14:     float4 | <constructor-2>
 15:            | return
 16:            | = (<output-sv_target0> @14)
```
Copyprop-with-constant-loads won't help here. However, with the first pass I describe, we can simplify that into:
```
  2:      float | f
  3:       uint | 0
  4:            | = (<constructor-2>[@3].x @2)
  5:     float3 | {1.0 2.0 3.0}
  6:       uint | 1
  7:            | = (<constructor-2>[@6].xyz @5)
 14:     float4 | <constructor-2>
 15:            | return
 16:            | = (<output-sv_target0> @14)
```
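As a rough illustration of the store-vectorization idea, here is a minimal Python sketch on a toy model, not the actual compiler IR. The `Store` class and `vectorize_stores` function are hypothetical names made up for this example. It assumes the stores are already adjacent in a flat list with nothing in between, and it only merges stores whose sources are all constants, which is narrower than a truly generic pass would be.

```python
from dataclasses import dataclass
from typing import List, Union

# Toy operand: a float is a constant, a str names some non-constant value.
Operand = Union[float, str]


@dataclass
class Store:
    """Hypothetical stand-in for a store instruction."""
    var: str                # destination variable
    offset: int             # first component written
    values: List[Operand]   # one operand per component written


def is_const(values: List[Operand]) -> bool:
    return all(isinstance(v, float) for v in values)


def vectorize_stores(stores: List[Store]) -> List[Store]:
    """Merge adjacent constant stores to contiguous components of one variable."""
    out: List[Store] = []
    for st in stores:
        prev = out[-1] if out else None
        if (prev is not None
                and prev.var == st.var
                and prev.offset + len(prev.values) == st.offset
                and is_const(prev.values)
                and is_const(st.values)):
            # Contiguous and both constant: fold into one wider store.
            out[-1] = Store(prev.var, prev.offset, prev.values + st.values)
        else:
            out.append(Store(st.var, st.offset, list(st.values)))
    return out


# The four scalar stores from the example shader above.
stores = [
    Store("constructor-2", 0, ["f"]),
    Store("constructor-2", 1, [1.0]),
    Store("constructor-2", 2, [2.0]),
    Store("constructor-2", 3, [3.0]),
]
result = vectorize_stores(stores)
```

In this toy model the store of `f` stays scalar because it is not a constant, while the three constant stores collapse into a single three-component store, mirroring the `.xyz` store in the dump above.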
The second pass would recognize when the rhs of a store instruction is just a load, and rewrite it as multiple store instructions. (These passes could be applied in either order, fwiw):
```
  2:      float | f
  3:       uint | 0
  4:            | = (<output-sv_target0>[@3].x @2)
  5:     float3 | {1.0 2.0 3.0}
  6:       uint | 1
  7:            | = (<output-sv_target0>[@6].xyz @5)
```
(Ignoring the return statement for now. Also, I've automatically applied DCE to these examples.)
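The load+store propagation can be sketched in the same toy style. Again, this is a hypothetical Python model rather than the real IR: `Store`, `Copy`, and `propagate_load_store` are names invented for the example, `Copy` stands for "load the whole source variable and store it to the destination", and the safety checks are deliberately conservative (the source temp must not be touched after the copy, and the destination must not be touched before it). It also assumes the temp is fully written by the preceding stores.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Store:
    """Hypothetical store of `values` into `var` starting at component `offset`."""
    var: str
    offset: int
    values: list


@dataclass
class Copy:
    """Hypothetical whole-variable 'load src, then store to dst' pair."""
    dst: str
    src: str


Instr = Union[Store, Copy]


def touches(ins: Instr, var: str) -> bool:
    if isinstance(ins, Store):
        return ins.var == var
    return var in (ins.src, ins.dst)


def propagate_load_store(instrs: List[Instr]) -> List[Instr]:
    """Retarget stores to a temp directly at the variable the temp is copied to."""
    result = list(instrs)
    for copy in [i for i in instrs if isinstance(i, Copy)]:
        pos = next(k for k, ins in enumerate(result) if ins is copy)
        before, after = result[:pos], result[pos + 1:]
        # Conservative checks: bail out if the temp is used after the copy,
        # if the destination is touched before it, or if another copy
        # involves the temp before it.
        if any(touches(i, copy.src) for i in after):
            continue
        if any(touches(i, copy.dst) for i in before):
            continue
        if any(isinstance(i, Copy) and touches(i, copy.src) for i in before):
            continue
        retargeted = [
            Store(copy.dst, i.offset, i.values)
            if isinstance(i, Store) and i.var == copy.src else i
            for i in before
        ]
        result = retargeted + after  # the copy itself is dropped
    return result


program = [
    Store("constructor-2", 0, ["f"]),
    Store("constructor-2", 1, [1.0, 2.0, 3.0]),
    Copy("output-sv_target0", "constructor-2"),
]
rewritten = propagate_load_store(program)
```

Here the two stores end up writing `output-sv_target0` directly and the temp disappears, which is the shape of the final dump above.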
Note that this second pass doesn't necessarily have to be limited to whole-variable stores either. (Note also that we can't do the same thing with copy-prop because a load instruction *has* to return the whole vector type; it can't be split.)
This doesn't fully replace copy-prop as it is, to be sure, but I think it replaces everything that patch 1/4 does. Copy-prop is designed to replace X-store-load sequences with X, to oversimplify. If vectorization can turn X into a single instruction, then copy-prop can get rid of the store/load and probably the whole temp variable. If vectorization *can't* do that, then we needed that temp anyway.
[1] https://www.winehq.org/pipermail/wine-devel/2022-May/215773.html