Giovanni actually sent a very similar patch to 1/4 [1] several months ago. It wasn't accepted, and I'm not sure what discussion there was about it if any (it may have been off-list). I think that this is, broadly speaking, not a *wrong* solution—and if we care about getting texel offsets working I don't mind accepting it—but I don't think it's quite the right one either. I think there are two other passes that we can do instead that would do everything this pass does and more: (1) generic vectorization of stores (2) propagation of load+store sequences into multiple store instructions Consider specifically the following shader: ```hlsl uniform float f; float4 main() : sv_target { return float4(f, 1, 2, 3); } ``` which generates: ``` 2: float | f 3: uint | 0 4: | = (<constructor-2>[@3].x @2) 5: float | 1.00000000e+00 6: uint | 1 7: | = (<constructor-2>[@6].x @5) 8: float | 2.00000000e+00 9: uint | 2 10: | = (<constructor-2>[@9].x @8) 11: float | 3.00000000e+00 12: uint | 3 13: | = (<constructor-2>[@12].x @11) 14: float4 | <constructor-2> 15: | return 16: | = (<output-sv_target0> @14) ``` Copyprop-with-constant-loads won't help this. However, with the first pass I describe, we can simplify that into: ``` 2: float | f 3: uint | 0 4: | = (<constructor-2>[@3].x @2) 5: float3 | {1.0 2.0 3.0} 6: uint | 1 7: | = (<constructor-2>[@6].xyz @5) 14: float4 | <constructor-2> 15: | return 16: | = (<output-sv_target0> @14) ``` The second pass would recognize when the rhs of a store instruction is just a load, and rewrite it as multiple store instructions. (These passes could be applied in either order, fwiw): ``` 2: float | f 3: uint | 0 4: | = (<output-sv_target0>[@3].x @2) 5: float3 | {1.0 2.0 3.0} 6: uint | 1 7: | = (<output-sv_target0>[@6].xyz @5) ``` (Ignoring the return statement for now. Also, I've automatically applied DCE to these examples.) Note that this second pass doesn't necessarily have to be limited to whole-variable stores either. (Note also that we can't do the same thing with copy-prop because a load instruction *has* to return the whole vector type; it can't be split.) This doesn't fully replace copy-prop as it is, to be sure, but I think it replaces everything that patch 1/4 does. Copy-prop is designed to replace X-store-load sequences with X, to oversimplify. If vectorization can turn X into a single instruction, then copy-prop can get rid of the store/load and probably the whole temp variable. If vectorization *can't* do that, then we needed that temp anyway. [1] https://www.winehq.org/pipermail/wine-devel/2022-May/215773.html -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/51#note_16460