Re: [PATCH 0/4] MR51: vkd3d-shader/hlsl: Improve constant handling in copy prop, and extend the use of aoffimmi modifiers.

18 Nov 2022

Giovanni actually sent a very similar patch to 1/4 [1] several months ago. It wasn't accepted, and I'm not sure what discussion, if any, there was about it (it may have been off-list).

Broadly speaking, I don't think this is a *wrong* solution (and if we care about getting texel offsets working, I don't mind accepting it), but I don't think it's quite the right one either. I think there are two other passes we could implement instead, which together would do everything this pass does and more:

(1) generic vectorization of stores
(2) propagation of load+store sequences into multiple store instructions

Consider specifically the following shader:

```hlsl
uniform float f;

float4 main() : sv_target
{
    return float4(f, 1, 2, 3);
}
```

which generates:

```
 2:      float | f
 3:       uint | 0 
 4:            | = (<constructor-2>[@3].x @2)
 5:      float | 1.00000000e+00 
 6:       uint | 1 
 7:            | = (<constructor-2>[@6].x @5)
 8:      float | 2.00000000e+00 
 9:       uint | 2 
10:            | = (<constructor-2>[@9].x @8)
11:      float | 3.00000000e+00 
12:       uint | 3 
13:            | = (<constructor-2>[@12].x @11)
14:     float4 | <constructor-2>
15:            | return
16:            | = (<output-sv_target0> @14)
```

Copy-prop with constant loads won't help here, because the load of `<constructor-2>` mixes a non-constant component (`f`) with constant ones. However, with the first pass I describe, we can simplify that into:

```
 2:      float | f
 3:       uint | 0 
 4:            | = (<constructor-2>[@3].x @2)
 5:     float3 | {1.0 2.0 3.0}
 6:       uint | 1 
 7:            | = (<constructor-2>[@6].xyz @5)
14:     float4 | <constructor-2>
15:            | return
16:            | = (<output-sv_target0> @14)
```
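
To make the first pass more concrete, here is a minimal C sketch of store vectorization on a hypothetical toy IR. The `toy_store` struct and `vectorize_const_stores()` are made up for illustration and are not vkd3d's `hlsl_ir` API. It assumes a straight-line list containing only stores, so nothing can read the variable between two adjacent stores; a real pass would have to check for intervening loads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COMPONENTS 4

/* Hypothetical toy IR, not vkd3d's: a store writes `count` consecutive
 * components of variable `var`, starting at component `offset`. A constant
 * store carries its immediate values in `values`; any other store refers to
 * an opaque node `value`. */
struct toy_store
{
    int var;
    unsigned int offset;
    unsigned int count;
    bool is_const;
    float values[MAX_COMPONENTS];
    int value;
};

/* Merge each run of consecutive constant stores into the same variable,
 * where every store starts right where the previous one ended, into a single
 * wider constant store. Works in place and returns the new store count. */
size_t vectorize_const_stores(struct toy_store *stores, size_t count)
{
    size_t n = 0;

    for (size_t i = 0; i < count; ++i)
    {
        const struct toy_store cur = stores[i];
        struct toy_store *prev = n ? &stores[n - 1] : NULL;

        assert(cur.count >= 1 && cur.offset + cur.count <= MAX_COMPONENTS);
        if (prev && prev->is_const && cur.is_const && prev->var == cur.var
                && prev->offset + prev->count == cur.offset)
        {
            /* Append the new constants to the previous store's vector. */
            for (unsigned int c = 0; c < cur.count; ++c)
                prev->values[prev->count + c] = cur.values[c];
            prev->count += cur.count;
            continue;
        }
        stores[n++] = cur;
    }
    return n;
}
```

Fed the four stores into `<constructor-2>` from the first dump, this should leave the store of `f` alone and merge the three constant stores into a single 3-component store at offset 1, matching the second dump.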

The second pass would recognize when the rhs of a store instruction is just a load, and rewrite that store as multiple store instructions that write directly to the destination. (These passes could be applied in either order, fwiw.) Applied to the output above, it gives:

```
 2:      float | f
 3:       uint | 0 
 4:            | = (<output-sv_target0>[@3].x @2)
 5:     float3 | {1.0 2.0 3.0}
 6:       uint | 1 
 7:            | = (<output-sv_target0>[@6].xyz @5)
```

(I'm ignoring the return statement for now. I've also assumed DCE has been run on these examples.)

Note that this second pass needn't be limited to whole-variable stores, either. (We also can't do the same thing with copy-prop, because a load instruction *has* to return the whole vector type; it can't be split.)
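
The second pass might look roughly like the following C sketch, again on a hypothetical toy IR (`toy_store`, `can_split()` and `split_load_stores()` are illustrative names, not vkd3d API). As an assumption of this sketch, it only splits a copy when every component of the loaded variable was written earlier by a store of a plain value, and it leaves the original stores in place for DCE. The destination offset of the copy is added to each replayed store, so the copy itself does not have to cover a whole variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not vkd3d's: a store writes `count` (at most 4)
 * components of `var`, starting at component `offset`. Its source is either
 * an opaque node `value` (when load_var < 0) or a load of all of `load_var`. */
struct toy_store
{
    int var;
    unsigned int offset;
    unsigned int count;
    int value;
    int load_var;
};

/* True if every component of the variable loaded by in[i] was written by an
 * earlier store of an opaque value (not by another copy). */
static bool can_split(const struct toy_store *in, size_t i)
{
    unsigned int covered = 0;

    for (size_t j = 0; j < i; ++j)
    {
        if (in[j].var != in[i].load_var)
            continue;
        if (in[j].load_var >= 0)
            return false;
        covered |= ((1u << in[j].count) - 1) << in[j].offset;
    }
    return covered == (1u << in[i].count) - 1;
}

/* Copy `in` to `out`, replacing each splittable copy with the earlier stores
 * into the loaded variable, retargeted at the copy's destination. Returns the
 * number of stores written to `out`. */
size_t split_load_stores(const struct toy_store *in, size_t count,
        struct toy_store *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < count; ++i)
    {
        if (in[i].load_var < 0 || !can_split(in, i))
        {
            assert(n < out_cap);
            out[n++] = in[i];
            continue;
        }
        /* Replay the stores in their original order, so later writes win. */
        for (size_t j = 0; j < i; ++j)
        {
            if (in[j].var != in[i].load_var)
                continue;
            assert(n < out_cap);
            out[n] = in[j];
            out[n].var = in[i].var;
            out[n].offset = in[i].offset + in[j].offset;
            ++n;
        }
    }
    return n;
}
```

Given the two stores from the vectorized dump plus the final copy into `<output-sv_target0>`, this should emit the `f` store at offset 0 and the `{1.0 2.0 3.0}` store at offset 1 directly into the output, after which the stores into `<constructor-2>` are dead. A copy from a variable that was never stored to (a uniform, say) is left untouched.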

These passes don't fully replace copy-prop as it is, to be sure, but I think they replace everything that patch 1/4 does. To oversimplify, copy-prop is designed to replace X-store-load sequences with X. If vectorization can turn X into a single instruction, then copy-prop can get rid of the store/load pair and probably the whole temp variable. If vectorization *can't* do that, then we needed that temp anyway.
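
For comparison, here is a rough sketch of the copy-prop check itself on a hypothetical toy IR (`toy_store` and `copy_prop_load()` are illustrative; unlike real copy-prop, this simplified version does not try to build a swizzle). It shows why the per-component stores in the first dump defeat copy-prop while a single vectorized store does not: a load yields one whole vector, so it can only be replaced when a single node supplies every component in place.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMPONENTS 4

/* Hypothetical toy IR, not vkd3d's: node `value` (whose components are
 * numbered from 0) is stored into `count` components of `var`, starting
 * at component `offset`. */
struct toy_store
{
    int var;
    unsigned int offset;
    unsigned int count;
    int value;
};

/* Copy-prop for a load of the first `count` components of `var` that follows
 * `n` straight-line stores. Returns the node X the load can be replaced with,
 * or -1 if no single node holds every loaded component in place. */
int copy_prop_load(const struct toy_store *stores, size_t n, int var, unsigned int count)
{
    int src[MAX_COMPONENTS];
    unsigned int src_comp[MAX_COMPONENTS];

    assert(count >= 1 && count <= MAX_COMPONENTS);
    for (unsigned int c = 0; c < MAX_COMPONENTS; ++c)
    {
        src[c] = -1;
        src_comp[c] = 0;
    }

    /* Record which node, and which of its components, last wrote each component. */
    for (size_t i = 0; i < n; ++i)
    {
        if (stores[i].var != var)
            continue;
        assert(stores[i].offset + stores[i].count <= MAX_COMPONENTS);
        for (unsigned int c = 0; c < stores[i].count; ++c)
        {
            src[stores[i].offset + c] = stores[i].value;
            src_comp[stores[i].offset + c] = c;
        }
    }

    /* The load returns one whole vector, so one node must line up exactly. */
    for (unsigned int c = 0; c < count; ++c)
    {
        if (src[c] < 0 || src[c] != src[0] || src_comp[c] != c)
            return -1;
    }
    return src[0];
}
```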

[1] https://www.winehq.org/pipermail/wine-devel/2022-May/215773.html

-- 
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/51#note_16460

Zebediah Figura (＠zfigura)