Re: [PATCH 0/4] MR51: vkd3d-shader/hlsl: Improve constant handling in copy prop, and extend the use of aoffimmi modifiers.

19 Nov 2022


      On Sat Nov 19 01:27:25 2022 +0000, Zebediah Figura wrote:
...
Giovanni actually sent a very similar patch to 1/4 [1] several months
ago. It wasn't accepted, and I'm not sure what discussion there was
about it if any (it may have been off-list).
I think that this is, broadly speaking, not a *wrong* solution—and if we
care about getting texel offsets working I don't mind accepting it—but I
don't think it's quite the right one either. I think there are two other
passes that we can do instead that would do everything this pass does
and more:
(1) generic vectorization of stores
(2) propagation of load+store sequences into multiple store instructions
Consider specifically the following shader:
uniform float f;
float4 main() : sv_target
{
    return float4(f, 1, 2, 3);
}

which generates:
 2:      float | f
 3:       uint | 0 
 4:            | = (<constructor-2>[@3].x @2)
 5:      float | 1.00000000e+00 
 6:       uint | 1 
 7:            | = (<constructor-2>[@6].x @5)
 8:      float | 2.00000000e+00 
 9:       uint | 2 
10:            | = (<constructor-2>[@9].x @8)
11:      float | 3.00000000e+00 
12:       uint | 3 
13:            | = (<constructor-2>[@12].x @11)
14:     float4 | <constructor-2>
15:            | return
16:            | = (<output-sv_target0> @14)

Copyprop-with-constant-loads won't help this. However, with the first
pass I describe, we can simplify that into:
 2:      float | f
 3:       uint | 0 
 4:            | = (<constructor-2>[@3].x @2)
 5:     float3 | {1.0 2.0 3.0}
 6:       uint | 1 
 7:            | = (<constructor-2>[@6].xyz @5)
14:     float4 | <constructor-2>
15:            | return
16:            | = (<output-sv_target0> @14)

The second pass would recognize when the rhs of a store instruction is
just a load, and rewrite it as multiple store instructions. (These
passes could be applied in either order, fwiw):
 2:      float | f
 3:       uint | 0 
 4:            | = (<output-sv_target0>[@3].x @2)
 5:     float3 | {1.0 2.0 3.0}
 6:       uint | 1 
 7:            | = (<output-sv_target0>[@6].xyz @5)

(Ignoring the return statement for now. Also, I've automatically applied
DCE to these examples.)
Note that this second pass doesn't necessarily have to be limited to
whole-variable stores either. (Note also that we can't do the same thing
with copy-prop because a load instruction *has* to return the whole
vector type; it can't be split.)
This doesn't fully replace copy-prop as it is, to be sure, but I think
it replaces everything that patch 1/4 does. Copy-prop is designed to
replace X-store-load sequences with X, to oversimplify. If vectorization
can turn X into a single instruction, then copy-prop can get rid of the
store/load and probably the whole temp variable. If vectorization
*can't* do that, then we needed that temp anyway.
[1] https://www.winehq.org/pipermail/wine-devel/2022-May/215773.html
I think I described this second pass poorly. It's actually probably easier to think of it as an *extension* of copy-prop.
The idea is, if you have something like that float4(f, 1, 2, 3) sequence, you can't copy-prop loads from `<constructor-2>` with the current pass because they don't all have the same instruction as a source. So what you do instead is, any time that `<constructor-2>` is itself used as an rhs to a store, you split up that store into multiple stores. You could say one per unique source but you could also do it more simply by saying one per component (and then letting vectorization clean that up later). That way you're guaranteed to be able to copy-prop them. You increase the number of stores but (hopefully) reduce the number of intermediate variables.
Actually, now that I describe it this way, I realize it doesn't even have to be limited to stores, it can be anything that's done per-component, so you could do a similar thing with some HLSL_IR_EXPRs.
-- 
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/51#note_16468

2025

2024

2023

2022

Re: [PATCH 0/4] MR51: vkd3d-shader/hlsl: Improve constant handling in copy prop, and extend the use of aoffimmi modifiers.