On 12/2/21 09:02, Giovanni Mascellani wrote:
Hi,
On 01/12/21 17:55, Matteo Bruni wrote:
IIRC this patch gets rid of unnecessary copies from an immediate to a temporary variable. I've already forgotten the details, but I'm pretty sure it did something...
I already replied privately, but for the sake of having a publicly archived reference, this patch converts this:
trace:hlsl_dump_function: 2: float | 1.00000000e+00
trace:hlsl_dump_function: 3: float | 2.00000000e+00
trace:hlsl_dump_function: 4: float | 3.00000000e+00
trace:hlsl_dump_function: 5: float | 4.00000000e+00
trace:hlsl_dump_function: 6: | = (<constructor-0>.x @2)
trace:hlsl_dump_function: 7: | = (<constructor-0>.y @3)
trace:hlsl_dump_function: 8: | = (<constructor-0>.z @4)
trace:hlsl_dump_function: 9: | = (<constructor-0>.w @5)
trace:hlsl_dump_function: 10: float4 | <constructor-0>
trace:hlsl_dump_function: 11: | return
to this:
trace:hlsl_dump_function: 2: float4 | {1.00000000e+00 2.00000000e+00 3.00000000e+00 4.00000000e+00 }
trace:hlsl_dump_function: 3: | return
This is not something patch 1/17 can (or should be able to) do. There might be a more general way to do this, but this one exists right now, so I think it makes sense to have it in master.
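Below is a minimal sketch of the effect shown in the trace, written against a hypothetical toy IR. `struct instr`, the `OP_*` opcodes and `fold_constant_constructors()` are invented for illustration. They are not vkd3d's real data structures or the patch's actual code. The sketch assumes straight-line code. It turns a load of a variable into a single vector constant when every component of that variable was written from a scalar constant. The scalar constants and stores that become dead are left for a dead-code elimination pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A deliberately tiny, hypothetical IR; it only mimics the shape of the
 * trace above and is NOT vkd3d's real IR. Straight-line code is assumed. */
enum op { OP_NOP, OP_CONST, OP_STORE, OP_LOAD, OP_RETURN };

struct instr
{
    enum op op;
    unsigned int id;        /* instruction number, as in "@2" */
    unsigned int width;     /* OP_CONST/OP_LOAD: number of components */
    float value[4];         /* OP_CONST: component values */
    unsigned int var;       /* OP_STORE/OP_LOAD: variable index */
    unsigned int component; /* OP_STORE: written component (0 = x) */
    unsigned int src;       /* OP_STORE: id of the stored instruction */
};

/* Find the live instruction with the given id among the first 'count' ones. */
static const struct instr *find_instr(const struct instr *code, size_t count, unsigned int id)
{
    for (size_t i = 0; i < count; ++i)
        if (code[i].op != OP_NOP && code[i].id == id)
            return &code[i];
    return NULL;
}

/* For every load of a variable whose components were all written by stores of
 * scalar constants, turn the load into one vector constant. The stores and
 * scalar constants become dead and are left for a DCE pass to remove.
 * Returns true if anything changed. */
static bool fold_constant_constructors(struct instr *code, size_t count)
{
    bool progress = false;

    for (size_t i = 0; i < count; ++i)
    {
        struct instr *load = &code[i];
        unsigned int written = 0; /* bitmask of components seen so far */
        float value[4] = {0};
        bool ok = true;

        if (load->op != OP_LOAD || load->width == 0 || load->width > 4)
            continue;

        for (size_t j = 0; j < i; ++j)
        {
            const struct instr *store = &code[j], *src;

            if (store->op != OP_STORE || store->var != load->var)
                continue;
            src = find_instr(code, j, store->src);
            if (!src || src->op != OP_CONST || src->width != 1 || store->component >= load->width)
            {
                ok = false; /* written by something we cannot fold */
                break;
            }
            value[store->component] = src->value[0]; /* later stores win */
            written |= 1u << store->component;
        }

        if (!ok || written != (1u << load->width) - 1)
            continue;

        load->op = OP_CONST; /* rewrite the load in place */
        for (unsigned int c = 0; c < load->width; ++c)
            load->value[c] = value[c];
        progress = true;
    }
    return progress;
}
```

Suppose the trace above is encoded as instructions 2 to 11. A single call then rewrites the load at 10 into a float4 constant with components 1, 2, 3 and 4.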
Right. And for the sake of further reference, my argument is that we should have a generic coalescing/vectorization pass, which would convert
2: 1.0
3: 1.0
4: = (var.x @2)
5: = (var.y @3)
into
2: (1.0, 1.0)
3: = (var.xy @2)
without depending on there being a following load, whereupon the existing copy-prop pass should be able to handle the rest of the optimization.
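Below is a rough sketch of such a coalescing pass, again on a hypothetical toy IR. `struct instr`, `coalesce_constant_stores()` and the bitmask writemask encoding are assumptions, not vkd3d code. The pass takes two stores of constants into disjoint components of the same variable and merges them into one store with the combined writemask. It only does this when nothing reads the variable in between and each constant has no other user. It never looks for a following load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not vkd3d's real data structures. A store writes the
 * components of the constant 'src' into the components of 'var' selected by
 * 'writemask' (bit 0 = x, bit 1 = y, ...), in order, like "= (var.xy @2)". */
enum op { OP_NOP, OP_CONST, OP_STORE, OP_LOAD };

struct instr
{
    enum op op;
    unsigned int id;        /* instruction number, as in "@2" */
    unsigned int width;     /* OP_CONST: number of components */
    float value[4];         /* OP_CONST: component values */
    unsigned int var;       /* OP_STORE/OP_LOAD: variable index */
    unsigned int writemask; /* OP_STORE: written components */
    unsigned int src;       /* OP_STORE: id of the stored instruction */
};

/* Find a live constant with the given id among the first 'count' entries. */
static struct instr *find_const(struct instr *code, size_t count, unsigned int id)
{
    for (size_t i = 0; i < count; ++i)
        if (code[i].op == OP_CONST && code[i].id == id)
            return &code[i];
    return NULL;
}

/* Count the live stores that read instruction 'id'. */
static unsigned int count_uses(const struct instr *code, size_t count, unsigned int id)
{
    unsigned int uses = 0;

    for (size_t i = 0; i < count; ++i)
        if (code[i].op == OP_STORE && code[i].src == id)
            ++uses;
    return uses;
}

/* Interleave two constants stored with disjoint writemasks into one vector
 * whose components follow the bit order of the combined mask. */
static void merge_values(const struct instr *a, unsigned int mask_a,
        const struct instr *b, unsigned int mask_b, float out[4])
{
    unsigned int ia = 0, ib = 0, k = 0;

    for (unsigned int bit = 0; bit < 4; ++bit)
    {
        if (mask_a & (1u << bit))
            out[k++] = a->value[ia++];
        else if (mask_b & (1u << bit))
            out[k++] = b->value[ib++];
    }
}

/* Merge pairs of constant stores to disjoint components of the same variable,
 * as long as nothing reads the variable in between. The merged store takes the
 * place of the later one; returns true if anything changed. */
static bool coalesce_constant_stores(struct instr *code, size_t count)
{
    bool progress = false;

    for (size_t i = 0; i < count; ++i)
    {
        struct instr *first = &code[i];

        if (first->op != OP_STORE)
            continue;
        for (size_t j = i + 1; j < count; ++j)
        {
            struct instr *second = &code[j], *ca, *cb;
            float merged[4] = {0};

            if (second->op == OP_LOAD && second->var == first->var)
                break; /* the partially written value is observed */
            if (second->op != OP_STORE || second->var != first->var)
                continue;
            ca = find_const(code, i, first->src);
            cb = find_const(code, j, second->src);
            if (!ca || !cb || (first->writemask & second->writemask)
                    || count_uses(code, count, ca->id) != 1
                    || count_uses(code, count, cb->id) != 1)
                break; /* cannot move 'first' past this store */
            merge_values(ca, first->writemask, cb, second->writemask, merged);
            ca->width += cb->width;
            for (unsigned int k = 0; k < ca->width; ++k)
                ca->value[k] = merged[k];
            second->writemask |= first->writemask;
            second->src = ca->id;
            first->op = OP_NOP;
            cb->op = OP_NOP; /* its only user now reads 'ca' */
            progress = true;
            break;
        }
    }
    return progress;
}
```

The merged store replaces the later one and is then considered again. As a result, one call collapses four scalar stores to .x, .y, .z and .w into a single float4 constant stored with writemask .xyzw. The existing copy propagation could then forward that constant into a later load.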
If this pass is going to be hard to write, I'm fine enough with using this patch as a temporary solution. It's not clear to me that the pass *is* hard to write, but it probably won't make it into 1.3 if that's going to be released soon. Of course, it's also not clear to me that we really *need* a temporary solution...