Re: [PATCH v2 0/4] MR168: hlsl: ps_1_* outputs.

3 May 2023


The MR is OK for me, except for the small remark about comment wording.
I just wanted to note that the new algorithm has higher computational complexity than before, because `get_available_writemask()` used to be constant time and is now linear in the number of register allocations. This already causes a measurable performance hit in a synthetic but still relatively simple shader like this:
```
uniform float4x4 x;
uniform float4x4 y;
float4 main(float4 pos : sv_position) : sv_target
{
    float4x4 a = mul(mul(y, x), mul(x, x));
    float4x4 b = mul(mul(y, x), mul(y, x));
    float4x4 c = mul(mul(y, y), mul(x, x));
    float4x4 d = mul(mul(y, y), mul(y, x));
    float4 ret = 0.0;
    ret += a[0] - b[0] * c[0] / d[0];
    ret += a[1] - b[1] * c[1] / d[1];
    ret += a[2] - b[2] * c[2] / d[2];
    ret += a[3] - b[3] * c[3] / d[3];
    return ret;
}
```
Here I am using `mul()` to create a lot of temporaries, and summing everything so that dead code elimination (DCE) cannot remove too much. On my computer a shader runner that just compiles this (without executing it) takes 0.1 seconds before this MR and 0.11 seconds after it.
I don't claim any significance for my random microbenchmark, so I don't think it's necessary to change the MR, but if and when we go hunting for performance in the HLSL compiler, let's remember to have a look here.
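To make the complexity point concrete, here is a minimal sketch of what a lookup that is linear in the number of register allocations might look like. It is not the code from the MR: the `struct allocation` layout, the closed-interval lifetime test, and the name `available_writemask_linear()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one register allocation; not vkd3d's actual struct. */
struct allocation
{
    unsigned int reg;         /* register index */
    unsigned int writemask;   /* components in use, bits 0..3 = x, y, z, w */
    unsigned int first_write; /* start of the lifetime (instruction index) */
    unsigned int last_read;   /* end of the lifetime (instruction index) */
};

/* Linear variant: walk every recorded allocation and clear the components
 * held by any allocation on the same register whose lifetime overlaps
 * [first_write, last_read]. The cost grows with 'count'. */
static unsigned int available_writemask_linear(const struct allocation *allocs, size_t count,
        unsigned int reg, unsigned int first_write, unsigned int last_read)
{
    unsigned int free_mask = 0xf;
    size_t i;

    assert(allocs || !count);
    for (i = 0; i < count; ++i)
    {
        const struct allocation *a = &allocs[i];

        if (a->reg != reg)
            continue;
        /* Skip allocations whose lifetime does not overlap the request. */
        if (a->last_read < first_write || a->first_write > last_read)
            continue;
        free_mask &= ~a->writemask;
    }
    return free_mask;
}
```

Since a query like this would be issued for each new temporary, a shader with many temporaries pays a cost that grows roughly with the square of the number of allocations, which is consistent with the small slowdown measured above.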
-- 
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/168#note_31864
