Re: [PATCH v2 0/4] MR168: hlsl: ps_1_* outputs.

3 May 2023


      On Wed May  3 16:08:53 2023 +0000, Giovanni Mascellani wrote:
...
The MR looks fine to me, apart from the small remark about comment wording.
I just wanted to note that the new algorithm has higher computational
complexity than before, because `get_available_writemask()` used to be
constant time and is now linear in the number of register allocations.
This already causes a measurable performance hit in a synthetic but
still relatively simple shader such as this one:
uniform float4x4 x;
uniform float4x4 y;
float4 main(float4 pos : sv_position) : sv_target
{
    float4x4 a = mul(mul(y, x), mul(x, x));
    float4x4 b = mul(mul(y, x), mul(y, x));
    float4x4 c = mul(mul(y, y), mul(x, x));
    float4x4 d = mul(mul(y, y), mul(y, x));
    float4 ret = 0.0;
    ret += a[0] - b[0] * c[0] / d[0];
    ret += a[1] - b[1] * c[1] / d[1];
    ret += a[2] - b[2] * c[2] / d[2];
    ret += a[3] - b[3] * c[3] / d[3];
    return ret;
}

Here I am leveraging `mul()` to create a lot of temporaries, and summing
everything to prevent DCE from optimizing too much away. On my computer a
shader runner that just compiles this (without executing it) takes 0.1
seconds before this MR and 0.11 seconds after it.
I don't claim any significance for this ad hoc microbenchmark, so I
don't think it's necessary to change the MR, but if and when we go
looking for performance gains in the HLSL compiler, let's remember to
have a look here.

We could probably do better by just recording allocations for the few
cases where we need to reserve, and then using the old pass for
everything else. But it's probably not worth rewriting this again until
we see evidence it matters.
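
To make the complexity point concrete, here is a minimal sketch in C of what a lookup that is linear in the number of recorded allocations could look like. This is not the actual vkd3d code: every name other than `get_available_writemask()` is hypothetical, the fixed-size array is a simplification, and treating live ranges as closed intervals is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical structures for illustration; not vkd3d's real ones. */
#define MAX_ALLOCATIONS 64

struct allocation
{
    unsigned int reg;         /* register index */
    unsigned int writemask;   /* occupied components, bits 0-3 = x, y, z, w */
    unsigned int first_write; /* start of the live range (instruction index) */
    unsigned int last_read;   /* end of the live range (instruction index) */
};

struct allocator
{
    struct allocation allocations[MAX_ALLOCATIONS];
    size_t count;
};

/* Append one allocation record to the list. */
static void record_allocation(struct allocator *a, unsigned int reg,
        unsigned int writemask, unsigned int first_write, unsigned int last_read)
{
    struct allocation *alloc;

    assert(a->count < MAX_ALLOCATIONS);
    alloc = &a->allocations[a->count++];
    alloc->reg = reg;
    alloc->writemask = writemask;
    alloc->first_write = first_write;
    alloc->last_read = last_read;
}

/* Assumption: live ranges are closed intervals that conflict if they share
 * any instruction index. */
static bool ranges_overlap(const struct allocation *alloc,
        unsigned int first_write, unsigned int last_read)
{
    return alloc->first_write <= last_read && first_write <= alloc->last_read;
}

/* Return the components of `reg` that are free over the whole range
 * [first_write, last_read]. The cost is O(a->count): every recorded
 * allocation may have to be inspected. */
static unsigned int get_available_writemask_linear(const struct allocator *a,
        unsigned int reg, unsigned int first_write, unsigned int last_read)
{
    unsigned int writemask = 0xf;
    size_t i;

    for (i = 0; i < a->count && writemask; ++i)
    {
        const struct allocation *alloc = &a->allocations[i];

        if (alloc->reg == reg && ranges_overlap(alloc, first_write, last_read))
            writemask &= ~alloc->writemask;
    }
    return writemask;
}
```

Since the scan costs time proportional to `a->count`, a list that held only the reserved registers would presumably stay short, which is the idea behind handling just those cases this way and leaving everything else to the old pass.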
-- 
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/168#note_31892
