[PATCH v2 0/7] MR744: vkd3d-shader/hlsl: Improve SM1 support for non-float operations, part 5.


Francisco Casas (＠fcasas)

30 Mar 2024

5:23 a.m.

--
v2: vkd3d-shader/hlsl: Use LOGIC_AND instead of MUL in all().
    vkd3d-shader/hlsl: Use LOGIC_OR instead of BIT_OR in any().
    vkd3d-shader/ir: Add missing src swizzle in vsir_program_lower_texkills().
    tests: Add failing test for clip.shader_test in SM1.

https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744


Francisco Casas

30 Mar

5:23 a.m.

New subject: [PATCH v2 1/7] vkd3d-shader/hlsl: Reinterpret ternary condition as float in SM1.

From: Francisco Casas fcasas@codeweavers.com

Otherwise we end up with ABS and NEG on bool types.
---
 libs/vkd3d-shader/hlsl_codegen.c              | 11 ++++-
 .../hlsl/arithmetic-float-uniform.shader_test | 16 +++----
 tests/hlsl/float-comparison.shader_test       |  4 +-
 tests/hlsl/fmod.shader_test                   | 12 ++---
 tests/hlsl/inverse-trig.shader_test           | 44 +++++++++----------
 tests/hlsl/lit.shader_test                    | 12 ++---
 tests/hlsl/ternary.shader_test                | 18 ++++----
 tests/hlsl/vertex-shader-ops.shader_test      |  6 +--
 8 files changed, 65 insertions(+), 58 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 5c09ce04f..e6490265d 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2955,7 +2955,7 @@ static bool lower_logic_not(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, st
 static bool lower_ternary(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, struct hlsl_block *block)
 {
     struct hlsl_ir_node *operands[HLSL_MAX_OPERANDS] = { 0 }, *replacement;
-    struct hlsl_ir_node *zero, *cond, *first, *second;
+    struct hlsl_ir_node *zero, *cond, *first, *second, *float_cond;
     struct hlsl_constant_value zero_value = { 0 };
     struct hlsl_ir_expr *expr;
     struct hlsl_type *type;
@@ -2979,9 +2979,16 @@ static bool lower_ternary(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, stru

     if (ctx->profile->major_version < 4)
     {
+        struct hlsl_type *float_type = hlsl_get_vector_type(ctx, HLSL_TYPE_FLOAT, instr->data_type->dimx);
         struct hlsl_ir_node *abs, *neg;

-        if (!(abs = hlsl_new_unary_expr(ctx, HLSL_OP1_ABS, cond, &instr->loc)))
+        memset(operands, 0, sizeof(operands));
+        operands[0] = cond;
+        if (!(float_cond = hlsl_new_expr(ctx, HLSL_OP1_REINTERPRET, operands, float_type, &instr->loc)))
+            return false;
+        hlsl_block_add_instr(block, float_cond);
+
+        if (!(abs = hlsl_new_unary_expr(ctx, HLSL_OP1_ABS, float_cond, &instr->loc)))
             return false;
         hlsl_block_add_instr(block, abs);

diff --git a/tests/hlsl/arithmetic-float-uniform.shader_test b/tests/hlsl/arithmetic-float-uniform.shader_test index 8bc3992e7..61957f2bb 100644 --- a/tests/hlsl/arithmetic-float-uniform.shader_test +++ b/tests/hlsl/arithmetic-float-uniform.shader_test @@ -13,7 +13,7 @@ uniform 0 float4 5.0 15.0 0.0 0.0 todo(glsl) draw quad probe all rgba (20.0, -10.0, 75.0, 0.33333333) 1

-[pixel shader todo(sm<4)] +[pixel shader] uniform float2 a;

float4 main() : SV_TARGET @@ -25,10 +25,10 @@ float4 main() : SV_TARGET

[test] uniform 0 float4 5.0 15.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (5.0, 5.0, -5.0, 3.0) 1

-[pixel shader todo(sm<4)] +[pixel shader] uniform float2 a;

float4 main() : SV_TARGET @@ -40,10 +40,10 @@ float4 main() : SV_TARGET

[test] uniform 0 float4 42.0 5.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (2.0, -2.0, 2.0, -2.0) 16

-[pixel shader todo(sm<4)] +[pixel shader] uniform float2 a;

float4 main() : SV_TARGET @@ -55,10 +55,10 @@ float4 main() : SV_TARGET

[test] uniform 0 float4 45.0 5.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.0, 0.0, 0.0, 0.0)

-[pixel shader todo(sm<4)] +[pixel shader] float4 x, y;

float4 main() : sv_target @@ -69,7 +69,7 @@ float4 main() : sv_target [test] uniform 0 float4 5.0 -42.1 4.0 45.0 uniform 4 float4 15.0 -5.0 4.1 5.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (5.0, -2.1, 4.0, 0.0) 6

[require] diff --git a/tests/hlsl/float-comparison.shader_test b/tests/hlsl/float-comparison.shader_test index 84c09c129..56ce46f36 100644 --- a/tests/hlsl/float-comparison.shader_test +++ b/tests/hlsl/float-comparison.shader_test @@ -13,7 +13,7 @@ todo(glsl) draw quad probe all rgba (0.0, 0.0, 0.0, 0.0)

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 f;

float4 main() : sv_target @@ -55,7 +55,7 @@ float4 main() : sv_target

[test] uniform 0 float4 0.0 1.5 1.5 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad % SM1-3 apparently treats '0/0' as zero. if(sm<4) todo probe all rgba (1010101.0, 11001100.0, 1101001.0, 11.0) % SM4-5 optimises away the 'not' by inverting the condition, even though this is invalid for NaN. diff --git a/tests/hlsl/fmod.shader_test b/tests/hlsl/fmod.shader_test index ccb7b99e7..40dc66e8c 100644 --- a/tests/hlsl/fmod.shader_test +++ b/tests/hlsl/fmod.shader_test @@ -1,4 +1,4 @@ -[pixel shader todo(sm<4)] +[pixel shader] uniform float4 u;

float4 main() : sv_target @@ -8,13 +8,13 @@ float4 main() : sv_target

[test] uniform 0 float4 -0.5 6.5 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-0.5, 0.0, 0.0, 0.0) 4 uniform 0 float4 1.1 0.3 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.2, 0.0, 0.0, 0.0) 4

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 u;

float4 main() : sv_target @@ -24,8 +24,8 @@ float4 main() : sv_target

[test] uniform 0 float4 -0.5 6.5 2.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-0.5, 0.5, 0.0, 0.0) 4 uniform 0 float4 1.1 0.3 3.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.1, 0.3, 0.0, 0.0) 4 diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test index 31af0ceef..62d79e9ff 100644 --- a/tests/hlsl/inverse-trig.shader_test +++ b/tests/hlsl/inverse-trig.shader_test @@ -92,7 +92,7 @@ todo(glsl) draw quad probe all rgba (31416.0, 0.0, 0.0, 0.0)

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 a;

float4 main() : sv_target @@ -102,26 +102,26 @@ float4 main() : sv_target

[test] uniform 0 float4 -1.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-0.785409629, 0.0, 0.0, 0.0) 512

uniform 0 float4 -0.5 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-0.4636476, 0.0, 0.0, 0.0) 256

uniform 0 float4 0.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.0, 0.0, 0.0, 0.0) 256

uniform 0 float4 0.5 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.4636476, 0.0, 0.0, 0.0) 256

uniform 0 float4 1.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.785409629, 0.0, 0.0, 0.0) 512

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 a;

float4 main() : sv_target @@ -133,64 +133,64 @@ float4 main() : sv_target [test] % Non-degenerate cases uniform 0 float4 1.0 1.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.785385, 0.0, 0.0, 0.0) 512

uniform 0 float4 5.0 -5.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (2.356194, 0.0, 0.0, 0.0) 256

uniform 0 float4 -3.0 -3.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-2.356194, 0.0, 0.0, 0.0) 256

uniform 0 float4 1.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

uniform 0 float4 -1.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-1.570796, 0.0, 0.0, 0.0) 256

uniform 0 float4 0.0 1.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.0, 0.0, 0.0, 0.0) 256

uniform 0 float4 0.0 -1.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256

% Degenerate cases uniform 0 float4 0.00001 0.00002 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.463647, 0.0, 0.0, 0.0) 256

uniform 0 float4 0.00001 -0.00002 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (2.677945, 0.0, 0.0, 0.0) 256

uniform 0 float4 -0.00001 100000.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-0.000000000099986595, 0.0, 0.0, 0.0) 2048

uniform 0 float4 10000000.0 0.00000001 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

% Negative zero behavior should be to treat it the % same as normal zero. uniform 0 float4 1000000000.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

uniform 0 float4 1000000000.0 -0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

uniform 0 float4 0.0 -1.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256

uniform 0 float4 -0.0 -1.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256 diff --git a/tests/hlsl/lit.shader_test b/tests/hlsl/lit.shader_test index efb249dba..ce68d6ea9 100644 --- a/tests/hlsl/lit.shader_test +++ b/tests/hlsl/lit.shader_test @@ -1,4 +1,4 @@ -[pixel shader todo(sm<4)] +[pixel shader] uniform float4 u;

float4 main() : sv_target @@ -8,20 +8,20 @@ float4 main() : sv_target

[test] uniform 0 float4 -0.1 10.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.0, 0.0, 0.0, 1.0)

[test] uniform 0 float4 1.2 -0.1 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.0, 1.2, 0.0, 1.0)

[test] uniform 0 float4 1.2 2.0 3.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.0, 1.2, 8.0, 1.0)

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 u;

float4 main() : sv_target @@ -31,7 +31,7 @@ float4 main() : sv_target

[test] uniform 0 float4 1.2 2.0 3.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (2.0, 2.4, 16.0, 2.0)

[pixel shader fail] diff --git a/tests/hlsl/ternary.shader_test b/tests/hlsl/ternary.shader_test index c075b1e5a..91802afd4 100644 --- a/tests/hlsl/ternary.shader_test +++ b/tests/hlsl/ternary.shader_test @@ -3,7 +3,7 @@ shader model < 6.0

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 x;

float4 main() : sv_target @@ -13,14 +13,14 @@ float4 main() : sv_target

[test] uniform 0 float4 2.0 3.0 4.0 5.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (2.0, 3.0, 4.0, 5.0) uniform 0 float4 0.0 10.0 11.0 12.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (-1.0, 9.0, 10.0, 11.0)

-[pixel shader todo(sm<4)] +[pixel shader] uniform float4 x;

float4 main() : sv_target @@ -35,11 +35,11 @@ float4 main() : sv_target

[test] uniform 0 float4 1.1 3.0 4.0 5.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.1, 2.0, 0.0, 0.0)

-[pixel shader todo(sm<4)] +[pixel shader] float4 f;

float4 main() : sv_target @@ -51,7 +51,7 @@ float4 main() : sv_target

[test] uniform 0 float4 1.0 0.0 0.0 0.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.5, 0.6, 0.7, 0.0)

@@ -246,7 +246,7 @@ todo(glsl) draw quad probe all rgba (3.0, 3.0, 3.0, 3.0)

-[pixel shader todo(sm<4)] +[pixel shader]

uniform float cond; uniform float4 a, b; @@ -260,7 +260,7 @@ float4 main() : sv_target uniform 0 float4 1.0 0.0 0.0 0.0 uniform 4 float4 1.0 2.0 3.0 4.0 uniform 8 float4 5.0 6.0 7.0 8.0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (1.0, 2.0, 3.0, 4.0)

diff --git a/tests/hlsl/vertex-shader-ops.shader_test b/tests/hlsl/vertex-shader-ops.shader_test index ee2a72f02..ea2a3df81 100644 --- a/tests/hlsl/vertex-shader-ops.shader_test +++ b/tests/hlsl/vertex-shader-ops.shader_test @@ -88,7 +88,7 @@ probe all rgba (1.0, 1.0, 1.0, 1.0) % The ternary operator works differently in sm6. See sm6-ternary.shader_test. shader model < 6.0

-[vertex shader todo(sm<4)] +[vertex shader] int a, b, c;

void main(out float4 res : COLOR1, in float4 pos : position, out float4 out_pos : sv_position) @@ -103,11 +103,11 @@ if(sm<4) uniform 0 float 0 if(sm<4) uniform 4 float 100 if(sm<4) uniform 8 float 200 if(sm>=4) uniform 0 int4 0 100 200 0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.2, 0.2, 0.2, 0.2) if(sm<4) uniform 0 float -4 if(sm<4) uniform 4 float 100 if(sm<4) uniform 8 float 200 if(sm>=4) uniform 0 int4 -4 100 200 0 -todo(sm<4 | glsl) draw quad +todo(glsl) draw quad probe all rgba (0.1, 0.1, 0.1, 0.1)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744

Francisco Casas

5:23 a.m.

New subject: [PATCH v2 2/7] tests: Report missing signature element in openGL runner.

From: Francisco Casas fcasas@codeweavers.com

---
 tests/shader_runner_gl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/shader_runner_gl.c b/tests/shader_runner_gl.c
index 3c2a41965..cbcfd95bb 100644
--- a/tests/shader_runner_gl.c
+++ b/tests/shader_runner_gl.c
@@ -1043,6 +1043,7 @@ static bool gl_runner_draw(struct shader_runner *r,

             signature_element = vkd3d_shader_find_signature_element(&vs_input_signature,
                     element->name, element->index, 0);
+            ok(signature_element, "Cannot find signature element %s%u.\n", element->name, element->index);
             attribute_idx = signature_element->register_index;
             format = get_format_info(element->format, false);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744

Francisco Casas (＠fcasas)

5:23 a.m.

...

Actually, more importantly, texkill _does_ operate on all four components for 2.0, and for 1.x it has very restricted usage anyway (you can't use it on arbitrary expressions). Not sure why the test is failing in that case...

You are totally right, there is a reason why the test is failing. We are currently failing to translate texkill to SPIR-V correctly: the translation only takes the first component into account. I missed setting up a src swizzle in vsir_program_lower_texkills().

For some reason I thought that the test was failing in native because texkill only considered the first 3 components, but this was an ir.c problem.

I replaced the patch that introduced JUMP_TEXKILL.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66524
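To make the failure mode concrete, here is a minimal sketch in plain C of a texkill-style test. It assumes the usual rule that the pixel is discarded when any tested component is negative. The helper `discards()` and its `count` parameter are hypothetical and are not part of vkd3d; `count` only mimics a translation that looks at fewer components than it should.

```c
#include <stdbool.h>

/* Hypothetical helper, not vkd3d code: returns true if a texkill-style
 * test over the first `count` components of v would discard the pixel,
 * i.e. if any of those components is negative. */
static bool discards(const float v[4], unsigned int count)
{
    unsigned int i;

    for (i = 0; i < count; ++i)
    {
        if (v[i] < 0.0f)
            return true;
    }
    return false;
}
```

With the uniforms added to clip.shader_test, (3, -8, 3, 0) and (3, 3, 3, -1), a test over all four components discards the pixel, while a test over only the first component keeps it. That is consistent with the probes being marked todo(sm<4) until the swizzle was fixed.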

Francisco Casas

5:23 a.m.

New subject: [PATCH v2 3/7] vkd3d-shader/tpf: Use the extra_bits field for _nz on discard.

From: Francisco Casas fcasas@codeweavers.com

---
 libs/vkd3d-shader/tpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c
index 4d0658313..5c25f262b 100644
--- a/libs/vkd3d-shader/tpf.c
+++ b/libs/vkd3d-shader/tpf.c
@@ -5399,7 +5399,8 @@ static void write_sm4_jump(const struct tpf_writer *tpf, const struct hlsl_ir_ju

         case HLSL_IR_JUMP_DISCARD_NZ:
         {
-            instr.opcode = VKD3D_SM4_OP_DISCARD | VKD3D_SM4_CONDITIONAL_NZ;
+            instr.opcode = VKD3D_SM4_OP_DISCARD;
+            instr.extra_bits = VKD3D_SM4_CONDITIONAL_NZ;

             memset(&instr.srcs[0], 0, sizeof(*instr.srcs));
             instr.src_count = 1;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744

Francisco Casas

5:23 a.m.

New subject: [PATCH v2 4/7] tests: Add failing test for clip.shader_test in SM1.

From: Francisco Casas fcasas@codeweavers.com

We are not properly translating texkill to spir-v since it is only considering the first component.
---
 tests/hlsl/clip.shader_test | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/hlsl/clip.shader_test b/tests/hlsl/clip.shader_test
index 1ebc06871..64ccb6b12 100644
--- a/tests/hlsl/clip.shader_test
+++ b/tests/hlsl/clip.shader_test
@@ -20,3 +20,9 @@ probe all rgba (9, 8, 7, 6)
 uniform 0 float4 9 0 7 6
 todo(glsl) draw quad
 probe all rgba (9, 0, 7, 6)
+uniform 0 float4 3 -8 3 0
+todo(glsl) draw quad
+todo(sm<4) probe all rgba (9, 0, 7, 6)
+uniform 0 float4 3 3 3 -1
+todo(glsl) draw quad
+todo(sm<4) probe all rgba (9, 0, 7, 6)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744

Francisco Casas

5:23 a.m.

New subject: [PATCH v2 5/7] vkd3d-shader/ir: Add missing src swizzle in vsir_program_lower_texkills().

From: Francisco Casas fcasas@codeweavers.com

---
 libs/vkd3d-shader/ir.c      | 1 +
 tests/hlsl/clip.shader_test | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libs/vkd3d-shader/ir.c b/libs/vkd3d-shader/ir.c
index 4f0226187..3a1b58a7c 100644
--- a/libs/vkd3d-shader/ir.c
+++ b/libs/vkd3d-shader/ir.c
@@ -127,6 +127,7 @@ static enum vkd3d_result vsir_program_lower_texkills(struct vsir_program *progra
         ins->dst[0].write_mask = VKD3DSP_WRITEMASK_ALL;

         ins->src[0].reg = texkill_ins->dst[0].reg;
+        ins->src[0].swizzle = VKD3D_SHADER_SWIZZLE(X, Y, Z, W);
         vsir_register_init(&ins->src[1].reg, VKD3DSPR_IMMCONST, VKD3D_DATA_FLOAT, 0);
         ins->src[1].reg.dimension = VSIR_DIMENSION_VEC4;
         ins->src[1].reg.u.immconst_f32[0] = 0.0f;
diff --git a/tests/hlsl/clip.shader_test b/tests/hlsl/clip.shader_test
index 64ccb6b12..4a8d223ca 100644
--- a/tests/hlsl/clip.shader_test
+++ b/tests/hlsl/clip.shader_test
@@ -22,7 +22,7 @@ todo(glsl) draw quad
 probe all rgba (9, 0, 7, 6)
 uniform 0 float4 3 -8 3 0
 todo(glsl) draw quad
-todo(sm<4) probe all rgba (9, 0, 7, 6)
+probe all rgba (9, 0, 7, 6)
 uniform 0 float4 3 3 3 -1
 todo(glsl) draw quad
-todo(sm<4) probe all rgba (9, 0, 7, 6)
+probe all rgba (9, 0, 7, 6)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744

Francisco Casas

5:23 a.m.

New subject: [PATCH v2 6/7] vkd3d-shader/hlsl: Use LOGIC_OR instead of BIT_OR in any().

From: Francisco Casas fcasas@codeweavers.com

Note that BIT_OR is not available for SM1 bools, so we must prefer LOGIC_OR when possible.
---
 libs/vkd3d-shader/hlsl.y   | 68 ++++++++++++++------------------------
 tests/hlsl/any.shader_test | 20 +++++------
 2 files changed, 34 insertions(+), 54 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 52c217654..35b55fcf8 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2721,6 +2721,14 @@ static bool intrinsic_acos(struct hlsl_ctx *ctx,
     return write_acos_or_asin(ctx, params, loc, false);
 }

+/* Find the type corresponding to the given source type, with the same
+ * dimensions but a different base type. */
+static struct hlsl_type *convert_numeric_type(const struct hlsl_ctx *ctx,
+        const struct hlsl_type *type, enum hlsl_base_type base_type)
+{
+    return hlsl_get_numeric_type(ctx, type->class, base_type, type->dimx, type->dimy);
+}
+
 static bool intrinsic_all(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -2750,52 +2758,33 @@ static bool intrinsic_all(struct hlsl_ctx *ctx,
     return !!add_binary_comparison_expr(ctx, params->instrs, HLSL_OP2_NEQUAL, mul, zero, loc);
 }

-static bool intrinsic_any(struct hlsl_ctx *ctx,
-        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+static bool intrinsic_any(struct hlsl_ctx *ctx, const struct parse_initializer *params,
+        const struct vkd3d_shader_location *loc)
 {
-    struct hlsl_ir_node *arg = params->args[0], *dot, *or, *zero, *bfalse, *load;
+    struct hlsl_ir_node *arg = params->args[0], *or, *load, *cast;
+    struct hlsl_type *bool_type;
     unsigned int i, count;

-    if (arg->data_type->class != HLSL_CLASS_VECTOR && arg->data_type->class != HLSL_CLASS_SCALAR)
-    {
-        hlsl_fixme(ctx, loc, "any() implementation for non-vector, non-scalar");
-        return false;
-    }
+    count = hlsl_type_component_count(arg->data_type);
+    bool_type = convert_numeric_type(ctx, arg->data_type, HLSL_TYPE_BOOL);

-    if (arg->data_type->base_type == HLSL_TYPE_FLOAT)
-    {
-        if (!(zero = hlsl_new_float_constant(ctx, 0.0f, loc)))
-            return false;
-        hlsl_block_add_instr(params->instrs, zero);
+    if (!(cast = add_cast(ctx, params->instrs, arg, bool_type, loc)))
+        return false;

-        if (!(dot = add_binary_dot_expr(ctx, params->instrs, arg, arg, loc)))
-            return false;
+    if (!(or = hlsl_add_load_component(ctx, params->instrs, cast, 0, loc)))
+        return false;

-        return !!add_binary_comparison_expr(ctx, params->instrs, HLSL_OP2_NEQUAL, dot, zero, loc);
-    }
-    else if (arg->data_type->base_type == HLSL_TYPE_BOOL)
+    for (i = 1; i < count; ++i)
     {
-        if (!(bfalse = hlsl_new_bool_constant(ctx, false, loc)))
+        if (!(load = hlsl_add_load_component(ctx, params->instrs, cast, i, loc)))
             return false;
-        hlsl_block_add_instr(params->instrs, bfalse);

-        or = bfalse;
-
-        count = hlsl_type_component_count(arg->data_type);
-        for (i = 0; i < count; ++i)
-        {
-            if (!(load = hlsl_add_load_component(ctx, params->instrs, arg, i, loc)))
-                return false;
-
-            if (!(or = add_binary_bitwise_expr(ctx, params->instrs, HLSL_OP2_BIT_OR, or, load, loc)))
-                return false;
-        }
-
-        return true;
+        if (!(or = hlsl_new_binary_expr(ctx, HLSL_OP2_LOGIC_OR, or, load)))
+            return NULL;
+        hlsl_block_add_instr(params->instrs, or);
     }

-    hlsl_fixme(ctx, loc, "any() implementation for non-float, non-bool");
-    return false;
+    return true;
 }

 static bool intrinsic_asin(struct hlsl_ctx *ctx,
@@ -2896,15 +2885,6 @@ static bool intrinsic_atan2(struct hlsl_ctx *ctx,
     return write_atan_or_atan2(ctx, params, loc, true);
 }

-
-/* Find the type corresponding to the given source type, with the same
- * dimensions but a different base type. */
-static struct hlsl_type *convert_numeric_type(const struct hlsl_ctx *ctx,
-        const struct hlsl_type *type, enum hlsl_base_type base_type)
-{
-    return hlsl_get_numeric_type(ctx, type->class, base_type, type->dimx, type->dimy);
-}
-
 static bool intrinsic_asfloat(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
diff --git a/tests/hlsl/any.shader_test b/tests/hlsl/any.shader_test
index b143dd414..8a7408286 100644
--- a/tests/hlsl/any.shader_test
+++ b/tests/hlsl/any.shader_test
@@ -49,7 +49,7 @@ todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)


-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform uint4 b;

 float4 main() : sv_target
@@ -60,30 +60,30 @@ float4 main() : sv_target
 [test]
 if(sm<4) uniform 0 float4 1 1 1 1
 if(sm>=4) uniform 0 uint4 1 1 1 1
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)
 if(sm<4) uniform 0 float4 1 0 0 0
 if(sm>=4) uniform 0 uint4 1 0 0 0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)
 if(sm<4) uniform 0 float4 0 1 0 0
 if(sm>=4) uniform 0 uint4 0 1 0 0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)
 if(sm<4) uniform 0 float4 0 0 1 0
 if(sm>=4) uniform 0 uint4 0 0 1 0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)
 if(sm<4) uniform 0 float4 0 0 0 1
 if(sm>=4) uniform 0 uint4 0 0 0 1
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)
 if(sm<4) uniform 0 float4 0 0 0 0
 if(sm>=4) uniform 0 uint4 0 0 0 0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0)

-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform uint b;

 float4 main() : sv_target
@@ -94,9 +94,9 @@ float4 main() : sv_target
 [test]
 if(sm<4) uniform 0 float4 1 0 0 0
 if(sm>=4) uniform 0 uint4 1 0 0 0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (1.0, 1.0, 1.0, 1.0)
 if(sm<4) uniform 0 float4 0 0 0 0
 if(sm>=4) uniform 0 uint4 0 0 0 0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744
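As a rough illustration of the shape of the new any(), the sketch below does the same thing in plain C: each component is converted to bool, and the results are folded with a logical OR, starting from component 0. The function name `any_components()` is hypothetical. This only models the semantics and does not build any HLSL IR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the patched any(): cast every component to bool,
 * then fold with logical OR. The fold starts from component 0, as in the
 * patch, so count must be at least 1. */
static bool any_components(const unsigned int *v, size_t count)
{
    bool result = v[0] != 0;
    size_t i;

    for (i = 1; i < count; ++i)
        result = result || (v[i] != 0);
    return result;
}
```

The uint4 inputs from any.shader_test map onto this directly: (0, 0, 1, 0) gives true, (0, 0, 0, 0) gives false, and the scalar case is simply a fold over one component.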

Francisco Casas

5:23 a.m.

New subject: [PATCH v2 7/7] vkd3d-shader/hlsl: Use LOGIC_AND instead of MUL in all().

From: Francisco Casas fcasas@codeweavers.com

---
 libs/vkd3d-shader/hlsl.y | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 35b55fcf8..fab585f96 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2732,30 +2732,30 @@ static struct hlsl_type *convert_numeric_type(const struct hlsl_ctx *ctx,
 static bool intrinsic_all(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
-    struct hlsl_ir_node *arg = params->args[0], *mul, *one, *zero, *load;
+    struct hlsl_ir_node *arg = params->args[0], *and, *load, *cast;
+    struct hlsl_type *bool_type;
     unsigned int i, count;

-    if (!(one = hlsl_new_float_constant(ctx, 1.0f, loc)))
-        return false;
-    hlsl_block_add_instr(params->instrs, one);
+    count = hlsl_type_component_count(arg->data_type);
+    bool_type = convert_numeric_type(ctx, arg->data_type, HLSL_TYPE_BOOL);

-    if (!(zero = hlsl_new_float_constant(ctx, 0.0f, loc)))
+    if (!(cast = add_cast(ctx, params->instrs, arg, bool_type, loc)))
         return false;
-    hlsl_block_add_instr(params->instrs, zero);

-    mul = one;
+    if (!(and = hlsl_add_load_component(ctx, params->instrs, cast, 0, loc)))
+        return false;

-    count = hlsl_type_component_count(arg->data_type);
-    for (i = 0; i < count; ++i)
+    for (i = 1; i < count; ++i)
     {
-        if (!(load = hlsl_add_load_component(ctx, params->instrs, arg, i, loc)))
+        if (!(load = hlsl_add_load_component(ctx, params->instrs, cast, i, loc)))
             return false;

-        if (!(mul = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, load, mul, loc)))
-            return false;
+        if (!(and = hlsl_new_binary_expr(ctx, HLSL_OP2_LOGIC_AND, and, load)))
+            return NULL;
+        hlsl_block_add_instr(params->instrs, and);
     }

-    return !!add_binary_comparison_expr(ctx, params->instrs, HLSL_OP2_NEQUAL, mul, zero, loc);
+    return true;
 }

 static bool intrinsic_any(struct hlsl_ctx *ctx, const struct parse_initializer *params,

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744
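The patch does not give a motivation beyond the subject line, but one numerical difference between the two schemes can be shown in plain C. The old all() multiplied the components and compared the product with zero; the new one converts each component to bool and folds with a logical AND. Both function names below are hypothetical, and host float arithmetic is only a stand-in for what a GPU would do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old scheme (model): multiply all components, then compare with zero.
 * `volatile` forces the running product to be rounded to float each step. */
static bool all_by_product(const float *v, size_t count)
{
    volatile float product = 1.0f;
    size_t i;

    for (i = 0; i < count; ++i)
        product = product * v[i];
    return product != 0.0f;
}

/* New scheme (model): cast each component to bool, fold with logical AND.
 * The fold starts from component 0, so count must be at least 1. */
static bool all_by_logic_and(const float *v, size_t count)
{
    bool result = v[0] != 0.0f;
    size_t i;

    for (i = 1; i < count; ++i)
        result = result && (v[i] != 0.0f);
    return result;
}
```

For ordinary inputs the two agree. For (1e-30, 1e-30, 1, 1) the product underflows to zero in single precision, so the product-based version reports false even though every component is nonzero, while the AND-based version reports true.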

Zebediah Figura (＠zfigura)

1 Apr

7:07 p.m.

...

So, integers and bools are represented internally as float in SM1, so a reinterpret has no real effect (doesn't emit any instruction) besides avoiding the "SM1 non-float expression" fixme in d3dbc.

I think that emitting this fixme is correct for ABS and NEG operations on bool types, and thus, the reinterpret should be explicit in HLSL IR.

lower_nonfloat_exprs() should be taking care of that fixme.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66579

Francisco Casas (＠fcasas)

10:11 p.m.

On Mon Apr 1 19:07:17 2024 +0000, Zebediah Figura wrote:

...

So, integers and bools are represented internally as float in SM1, so a reinterpret has no real effect (doesn't emit any instruction) besides avoiding the "SM1 non-float expression" fixme in d3dbc.

...

I think that emitting this fixme is correct for ABS and NEG operations on bool types, and thus, the reinterpret should be explicit in HLSL IR.

lower_nonfloat_exprs() should be taking care of that fixme.

Thinking about it more, I think the problem is not that we lack a way to write ABS and NEG operations on bool types in d3dbc.c, but that those operations should not be generated at all in well-formed HLSL IR, because they make sense for integers and floats but not for bools.

It should not be a fixme but an internal compiler error, regardless of the shader model.

So to fix it, the condition is cast to float before performing the operations, which is fine because a ternary with a float condition is still valid (and in the same function it gets turned into a CMP anyway).

I see now that I could have used a CAST instead of a REINTERPRET, but I still think that the problem should be solved in lower_ternary(), where it is introduced.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66601
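For readers unfamiliar with why ABS and NEG appear in the ternary lowering at all, here is a small sketch in plain C. It assumes the SM1 cmp semantics (select the second operand where the first is >= 0, otherwise the third) and assumes the lowering has the form cmp(-abs(cond), second, first). The helper names `sm1_cmp()` and `ternary_via_cmp()` are hypothetical, and the exact operand order used by lower_ternary() is not shown in this thread.

```c
/* Model of an SM1-style cmp: returns src1 where src0 >= 0, else src2. */
static float sm1_cmp(float src0, float src1, float src2)
{
    return src0 >= 0.0f ? src1 : src2;
}

/* Model of `cond ? first : second` for a condition stored as a float.
 * -|cond| is >= 0 only when cond is zero, i.e. when the condition is
 * false, so the "false" value goes in the slot cmp picks for src0 >= 0. */
static float ternary_via_cmp(float cond, float first, float second)
{
    float abs_cond = cond < 0.0f ? -cond : cond;

    return sm1_cmp(-abs_cond, second, first);
}
```

Because SM1 stores bools as floats, the abs and neg here operate on a float value in practice; the reinterpret (or cast) discussed above only makes that explicit in the IR.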

Zebediah Figura (＠zfigura)

10:26 p.m.

On Mon Apr 1 22:11:28 2024 +0000, Francisco Casas wrote:

...

Thinking about it more, I think the problem is not that we lack a way to write ABS and NEG operations on bool types in d3dbc.c, but that those operations should not be generated at all in well-formed HLSL IR, because they make sense for integers and floats but not for bools. It should not be a fixme but an internal compiler error, regardless of the shader model. So to fix it, the condition is cast to float before performing the operations, which is fine because a ternary with a float condition is still valid (and in the same function it gets turned into a CMP anyway). I see now that I could have used a CAST instead of a REINTERPRET, but I still think that the problem should be solved in lower_ternary(), where it is introduced.

Personally I think all these sm1-specific lowering passes are not ideal. I would assert that we should just be translating one instruction into multiple instructions when generating the IR. This is simpler than writing HLSL lowering passes, it avoids the need for some basically backend-specific ops like OP3_CMP, and it reduces the amount of backend-specific logic in the core HLSL compiler, or at least moves it closer to the end.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66602

Zebediah Figura (＠zfigura)

10:30 p.m.

On Mon Apr 1 22:26:31 2024 +0000, Zebediah Figura wrote:

...

Personally I think all these sm1-specific lowering passes are not ideal. I would assert that we should just be translating one instruction into multiple instructions when generating the IR. This is simpler than writing HLSL lowering passes, it avoids the need for some basically backend-specific ops like OP3_CMP, and it reduces the amount of backend-specific logic in the core HLSL compiler, or at least moves it closer to the end.

And for things like enum hlsl_base_type, i.e. float vs int/bool, that means we explicitly handle those, probably on a per-instr/per-expr basis, in write_sm1_instructions().

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66603

Francisco Casas (＠fcasas)

2 Apr

1:40 p.m.

...

Personally I think all these sm1-specific lowering passes are not ideal. I would assert that we should just be translating one instruction into multiple instructions when generating the IR. This is simpler than writing HLSL lowering passes, it avoids the need for some basically backend-specific ops like OP3_CMP, and it reduces the amount of backend-specific logic in the core HLSL compiler, or at least moves it closer to the end.

...

And for things like enum hlsl_base_type, i.e. float vs int/bool, that means we explicitly handle those, probably on a per-instr/per-expr basis, in write_sm1_instructions().

I think I agree, but I don't see how this is actionable in this MR in particular.

I still have in my backlog to introduce vsir between the HLSL->d3dbc translation. I think it may be preferable to write these transformations in a d3dbc-specific vsir pass, which can gradually start absorbing those currently done in HLSL IR (looking at lower_nonfloat_exprs()), than to translate one IR instruction into multiple bytecode instructions, which requires the scaffolding for using additional temporary registers in some cases.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66679

Zebediah Figura (＠zfigura)

6:09 p.m.

...

I think I agree, but I don't see how this is actionable in this MR in particular.

Mostly just that I don't like to see more lowering code introduced.

What I'd do here is either:

* make the abs/neg conditional on the type being non-bool,

* change pass order or duplicate passes such that lower_nonfloat_exprs() happens after lower_ternary(),

And either:

* always cast to bool when creating HLSL_OP3_TERNARY (which implies getting rid of that abs/neg entirely), or

* assert that HLSL_OP3_TERNARY has an implicit cast to bool, and hence stop casting to bool when we don't need to.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66696

Zebediah Figura (＠zfigura)

7:37 p.m.

...

I still have in my backlog to introduce vsir between the HLSL->d3dbc translation. I think it may be preferable to write these transformations in a d3dbc-specific vsir pass, which can gradually start absorbing those currently done in HLSL IR (looking at lower_nonfloat_exprs()), than to translate one IR instruction into multiple bytecode instructions, which requires the scaffolding for using additional temporary registers in some cases.

Of course we want that anyway, but to be clear, it doesn't need to block getting rid of these passes either. E.g. what I want is something like the attached diff.

[scratch.diff](/uploads/13dbacdb5b8497b49f308e6d3b56c42c/scratch.diff)

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/744#note_66698
