[PATCH v8 0/1] MR723: vkd3d-shader/spirv: Implement MAD in two operations if flagged as precise.

Conor McCarthy (＠cmccarthy)

April 11, 2024

12:52 a.m.

This would eliminate the todo for the precise mad() test in !718. Maybe we need test results on nvidia and intel to decide if we actually want this. -- v8: vkd3d-shader/ir: Implement MAD in two operations if flagged as precise. https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723

Show replies by date

Conor McCarthy

April 2024

12:52 a.m.

New subject: [PATCH v8 1/1] vkd3d-shader/ir: Implement MAD in two operations if flagged as precise.

From: Conor McCarthy <cmccarthy(a)codeweavers.com> --- libs/vkd3d-shader/ir.c | 61 +++++++++++++++++-- libs/vkd3d-shader/vkd3d_shader_private.h | 25 ++++++++ .../hlsl/arithmetic-float-uniform.shader_test | 2 +- 3 files changed, 81 insertions(+), 7 deletions(-) diff --git a/libs/vkd3d-shader/ir.c b/libs/vkd3d-shader/ir.c index 8af537390..9ba0de940 100644 --- a/libs/vkd3d-shader/ir.c +++ b/libs/vkd3d-shader/ir.c @@ -94,7 +94,48 @@ static bool vsir_instruction_init_with_params(struct vsir_program *program, return true; } -static enum vkd3d_result vsir_program_lower_texkills(struct vsir_program *program) +/* The Shader Model 5 Assembly documentation states: "If components of a mad + * instruction are tagged as precise, the hardware must execute a mad instruction + * or the exact equivalent, and it cannot split it into a multiply followed by an add." + * But DXIL.rst states the opposite: "Floating point multiply & add. This operation is + * not fused for "precise" operations." + * Windows drivers seem to conform with the latter, for SM 4-5 and SM 6. */ +static bool vsir_program_handle_precise_mad(struct vsir_program *program, size_t i, + unsigned int tmp_idx) +{ + struct vkd3d_shader_instruction_array *instructions = &program->instructions; + struct vkd3d_shader_instruction *mul_ins, *add_ins; + struct vkd3d_shader_dst_param *mul_dst; + + if (!shader_instruction_array_insert_at(instructions, i + 1, 1)) + return false; + mul_ins = &instructions->elements[i]; + add_ins = mul_ins + 1; + + mul_ins->handler_idx = VKD3DSIH_MUL; + mul_ins->src_count = 2; + + if (!(vsir_instruction_init_with_params(program, add_ins, &mul_ins->location, VKD3DSIH_ADD, 1, 2))) + return false; + add_ins->flags = mul_ins->flags & VKD3DSI_PRECISE_XYZW; + + mul_dst = mul_ins->dst; + *add_ins->dst = *mul_dst; + + mul_dst->modifiers = 0; + vsir_register_init(&mul_dst->reg, VKD3DSPR_TEMP, mul_ins->src[0].reg.data_type, 1); + mul_dst->reg.dimension = add_ins->dst->reg.dimension; + mul_dst->reg.idx[0].offset = tmp_idx; + + add_ins->src[0].reg = mul_dst->reg; + add_ins->src[0].swizzle = vsir_swizzle_from_writemask(mul_dst->write_mask); + add_ins->src[0].modifiers = 0; + add_ins->src[1] = mul_ins->src[2]; + + return true; +} + +static enum vkd3d_result vsir_program_lower_texkills_and_precise_mad(struct vsir_program *program) { struct vkd3d_shader_instruction_array *instructions = &program->instructions; struct vkd3d_shader_instruction *texkill_ins, *ins; @@ -106,15 +147,23 @@ static enum vkd3d_result vsir_program_lower_texkills(struct vsir_program *progra { texkill_ins = &instructions->elements[i]; - if (texkill_ins->handler_idx != VKD3DSIH_TEXKILL) + if (texkill_ins->handler_idx != VKD3DSIH_TEXKILL + && (texkill_ins->handler_idx != VKD3DSIH_MAD || !(texkill_ins->flags & VKD3DSI_PRECISE_XYZW))) continue; - if (!shader_instruction_array_insert_at(instructions, i + 1, components_read + 1)) - return VKD3D_ERROR_OUT_OF_MEMORY; - if (tmp_idx == ~0u) tmp_idx = program->temp_count++; + if (texkill_ins->handler_idx == VKD3DSIH_MAD) + { + if (!vsir_program_handle_precise_mad(program, i, tmp_idx)) + return VKD3D_ERROR_OUT_OF_MEMORY; + continue; + } + + if (!shader_instruction_array_insert_at(instructions, i + 1, components_read + 1)) + return VKD3D_ERROR_OUT_OF_MEMORY; + /* tmp = ins->dst[0] < 0 */ ins = &instructions->elements[i + 1]; @@ -5461,7 +5510,7 @@ enum vkd3d_result vsir_program_normalise(struct vsir_program *program, uint64_t remove_dcl_temps(program); - if ((result = vsir_program_lower_texkills(program)) < 0) + if ((result = vsir_program_lower_texkills_and_precise_mad(program)) < 0) return result; if (program->shader_version.major >= 6) diff --git a/libs/vkd3d-shader/vkd3d_shader_private.h b/libs/vkd3d-shader/vkd3d_shader_private.h index b07a7bff7..f401fd7a3 100644 --- a/libs/vkd3d-shader/vkd3d_shader_private.h +++ b/libs/vkd3d-shader/vkd3d_shader_private.h @@ -1761,6 +1761,31 @@ static inline unsigned int vkd3d_compact_swizzle(uint32_t swizzle, uint32_t writ return compacted_swizzle; } +static inline uint32_t vsir_swizzle_from_writemask(unsigned int writemask) +{ + static const unsigned int swizzles[16] = + { + 0, + VKD3D_SHADER_SWIZZLE(X, X, X, X), + VKD3D_SHADER_SWIZZLE(Y, Y, Y, Y), + VKD3D_SHADER_SWIZZLE(X, Y, X, X), + VKD3D_SHADER_SWIZZLE(Z, Z, Z, Z), + VKD3D_SHADER_SWIZZLE(X, Z, X, X), + VKD3D_SHADER_SWIZZLE(Y, Z, X, X), + VKD3D_SHADER_SWIZZLE(X, Y, Z, X), + VKD3D_SHADER_SWIZZLE(W, W, W, W), + VKD3D_SHADER_SWIZZLE(X, W, X, X), + VKD3D_SHADER_SWIZZLE(Y, W, X, X), + VKD3D_SHADER_SWIZZLE(X, Y, W, X), + VKD3D_SHADER_SWIZZLE(Z, W, X, X), + VKD3D_SHADER_SWIZZLE(X, Z, W, X), + VKD3D_SHADER_SWIZZLE(Y, Z, W, X), + VKD3D_SHADER_SWIZZLE(X, Y, Z, W), + }; + + return swizzles[writemask & 0xf]; +} + struct vkd3d_struct { enum vkd3d_shader_structure_type type; diff --git a/tests/hlsl/arithmetic-float-uniform.shader_test b/tests/hlsl/arithmetic-float-uniform.shader_test index 8bc3992e7..c6b7caaae 100644 --- a/tests/hlsl/arithmetic-float-uniform.shader_test +++ b/tests/hlsl/arithmetic-float-uniform.shader_test @@ -121,7 +121,7 @@ uniform 0 float4 1.00000007 -42.1 4.0 45.0 uniform 4 float4 1.625 -5.0 4.125 5.0 uniform 8 float4 1.00000007 -1.0 0.5 -0.5 todo(sm<6) draw quad -todo probe all rgba (2.62500048, 209.5, 17.0, 224.5) +probe all rgba (2.62500048, 209.5, 17.0, 224.5) [require] shader model >= 5.0 -- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723

Conor McCarthy (＠cmccarthy)

12:57 a.m.

On Wed Apr 10 14:22:12 2024 +0000, Conor McCarthy wrote:

...

We have no other passes which may need to insert instructions. It's a compromise between adding yet another instruction copying pass, versus favouring performance even if it's a little messy. If the former is the priority I'll add a pass for all the small operations. I added it to `lower_texkills()`. Moving instructions for each occurrence takes quadratic time, so we could see pathological cases. For example, Cyberpunk has a colossal compute shader with over 40,000 SSA values, so a shader of similar size which uses precise `MAD` could occur. If necessary we could convert `lower_texkills_and_precise_mad()` to copy instead of insert.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723#note_67573

Giovanni Mascellani (＠giomasce)

10:45 a.m.

On Thu Apr 11 00:57:31 2024 +0000, Conor McCarthy wrote:

...

I added it to `lower_texkills()`. Moving instructions for each occurrence takes quadratic time, so we could see pathological cases. For example, Cyberpunk has a colossal compute shader with over 40,000 SSA values, so a shader of similar size which uses precise `MAD` could occur. If necessary we could convert `lower_texkills_and_precise_mad()` to copy instead of insert. Yeah, I still don't like the way that pass is structured and I think it would be a better idea to rewrite it in terms of a full program copy like many other passes are structured. But that's an independent issue that might be addressed at some point: in terms of software design I think it's better to lump all these smaller changes to the shader in a single pass.

However, what do you think of refactoring your MR in a manner [similar to this](https://gitlab.winehq.org/giomasce/vkd3d/-/commits/proposal) (last three commits)? -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723#note_67586

Giovanni Mascellani (＠giomasce)

10:51 a.m.

On Thu Apr 11 10:45:37 2024 +0000, Giovanni Mascellani wrote:
> Yeah, I still don't like the way that pass is structured and I think it
> would be a better idea to rewrite it in terms of a full program copy
> like many other passes are structured. But that's an independent issue
> that might be addressed at some point: in terms of software design I
> think it's better to lump all these smaller changes to the shader in a
> single pass.
> However, what do you think of refactoring your MR in a manner [similar
> to this](https://gitlab.winehq.org/giomasce/vkd3d/-/commits/proposal)
> (last three commits)?
Sorry for the double comment, but let me add another couple of remarks:
* At some point it might happen that the instruction written by one of the helpers called by `vsir_program_lower_instruction()` should itself be processed again. To handle this we could have the switch cases return `VKD3D_OK` when they did something and `VKD3D_FALSE` when they touched nothing, so that the loop in `vsir_program_lower_instruction()` can decide whether to increment the position or not.
* Both the TEXKILL and the MAD helpers currently use a TEMP for storing the intermediate values. This should probably become a SSA at some point; even for SM4 there shouldn't be problem for passes to introduce SSAs, provided that they are written only once and the definition point dominates the usage points. The SPIR-V backend should handle them just fine. This has the advantage of not requiring to pass a pointer to all the helpers to coordinate the TEMP index, and I guess in general the less we use TEMPs the better.

-- 
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723#note_67587

Conor McCarthy (＠cmccarthy)

1:44 p.m.

On Thu Apr 11 10:51:42 2024 +0000, Giovanni Mascellani wrote:

...

Sorry for the double comment, but let me add another couple of remarks: * At some point it might happen that the instruction written by one of the helpers called by `vsir_program_lower_instruction()` should itself be processed again. To handle this we could have the switch cases return `VKD3D_OK` when they did something and `VKD3D_FALSE` when they touched nothing, so that the loop in `vsir_program_lower_instruction()` can decide whether to increment the position or not. * Both the TEXKILL and the MAD helpers currently use a TEMP for storing the intermediate values. This should probably become a SSA at some point; even for SM4 there shouldn't be problem for passes to introduce SSAs, provided that they are written only once and the definition point dominates the usage points. The SPIR-V backend should handle them just fine. This has the advantage of not requiring to pass a pointer to all the helpers to coordinate the TEMP index, and I guess in general the less we use TEMPs the better. Your branch looks good to me. Would you like to submit a new MR with those three commits, while I close mine? A couple of comments: there's a typo in the commit message for DCL_TEMPS, and I would rename `vsir_program_lower_mad` to `vsir_program_lower_precise_mad` for accuracy.

...

To handle this we could have the switch cases return `VKD3D_OK` when they did something and `VKD3D_FALSE` when they touched nothing

That makes sense if the need arises.

...

This should probably become a SSA at some point

Yes, and the coordinates and texels which the DXIL parser emits should probably be changed at the same time, though we would need a COMPOSITE_CONSTRUCT instruction to make those SSA. Maybe after the release. -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723#note_67600

Henri Verbeet (＠hverbeet)

1:52 p.m.

...

Yeah, I still don't like the way that pass is structured and I think it would be a better idea to rewrite it in terms of a full program copy like many other passes are structured.

I broadly disagree. TEXKILL isn't that common, and all this pre-emptive copying that a number of these passes do can't possibly be cheaper than an efficient implementation of shader_instruction_array_insert_at() would be. The existing implementation of shader_instruction_array_insert_at() certainly isn't optimal, but that's an issue that can be addressed there; it's not a problem with the passes using it.

...

But that's an independent issue that might be addressed at some point: in terms of software design I think it's better to lump all these smaller changes to the shader in a single pass.

However, what do you think of refactoring your MR in a manner [similar to this](https://gitlab.winehq.org/giomasce/vkd3d/-/commits/proposal) (last three commits)?

I didn't review those commits in detail, but in terms of the structure, that's pretty much exactly what I had in mind. -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723#note_67601

Giovanni Mascellani (＠giomasce)

2:43 p.m.

New subject: [PATCH v8 0/1] MR723: vkd3d-shader/spirv: Implement MAD in two operations if flagged as precise. - approved

This merge request was approved by Giovanni Mascellani. -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723

Giovanni Mascellani (＠giomasce)

2:43 p.m.

On Thu Apr 11 13:44:41 2024 +0000, Conor McCarthy wrote:

...

Your branch looks good to me. Would you like to submit a new MR with those three commits, while I close mine? A couple of comments: there's a typo in the commit message for DCL_TEMPS, and I would rename `vsir_program_lower_mad` to `vsir_program_lower_precise_mad` for accuracy.

...
To handle this we could have the switch cases return `VKD3D_OK` when they did something and `VKD3D_FALSE` when they touched nothing That makes sense if the need arises. This should probably become a SSA at some point Yes, and the coordinates and texels which the DXIL parser emits should probably be changed at the same time, though we would need a COMPOSITE_CONSTRUCT instruction to make those SSA. Maybe after the release. Ok, submitted at !779.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723#note_67612

Conor McCarthy (＠cmccarthy)

2:50 p.m.

New subject: [PATCH v8 0/1] MR723: vkd3d-shader/spirv: Implement MAD in two operations if flagged as precise. - closed

This merge request was closed by Conor McCarthy. -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/723

714

Age (days ago)

714

Last active (days ago)

List overview

9 comments

4 participants

participants (4)

Conor McCarthy
Conor McCarthy (＠cmccarthy)
Giovanni Mascellani (＠giomasce)
Henri Verbeet (＠hverbeet)