[PATCH v5 0/2] MR179: vkd3d-shader/hlsl: Add support for ddx(), ddy() intrinsics.

List overview All Threads

newer

older

[PATCH v4 0/2] MR184:...

[PATCH v3 0/2] MR184: Draft:...

Ethan Lee (＠flibitijibibo)

26 Apr 2023 26 Apr '23

9:19 p.m.

SPIR-V already handled DSX/DSY, so only D3DBC/TPF needed new case blocks.

You'll notice that there's no test for this one - in addition to being a pretty straightforward translation for all possible formats, this feature uses the render target width/height and I wasn't sure if there was a good way to ensure that the test would always make sense.

Instead, I did the test manually, and it's what you'd expect (EDIT: Previously the test used a uniform which always optimized to 0, new test uses VPOS instead):

HLSL: ``` float4 main(float4 pos : sv_position) : sv_target { float4 x = ddx(pos.x); float4 y = ddy(pos.y); return x + y; } ```

D3DBC: ``` ps_3_0 dcl_position0 vPos mov r0.xyzw, vPos.xyzw mov r1.x, r0.x dsx r1.x, r1.x mov r0.x, r0.yxxx dsy r0.x, r0.x mov r1.xyzw, r1.x mov r0.xyzw, r0.x add r0.xyzw, r1.xyzw, r0.xyzw mov oC0.xyzw, r0.xyzw ```

DXBC-TPF: ``` ps_4_0 dcl_input_ps_siv linear v0.xyzw, position dcl_output o0.xyzw dcl_temps 2 mov r0.xyzw, v0.xyzw mov r1.x, r0.x dsx r1.x, r1.x mov r0.x, r0.yxxx dsy r0.x, r0.x mov r1.xyzw, r1.x mov r0.xyzw, r0.x add r0.xyzw, r1.xyzw, r0.xyzw mov o0.xyzw, r0.xyzw ret ```

Fixes https://bugs.winehq.org/show_bug.cgi?id=54827

-- v5: tests: Add test for ddx(), ddy() intrinsics vkd3d-shader/hlsl: Add support for ddx(), ddy() intrinsics.

https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179

Show replies by date

Ethan Lee

26 Apr 26 Apr

9:19 p.m.

New subject: [PATCH v5 1/2] vkd3d-shader/hlsl: Add support for ddx(), ddy() intrinsics.

From: Ethan Lee flibitijibibo@gmail.com

SPIR-V already handled DSX/DSY, so only D3DBC/TPF needed new case blocks.

Signed-off-by: Ethan Lee flibitijibibo@gmail.com --- libs/vkd3d-shader/d3dbc.c | 8 ++++++++ libs/vkd3d-shader/hlsl.y | 24 ++++++++++++++++++++++++ libs/vkd3d-shader/tpf.c | 10 ++++++++++ 3 files changed, 42 insertions(+)

diff --git a/libs/vkd3d-shader/d3dbc.c b/libs/vkd3d-shader/d3dbc.c index 14268440..aa45dc2d 100644 --- a/libs/vkd3d-shader/d3dbc.c +++ b/libs/vkd3d-shader/d3dbc.c @@ -1696,6 +1696,14 @@ static void write_sm1_expr(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *b write_sm1_unary_op(ctx, buffer, D3DSIO_ABS, &instr->reg, &arg1->reg, 0, 0); break;

+ case HLSL_OP1_DSX: + write_sm1_unary_op(ctx, buffer, D3DSIO_DSX, &instr->reg, &arg1->reg, 0, 0); + break; + + case HLSL_OP1_DSY: + write_sm1_unary_op(ctx, buffer, D3DSIO_DSY, &instr->reg, &arg1->reg, 0, 0); + break; + case HLSL_OP1_EXP2: write_sm1_per_component_unary_op(ctx, buffer, instr, D3DSIO_EXP); break; diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index 10deda75..6c3bca34 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2648,6 +2648,28 @@ static bool intrinsic_cross(struct hlsl_ctx *ctx, return !!add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, mul2, mul1_neg, loc); }

+static bool intrinsic_ddx(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc) +{ + struct hlsl_ir_node *arg; + + if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc))) + return false; + + return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_DSX, arg, loc); +} + +static bool intrinsic_ddy(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc) +{ + struct hlsl_ir_node *arg; + + if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc))) + return false; + + return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_DSY, arg, loc); +} + static bool intrinsic_distance(struct hlsl_ctx *ctx, const struct parse_initializer *params, const struct vkd3d_shader_location *loc) { @@ -3380,6 +3402,8 @@ intrinsic_functions[] = {"clamp", 3, true, intrinsic_clamp}, {"cos", 1, true, intrinsic_cos}, {"cross", 2, true, intrinsic_cross}, + {"ddx", 1, true, intrinsic_ddx}, + {"ddy", 1, true, intrinsic_ddy}, {"distance", 2, true, intrinsic_distance}, {"dot", 2, true, intrinsic_dot}, {"exp", 1, true, intrinsic_exp}, diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index d104c078..0908fba3 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -3948,6 +3948,16 @@ static void write_sm4_expr(struct hlsl_ctx *ctx, write_sm4_unary_op_with_two_destinations(buffer, VKD3D_SM4_OP_SINCOS, &expr->node, 1, arg1); break;

+ case HLSL_OP1_DSX: + assert(type_is_float(dst_type)); + write_sm4_unary_op(buffer, VKD3D_SM4_OP_DERIV_RTX, &expr->node, arg1, 0); + break; + + case HLSL_OP1_DSY: + assert(type_is_float(dst_type)); + write_sm4_unary_op(buffer, VKD3D_SM4_OP_DERIV_RTY, &expr->node, arg1, 0); + break; + case HLSL_OP1_EXP2: assert(type_is_float(dst_type)); write_sm4_unary_op(buffer, VKD3D_SM4_OP_EXP, &expr->node, arg1, 0);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179

Ethan Lee

9:19 p.m.

New subject: [PATCH v5 2/2] tests: Add test for ddx(), ddy() intrinsics

From: Ethan Lee flibitijibibo@gmail.com

Signed-off-by: Ethan Lee flibitijibibo@gmail.com --- Makefile.am | 1 + tests/ddxddy.shader_test | 9 +++++++++ 2 files changed, 10 insertions(+) create mode 100644 tests/ddxddy.shader_test

diff --git a/Makefile.am b/Makefile.am index 3f1ec171..8e94658d 100644 --- a/Makefile.am +++ b/Makefile.am @@ -63,6 +63,7 @@ vkd3d_shader_tests = \ tests/cbuffer.shader_test \ tests/compute.shader_test \ tests/conditional.shader_test \ + tests/ddxddy.shader_test \ tests/distance.shader_test \ tests/entry-point-semantics.shader_test \ tests/exp.shader_test \ diff --git a/tests/ddxddy.shader_test b/tests/ddxddy.shader_test new file mode 100644 index 00000000..3dd808b3 --- /dev/null +++ b/tests/ddxddy.shader_test @@ -0,0 +1,9 @@ +[pixel shader] +float4 main(float4 pos : sv_position) : sv_target +{ + return float4(ddx(pos.x), ddy(pos.y), 0, 0); +} + +[test] +draw quad +probe all rgba (1.0, 1.0, 0.0, 0.0)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179

Giovanni Mascellani (＠giomasce)

27 Apr 27 Apr

9:40 a.m.

On Wed Apr 26 16:02:44 2023 +0000, Ethan Lee wrote:

...

Latest push splits the results of ddx/ddy to be separate. In trying to come up with a test that produces more interesting results it seems like it just becomes a test for other functions like sin/cos, since the result with vpos is always going to be 1... this is what I get for not paying attention in math class :P

Mostly for fun, here is a somewhat more fancy test: ``` [pixel shader] float4 main(float4 pos : sv_position) : sv_target { pos /= 10.0; float nonlinear = pos.x * pos.y - pos.x * (pos.x + 0.5); return float4(nonlinear, ddx(nonlinear), ddy(nonlinear), 0.0); }

[test] draw quad probe (10, 10) rgba (-0.524999976, -0.164999843, 0.104999900, 0.0) 8 probe (11, 10) rgba (-0.689999819, -0.164999843, 0.104999900, 0.0) 8 probe (10, 11) rgba (-0.420000076, -0.164999843, 0.104999900, 0.0) 8 probe (11, 11) rgba (-0.574999928, -0.164999843, 0.104999900, 0.0) 8 probe (12, 10) rgba (-0.874999881, -0.205000162, 0.124999881, 0.0) 8 probe (150, 150) rgba (-7.52500916, -1.56500244, 1.50500488, 0.0) 8 ```

An interesting feature is that the partial derivatives are constant across the whole 2x2 tile for all the three drivers I tested (WARP on Windows, RADV on Linux and NVIDIA proprietary on Linux), a non-trivial fact that I would not have given for granted before testing.

I also did some tests with trigonometric functions, but NVIDIA's implementation is so wonky that it's nearly impossible to avoid diverging from the other drivers.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31360

Henri Verbeet (＠hverbeet)

9:46 a.m.

I wouldn't want to dump a pile of work on someone, but note that HLSL has ddx_coarse()/ddy_coarse() and ddx_fine()/ddy_fine() variants of ddx()/ddy() as well. It seems straightforward enough to implement those as well as a follow-up to this MR.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31363

Henri Verbeet (＠hverbeet)

11:14 a.m.

In any case, this seems fine. It would be nice if Giovanni's test could be integrated before this gets committed, but adding that as a follow-up is fine too. I do think we want that test.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31375

Henri Verbeet (＠hverbeet)

11:15 a.m.

New subject: [PATCH v5 0/2] MR179: vkd3d-shader/hlsl: Add support for ddx(), ddy() intrinsics. - approved

This merge request was approved by Henri Verbeet.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179

Ethan Lee (＠flibitijibibo)

2:22 p.m.

On Thu Apr 27 09:40:48 2023 +0000, Giovanni Mascellani wrote:

...

Mostly for fun, here is a somewhat more fancy test:
[pixel shader]
float4 main(float4 pos : sv_position) : sv_target
{
    pos /= 10.0;
    float nonlinear = pos.x * pos.y - pos.x * (pos.x + 0.5);
    return float4(nonlinear, ddx(nonlinear), ddy(nonlinear), 0.0);
}
[test]
draw quad
probe (10, 10) rgba (-0.524999976, -0.164999843, 0.104999900, 0.0) 8
probe (11, 10) rgba (-0.689999819, -0.164999843, 0.104999900, 0.0) 8
probe (10, 11) rgba (-0.420000076, -0.164999843, 0.104999900, 0.0) 8
probe (11, 11) rgba (-0.574999928, -0.164999843, 0.104999900, 0.0) 8
probe (12, 10) rgba (-0.874999881, -0.205000162, 0.124999881, 0.0) 8
probe (150, 150) rgba (-7.52500916, -1.56500244, 1.50500488, 0.0) 8
An interesting feature is that the partial derivatives are constant across the whole 2x2 tile for all the three drivers I tested (WARP on Windows, RADV on Linux and NVIDIA proprietary on Linux), a non-trivial fact that I would not have given for granted before testing. I also did some tests with trigonometric functions, but NVIDIA's implementation is so wonky that it's nearly impossible to avoid diverging from the other drivers.

Oddly enough this doesn't seem to pass on RADV with a 7900XTX, but just barely:

``` shader_runner:1197: Running tests from a Unix build shader_runner:1199: Compiling shaders with vkd3d-shader and executing with Vulkan shader_runner:601: Section [test], line 19: Test failed: Got {-7.52500343e+00, -1.56500721e+00, 1.50500488e+00, 0.00000000e+00}, expected {-7.52500916e+00, -1.56500244e+00, 1.50500488e+00, 0.00000000e+00} at (150, 150). shader_runner:1202: Compiling shaders with vkd3d-shader and executing with vkd3d shader_runner:612: Driver name: radv, driver info: Mesa 23.2.0-devel. shader_runner:601: Section [test], line 19: Test failed: Got {-7.52500343e+00, -1.56500721e+00, 1.50500488e+00, 0.00000000e+00}, expected {-7.52500916e+00, -1.56500244e+00, 1.50500488e+00, 0.00000000e+00} at (150, 150). shader_runner: 149 tests executed (2 failures, 0 skipped, 0 todo, 0 bugs). FAIL tests/ddxddy.shader_test (exit status: 1) ```

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31396

Giovanni Mascellani (＠giomasce)

28 Apr 28 Apr

11:14 a.m.

On Thu Apr 27 14:22:18 2023 +0000, Ethan Lee wrote:

...

Oddly enough this doesn't seem to pass on RADV with a 7900XTX, but just barely:

shader_runner:1197: Running tests from a Unix build
shader_runner:1199: Compiling shaders with vkd3d-shader and executing
with Vulkan
shader_runner:601: Section [test], line 19: Test failed: Got
{-7.52500343e+00, -1.56500721e+00, 1.50500488e+00, 0.00000000e+00},
expected {-7.52500916e+00, -1.56500244e+00, 1.50500488e+00,
0.00000000e+00} at (150, 150).
shader_runner:1202: Compiling shaders with vkd3d-shader and executing
with vkd3d
shader_runner:612: Driver name: radv, driver info: Mesa 23.2.0-devel.
shader_runner:601: Section [test], line 19: Test failed: Got
{-7.52500343e+00, -1.56500721e+00, 1.50500488e+00, 0.00000000e+00},
expected {-7.52500916e+00, -1.56500244e+00, 1.50500488e+00,
0.00000000e+00} at (150, 150).
shader_runner: 149 tests executed (2 failures, 0 skipped, 0 todo, 0 bugs).
FAIL tests/ddxddy.shader_test (exit status: 1)

I would have hoped that 8 ULPs were enough for having a common idea of the first derivatives of a quadratic polynomial, but it seems video card producers decided differently. I don't think there is much we can do about, except for raising our tolerance (which is the number at the end of the `probe` command).

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31492

Ethan Lee (＠flibitijibibo)

2:05 p.m.

On Fri Apr 28 11:14:47 2023 +0000, Giovanni Mascellani wrote:

...

I would have hoped that 8 ULPs were enough for having a common idea of the first derivatives of a quadratic polynomial, but it seems video card producers decided differently. I don't think there is much we can do about, except for raising our tolerance (which is the number at the end of the `probe` command).

Messed with the tolerance values a bit, all of the tests are actually okay at 8 except for the last one, which could go no lower than _40_. That seems... really high for one pixel? If it isn't, I can push this test with that one tolerance value set higher than the others.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31503

Giovanni Mascellani (＠giomasce)

2:17 p.m.

On Fri Apr 28 14:05:56 2023 +0000, Ethan Lee wrote:

...

Messed with the tolerance values a bit, all of the tests are actually okay at 8 except for the last one, which could go no lower than _40_. That seems... really high for one pixel? If it isn't, I can push this test with that one tolerance value set higher than the others.

Strange, not sure why it goes that astray, but I wouldn't sweat too much over that. Pushing 40 is fine.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31504

914

Age (days ago)

916

Last active (days ago)

wine-gitlab@winehq.org

10 comments

4 participants

tags (0)

participants (4)

Ethan Lee
Ethan Lee (＠flibitijibibo)
Giovanni Mascellani (＠giomasce)
Henri Verbeet (＠hverbeet)