This includes the tests from !364, separated out from the HLSL changes there and updated. This MR can wait until !364 is upstream, though.
Matching the SM 4/5 implementation is apparently unnecessary, since the AMD Windows results for SM 6 differ from it. The RADV results are slightly off (a bit lower than the AMD Windows values), but Proton also uses the SPIR-V GLSL extended instructions, and no workarounds have been implemented there.
-- 
v2: vkd3d-shader/spirv: Handle the ACOS, ASIN and ATAN instructions in spirv_compiler_emit_ext_glsl_instruction().
    vkd3d-shader/dxil: Handle inverse trigonometric functions in sm6_parser_emit_dx_unary().
    tests/shader-runner: Add tests for atan and atan2 trig intrinsics.
    tests/shader-runner: Add tests for acos and asin trig intrinsics.
From: Petrichor Park ppark@codeweavers.com
Extracted by Conor McCarthy from an HLSL patch, and modified to include SM 6 variations.
---
 Makefile.am                         |  1 +
 tests/hlsl/inverse-trig.shader_test | 94 +++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 tests/hlsl/inverse-trig.shader_test

diff --git a/Makefile.am b/Makefile.am
index 90e7dcfcc..1687d9a05 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -121,6 +121,7 @@ vkd3d_shader_tests = \
 	tests/hlsl/initializer-struct.shader_test \
 	tests/hlsl/intrinsic-override.shader_test \
 	tests/hlsl/invalid.shader_test \
+	tests/hlsl/inverse-trig.shader_test \
 	tests/hlsl/is-front-face.shader_test \
 	tests/hlsl/ldexp.shader_test \
 	tests/hlsl/length.shader_test \
diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test
new file mode 100644
index 000000000..56bdb4cbe
--- /dev/null
+++ b/tests/hlsl/inverse-trig.shader_test
@@ -0,0 +1,94 @@
+% TPF does not define instructions for inverse trig; these intrinsics are
+% decomposed into other instructions. FXC emits code which may vary wrt other
+% implementations. DXIL defines intrinsics for inverse trig, to be implemented
+% by the backend.
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    return float4(acos(a.x), 0.0, 0.0, 0.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (3.14159274, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (2.094441441, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.57072878, 0.0, 0.0, 0.0) 1024
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.04715133, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 128
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    return float4(asin(a.x), 0.0, 0.0, 0.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-1.57079637, 0.0, 0.0, 0.0) 128
+
+[require]
+shader model < 6.0
+
+[test]
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.523645043, 0.0, 0.0, 0.0) 128
+
+% Because sqrt isn't identical across platforms, there is some inaccuracy
+% here even with an identical algorithm, and because it's so near zero,
+% each ulp is really small. So, in order to pass there needs to be this
+% enormous margin.
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0000675916672, 0.0, 0.0, 0.0) 131072
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.523645043, 0.0, 0.0, 0.0) 128
+
+[require]
+shader model >= 6.0
+
+% SM 6.0 has instructions for inverse trig, which we implement using the native
+% equivalents available in SPIR-V. The values below are from the AMD Windows
+% drivers, which are very close to those from Ubuntu's calculator app. Results
+% from RADV are a bit lower than these, hence the large max ulp difference.
+[test]
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.523598731, 0.0, 0.0, 0.0) 4096
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.523598731, 0.0, 0.0, 0.0) 4096
+
+[require]
+% reset requirements
+
+[test]
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.57079637, 0.0, 0.0, 0.0) 128
From: Petrichor Park ppark@codeweavers.com
Extracted by Conor McCarthy from an HLSL patch, with ulp values doubled in some cases to cover SM 6 results.
---
 tests/hlsl/inverse-trig.shader_test | 113 +++++++++++++++++++++++++++-
 1 file changed, 109 insertions(+), 4 deletions(-)

diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test
index 56bdb4cbe..3c3043d2a 100644
--- a/tests/hlsl/inverse-trig.shader_test
+++ b/tests/hlsl/inverse-trig.shader_test
@@ -68,10 +68,10 @@ probe all rgba (0.523645043, 0.0, 0.0, 0.0) 128
 [require]
 shader model >= 6.0
 
-% SM 6.0 has instructions for inverse trig, which we implement using the native
-% equivalents available in SPIR-V. The values below are from the AMD Windows
-% drivers, which are very close to those from Ubuntu's calculator app. Results
-% from RADV are a bit lower than these, hence the large max ulp difference.
+% We implement SM 6.0 inverse trig instructions using the native equivalents
+% available in the backend. The values below are from the AMD Windows drivers,
+% which are very close to those from Ubuntu's calculator app. Results from
+% RADV are a bit lower than these, hence the large max ulp difference.
 [test]
 uniform 0 float4 -0.5 0.0 0.0 0.0
 todo draw quad
@@ -92,3 +92,108 @@ probe all rgba (0.523598731, 0.0, 0.0, 0.0) 4096
 uniform 0 float4 1.0 0.0 0.0 0.0
 todo draw quad
 probe all rgba (1.57079637, 0.0, 0.0, 0.0) 128
+
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    return float4(atan(a.x), 0.0, 0.0, 0.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.785409629, 0.0, 0.0, 0.0) 512
+
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.4636476, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.4636476, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.785409629, 0.0, 0.0, 0.0) 512
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    // The argument order is (y,x), and test case inputs are (y,x) also.
+    return float4(atan2(a.x, a.y), 0.0, 0.0, 0.0);
+}
+
+[test]
+% Non-degenerate cases
+uniform 0 float4 1.0 1.0 0.0 0.0
+todo draw quad
+probe all rgba (0.785385, 0.0, 0.0, 0.0) 512
+
+uniform 0 float4 5.0 -5.0 0.0 0.0
+todo draw quad
+probe all rgba (2.356194, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -3.0 -3.0 0.0 0.0
+todo draw quad
+probe all rgba (-2.356194, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 1.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 -1.0 0.0 0.0
+todo draw quad
+probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
+
+% Degenerate cases
+uniform 0 float4 0.00001 0.00002 0.0 0.0
+todo draw quad
+probe all rgba (0.463647, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.00001 -0.00002 0.0 0.0
+todo draw quad
+probe all rgba (2.677945, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -0.00001 100000.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.000000000099986595, 0.0, 0.0, 0.0) 2048
+
+uniform 0 float4 10000000.0 0.00000001 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+% Negative zero behavior should be to treat it the
+% same as normal zero.
+uniform 0 float4 1000000000.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1000000000.0 -0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 -1.0 0.0 0.0
+todo draw quad
+probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -0.0 -1.0 0.0 0.0
+todo draw quad
+probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
+
From: Conor McCarthy cmccarthy@codeweavers.com
---
 libs/vkd3d-shader/d3d_asm.c              |  3 +++
 libs/vkd3d-shader/dxil.c                 | 12 ++++++++++++
 libs/vkd3d-shader/vkd3d_shader_private.h |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/libs/vkd3d-shader/d3d_asm.c b/libs/vkd3d-shader/d3d_asm.c
index dd96b7fa5..3ae736a80 100644
--- a/libs/vkd3d-shader/d3d_asm.c
+++ b/libs/vkd3d-shader/d3d_asm.c
@@ -30,8 +30,11 @@
 static const char * const shader_opcode_names[] =
 {
     [VKD3DSIH_ABS ] = "abs",
+    [VKD3DSIH_ACOS ] = "acos",
     [VKD3DSIH_ADD ] = "add",
     [VKD3DSIH_AND ] = "and",
+    [VKD3DSIH_ASIN ] = "asin",
+    [VKD3DSIH_ATAN ] = "atan",
     [VKD3DSIH_ATOMIC_AND ] = "atomic_and",
     [VKD3DSIH_ATOMIC_CMP_STORE ] = "atomic_cmp_store",
     [VKD3DSIH_ATOMIC_IADD ] = "atomic_iadd",
diff --git a/libs/vkd3d-shader/dxil.c b/libs/vkd3d-shader/dxil.c
index 3e1ba3911..1a6fe04ee 100644
--- a/libs/vkd3d-shader/dxil.c
+++ b/libs/vkd3d-shader/dxil.c
@@ -338,6 +338,9 @@ enum dx_intrinsic_opcode
     DX_COS = 12,
     DX_SIN = 13,
     DX_TAN = 14,
+    DX_ACOS = 15,
+    DX_ASIN = 16,
+    DX_ATAN = 17,
     DX_EXP = 21,
     DX_FRC = 22,
     DX_LOG = 23,
@@ -3528,6 +3531,12 @@ static enum vkd3d_shader_opcode map_dx_unary_op(enum dx_intrinsic_opcode op)
             return VKD3DSIH_ISFINITE;
         case DX_TAN:
             return VKD3DSIH_TAN;
+        case DX_ACOS:
+            return VKD3DSIH_ACOS;
+        case DX_ASIN:
+            return VKD3DSIH_ASIN;
+        case DX_ATAN:
+            return VKD3DSIH_ATAN;
         case DX_EXP:
             return VKD3DSIH_EXP;
         case DX_FRC:
@@ -3957,6 +3966,9 @@ struct sm6_dx_opcode_info
  */
 static const struct sm6_dx_opcode_info sm6_dx_op_table[] =
 {
+    [DX_ACOS ] = {"g", "R", sm6_parser_emit_dx_unary},
+    [DX_ASIN ] = {"g", "R", sm6_parser_emit_dx_unary},
+    [DX_ATAN ] = {"g", "R", sm6_parser_emit_dx_unary},
     [DX_BFREV ] = {"m", "R", sm6_parser_emit_dx_unary},
     [DX_BUFFER_LOAD ] = {"o", "Hii", sm6_parser_emit_dx_buffer_load},
     [DX_CBUFFER_LOAD_LEGACY ] = {"o", "Hi", sm6_parser_emit_dx_cbuffer_load},
diff --git a/libs/vkd3d-shader/vkd3d_shader_private.h b/libs/vkd3d-shader/vkd3d_shader_private.h
index e5f706e95..36519cdc3 100644
--- a/libs/vkd3d-shader/vkd3d_shader_private.h
+++ b/libs/vkd3d-shader/vkd3d_shader_private.h
@@ -225,8 +225,11 @@ enum vkd3d_shader_error
 enum vkd3d_shader_opcode
 {
     VKD3DSIH_ABS,
+    VKD3DSIH_ACOS,
     VKD3DSIH_ADD,
     VKD3DSIH_AND,
+    VKD3DSIH_ASIN,
+    VKD3DSIH_ATAN,
     VKD3DSIH_ATOMIC_AND,
     VKD3DSIH_ATOMIC_CMP_STORE,
     VKD3DSIH_ATOMIC_IADD,
From: Conor McCarthy cmccarthy@codeweavers.com
---
 libs/vkd3d-shader/spirv.c           |  6 +++
 tests/hlsl/inverse-trig.shader_test | 60 ++++++++++++++---------------
 2 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/libs/vkd3d-shader/spirv.c b/libs/vkd3d-shader/spirv.c
index 298ad31d9..9f2649f28 100644
--- a/libs/vkd3d-shader/spirv.c
+++ b/libs/vkd3d-shader/spirv.c
@@ -6969,6 +6969,9 @@ static enum GLSLstd450 spirv_compiler_map_ext_glsl_instruction(
     }
     glsl_insts[] =
     {
+        {VKD3DSIH_ACOS, GLSLstd450Acos},
+        {VKD3DSIH_ASIN, GLSLstd450Asin},
+        {VKD3DSIH_ATAN, GLSLstd450Atan},
         {VKD3DSIH_DFMA, GLSLstd450Fma},
         {VKD3DSIH_DMAX, GLSLstd450NMax},
         {VKD3DSIH_DMIN, GLSLstd450NMin},
@@ -9553,6 +9556,9 @@ static int spirv_compiler_handle_instruction(struct spirv_compiler *compiler,
         case VKD3DSIH_ISFINITE:
             spirv_compiler_emit_isfinite(compiler, instruction);
             break;
+        case VKD3DSIH_ACOS:
+        case VKD3DSIH_ASIN:
+        case VKD3DSIH_ATAN:
         case VKD3DSIH_DFMA:
         case VKD3DSIH_DMAX:
         case VKD3DSIH_DMIN:
diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test
index 3c3043d2a..0d3eea937 100644
--- a/tests/hlsl/inverse-trig.shader_test
+++ b/tests/hlsl/inverse-trig.shader_test
@@ -13,23 +13,23 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.14159274, 0.0, 0.0, 0.0) 128
 
 uniform 0 float4 -0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (2.094441441, 0.0, 0.0, 0.0) 128
 
 uniform 0 float4 0.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.57072878, 0.0, 0.0, 0.0) 1024
 
 uniform 0 float4 0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.04715133, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 128
 
 [pixel shader todo]
@@ -42,7 +42,7 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-1.57079637, 0.0, 0.0, 0.0) 128
 
 [require]
@@ -74,15 +74,15 @@ shader model >= 6.0
 % RADV are a bit lower than these, hence the large max ulp difference.
 [test]
 uniform 0 float4 -0.5 0.0 0.0 0.0
-todo draw quad
+draw quad
 probe all rgba (-0.523598731, 0.0, 0.0, 0.0) 4096
 
 uniform 0 float4 0.0 0.0 0.0 0.0
-todo draw quad
+draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 128
 
 uniform 0 float4 0.5 0.0 0.0 0.0
-todo draw quad
+draw quad
 probe all rgba (0.523598731, 0.0, 0.0, 0.0) 4096
 
 [require]
@@ -90,7 +90,7 @@ probe all rgba (0.523598731, 0.0, 0.0, 0.0) 4096
 
 [test]
 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.57079637, 0.0, 0.0, 0.0) 128
 
 
@@ -104,23 +104,23 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-0.785409629, 0.0, 0.0, 0.0) 512
 
 uniform 0 float4 -0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-0.4636476, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 0.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.4636476, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.785409629, 0.0, 0.0, 0.0) 512
 
 [pixel shader todo]
@@ -135,65 +135,65 @@ float4 main() : sv_target
 [test]
 % Non-degenerate cases
 uniform 0 float4 1.0 1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.785385, 0.0, 0.0, 0.0) 512
 
 uniform 0 float4 5.0 -5.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (2.356194, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 -3.0 -3.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-2.356194, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-1.570796, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 0.0 1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 0.0 -1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
 
 % Degenerate cases
 uniform 0 float4 0.00001 0.00002 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.463647, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 0.00001 -0.00002 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (2.677945, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 -0.00001 100000.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-0.000000000099986595, 0.0, 0.0, 0.0) 2048
 
 uniform 0 float4 10000000.0 0.00000001 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
 
 % Negative zero behavior should be to treat it the
 % same as normal zero.
 uniform 0 float4 1000000000.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 1000000000.0 -0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 0.0 -1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
 
 uniform 0 float4 -0.0 -1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
This merge request was approved by Conor McCarthy.
+% Because sqrt isn't identical across platforms, there is some inaccuracy
+% here even with an identical algorithm, and because it's so near zero,
+% each ulp is really small. So, in order to pass there needs to be this
+% enormous margin.
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0000675916672, 0.0, 0.0, 0.0) 131072
Which sqrt is that? Yeah, I know, the shader model 4 lowering for the asin() in the shader for this test involves a sqrt(). But that's a fairly obscure implementation detail, and doesn't seem obvious to someone reading the test.
The maximum difference here doesn't seem that excessive, but for what it's worth, an alternative would be to add a quantisation step to the shader; we use that approach in e.g. ddxddy.shader_test.
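
Both points can be sketched on the CPU. Near 0.0000676 one float ulp is 2^-37, so a margin of 131072 ulps is about 2^-20, or roughly 9.5e-7 in absolute terms. The code below is a minimal sketch under stated assumptions: ulp_distance() follows the common ordered-integer technique and is not claimed to be the shader runner's exact comparison, and quantise() is a hypothetical CPU analogue of a shader-side quantisation step, not the form used in ddxddy.shader_test. NaN inputs are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map float bits to an integer whose ordering matches the float ordering;
 * both zeros map to 0. */
int32_t float_ordered_bits(float f)
{
    int32_t i;

    memcpy(&i, &f, sizeof(i));
    return i < 0 ? INT32_MIN - i : i;
}

/* Number of representable floats between a and b. */
int64_t ulp_distance(float a, float b)
{
    int64_t d = (int64_t)float_ordered_bits(a) - float_ordered_bits(b);

    return d < 0 ? -d : d;
}

/* Round x to the nearest multiple of 1/steps (hypothetical helper). */
float quantise(float x, float steps)
{
    float scaled = x * steps + (x < 0.0f ? -0.5f : 0.5f);

    /* Keep the conversion to int32_t in range. */
    assert(scaled > -2147483000.0f && scaled < 2147483000.0f);
    return (float)(int32_t)scaled / steps;
}
```

For example, quantising to steps of 1/4096 maps both the SM 4/5 value 0.523645043 and the SM 6 value 0.523598731 to the same result, and maps 0.0000675916672 to zero, at which point an exact or small-ulp comparison would suffice.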