With tests from !364, separated out from the HLSL changes there and updated.
It is apparently unnecessary to match the SM4/5 implementation, since the AMD Windows results differ from it. The RADV results are slightly off, but Proton also uses the SPIR-V GLSL extension instructions, and no workarounds have been implemented there.
-- v3:
vkd3d-shader/spirv: Handle the ACOS, ASIN and ATAN instructions in spirv_compiler_emit_ext_glsl_instruction().
vkd3d-shader/dxil: Handle inverse trigonometric functions in sm6_parser_emit_dx_unary().
tests/shader-runner: Add tests for atan and atan2 trig intrinsics.
tests/shader-runner: Add tests for acos and asin trig intrinsics.
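For anyone skimming the series, the route an SM 6 inverse trig call takes through these patches can be sketched in a few lines of C. This is only an illustration of the two lookups: the enum and function names below are hypothetical stand-ins rather than vkd3d's real identifiers, and only the three DXIL opcode numbers (15, 16, 17) are taken from the dxil.c patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the three layers touched by this series.
 * Only the DXIL opcode numbers come from the patch; the rest is illustrative. */
enum dx_op { DX_OP_ACOS = 15, DX_OP_ASIN = 16, DX_OP_ATAN = 17 };
enum ir_op { IR_INVALID, IR_ACOS, IR_ASIN, IR_ATAN };
enum glsl_ext_op { GLSL_EXT_NONE, GLSL_EXT_ACOS, GLSL_EXT_ASIN, GLSL_EXT_ATAN };

/* Step 1: DXIL intrinsic -> internal opcode (the role map_dx_unary_op() plays). */
static enum ir_op map_dx_to_ir(enum dx_op op)
{
    switch (op)
    {
        case DX_OP_ACOS: return IR_ACOS;
        case DX_OP_ASIN: return IR_ASIN;
        case DX_OP_ATAN: return IR_ATAN;
    }
    return IR_INVALID;
}

/* Step 2: internal opcode -> GLSL extended instruction, via a small table
 * (the role the glsl_insts[] table plays in the SPIR-V backend). */
static enum glsl_ext_op map_ir_to_glsl(enum ir_op op)
{
    static const struct
    {
        enum ir_op ir;
        enum glsl_ext_op glsl;
    }
    table[] =
    {
        {IR_ACOS, GLSL_EXT_ACOS},
        {IR_ASIN, GLSL_EXT_ASIN},
        {IR_ATAN, GLSL_EXT_ATAN},
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(*table); ++i)
    {
        if (table[i].ir == op)
            return table[i].glsl;
    }
    return GLSL_EXT_NONE;
}

/* Both steps chained: what the backend ends up emitting for a DXIL intrinsic. */
static enum glsl_ext_op lower_dx_op(enum dx_op op)
{
    return map_ir_to_glsl(map_dx_to_ir(op));
}
```

Because the backend defers to the driver's own implementation of the extended instruction, the numerical result depends on the driver, which is why the SM 6 expectations in the tests carry their own tolerances.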
From: Petrichor Park <ppark@codeweavers.com>

Extracted by Conor McCarthy from an HLSL patch, and modified to include SM 6 variations.
---
 Makefile.am                         |  1 +
 tests/hlsl/inverse-trig.shader_test | 92 +++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 tests/hlsl/inverse-trig.shader_test
diff --git a/Makefile.am b/Makefile.am
index 90e7dcfcc..1687d9a05 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -121,6 +121,7 @@ vkd3d_shader_tests = \
 	tests/hlsl/initializer-struct.shader_test \
 	tests/hlsl/intrinsic-override.shader_test \
 	tests/hlsl/invalid.shader_test \
+	tests/hlsl/inverse-trig.shader_test \
 	tests/hlsl/is-front-face.shader_test \
 	tests/hlsl/ldexp.shader_test \
 	tests/hlsl/length.shader_test \
diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test
new file mode 100644
index 000000000..5e71351e4
--- /dev/null
+++ b/tests/hlsl/inverse-trig.shader_test
@@ -0,0 +1,92 @@
+% TPF does not define instructions for inverse trig; these intrinsics are
+% decomposed into other instructions. FXC emits code which may vary wrt other
+% implementations. DXIL defines intrinsics for inverse trig, to be implemented
+% by the backend.
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    return float4(acos(a.x), 0.0, 0.0, 0.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (3.14159274, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (2.094441441, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.57072878, 0.0, 0.0, 0.0) 1024
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.04715133, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 128
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    float4 result = float4(asin(a.x), 0.0, 0.0, 0.0);
+    // Quantize to cover implementation variations.
+    return round(result * 20000.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-31416.0, 0.0, 0.0, 0.0)
+
+[require]
+shader model < 6.0
+
+[test]
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-10473.0, 0.0, 0.0, 0.0)
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.0, 0.0, 0.0, 0.0)
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (10473.0, 0.0, 0.0, 0.0)
+
+[require]
+shader model >= 6.0
+
+% We implement SM 6.0 inverse trig instructions using the native equivalents
+% available in the backend. The values below are from the AMD Windows drivers,
+% which are very close to those from Ubuntu's calculator app. Results from
+% RADV are a bit lower than these, hence the large max ulp difference.
+[test]
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-10472.0, 0.0, 0.0, 0.0) 4096
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0)
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (10472.0, 0.0, 0.0, 0.0) 4096
+
+[require]
+% reset requirements
+
+[test]
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (31416.0, 0.0, 0.0, 0.0)
From: Petrichor Park <ppark@codeweavers.com>

Extracted by Conor McCarthy from an HLSL patch, with ulp values doubled in some cases to cover SM 6 results.
---
 tests/hlsl/inverse-trig.shader_test | 105 ++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)
diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test
index 5e71351e4..05941b14d 100644
--- a/tests/hlsl/inverse-trig.shader_test
+++ b/tests/hlsl/inverse-trig.shader_test
@@ -90,3 +90,108 @@ probe all rgba (10472.0, 0.0, 0.0, 0.0) 4096
 uniform 0 float4 1.0 0.0 0.0 0.0
 todo draw quad
 probe all rgba (31416.0, 0.0, 0.0, 0.0)
+
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    return float4(atan(a.x), 0.0, 0.0, 0.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.785409629, 0.0, 0.0, 0.0) 512
+
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.4636476, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.4636476, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.785409629, 0.0, 0.0, 0.0) 512
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    // The argument order is (y,x), and test case inputs are (y,x) also.
+    return float4(atan2(a.x, a.y), 0.0, 0.0, 0.0);
+}
+
+[test]
+% Non-degenerate cases
+uniform 0 float4 1.0 1.0 0.0 0.0
+todo draw quad
+probe all rgba (0.785385, 0.0, 0.0, 0.0) 512
+
+uniform 0 float4 5.0 -5.0 0.0 0.0
+todo draw quad
+probe all rgba (2.356194, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -3.0 -3.0 0.0 0.0
+todo draw quad
+probe all rgba (-2.356194, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (-1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 1.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 -1.0 0.0 0.0
+todo draw quad
+probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
+
+% Degenerate cases
+uniform 0 float4 0.00001 0.00002 0.0 0.0
+todo draw quad
+probe all rgba (0.463647, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.00001 -0.00002 0.0 0.0
+todo draw quad
+probe all rgba (2.677945, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -0.00001 100000.0 0.0 0.0
+todo draw quad
+probe all rgba (-0.000000000099986595, 0.0, 0.0, 0.0) 2048
+
+uniform 0 float4 10000000.0 0.00000001 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+% Negative zero behavior should be to treat it the
+% same as normal zero.
+uniform 0 float4 1000000000.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1000000000.0 -0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.570796, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.0 -1.0 0.0 0.0
+todo draw quad
+probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 -0.0 -1.0 0.0 0.0
+todo draw quad
+probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
+
From: Conor McCarthy <cmccarthy@codeweavers.com>

---
 libs/vkd3d-shader/d3d_asm.c              |  3 +++
 libs/vkd3d-shader/dxil.c                 | 12 ++++++++++++
 libs/vkd3d-shader/vkd3d_shader_private.h |  3 +++
 3 files changed, 18 insertions(+)
diff --git a/libs/vkd3d-shader/d3d_asm.c b/libs/vkd3d-shader/d3d_asm.c
index dd96b7fa5..3ae736a80 100644
--- a/libs/vkd3d-shader/d3d_asm.c
+++ b/libs/vkd3d-shader/d3d_asm.c
@@ -30,8 +30,11 @@
 static const char * const shader_opcode_names[] =
 {
     [VKD3DSIH_ABS ] = "abs",
+    [VKD3DSIH_ACOS ] = "acos",
     [VKD3DSIH_ADD ] = "add",
     [VKD3DSIH_AND ] = "and",
+    [VKD3DSIH_ASIN ] = "asin",
+    [VKD3DSIH_ATAN ] = "atan",
     [VKD3DSIH_ATOMIC_AND ] = "atomic_and",
     [VKD3DSIH_ATOMIC_CMP_STORE ] = "atomic_cmp_store",
     [VKD3DSIH_ATOMIC_IADD ] = "atomic_iadd",
diff --git a/libs/vkd3d-shader/dxil.c b/libs/vkd3d-shader/dxil.c
index 3e1ba3911..1a6fe04ee 100644
--- a/libs/vkd3d-shader/dxil.c
+++ b/libs/vkd3d-shader/dxil.c
@@ -338,6 +338,9 @@ enum dx_intrinsic_opcode
     DX_COS = 12,
     DX_SIN = 13,
     DX_TAN = 14,
+    DX_ACOS = 15,
+    DX_ASIN = 16,
+    DX_ATAN = 17,
     DX_EXP = 21,
     DX_FRC = 22,
     DX_LOG = 23,
@@ -3528,6 +3531,12 @@ static enum vkd3d_shader_opcode map_dx_unary_op(enum dx_intrinsic_opcode op)
             return VKD3DSIH_ISFINITE;
         case DX_TAN:
             return VKD3DSIH_TAN;
+        case DX_ACOS:
+            return VKD3DSIH_ACOS;
+        case DX_ASIN:
+            return VKD3DSIH_ASIN;
+        case DX_ATAN:
+            return VKD3DSIH_ATAN;
         case DX_EXP:
             return VKD3DSIH_EXP;
         case DX_FRC:
@@ -3957,6 +3966,9 @@ struct sm6_dx_opcode_info
  */
 static const struct sm6_dx_opcode_info sm6_dx_op_table[] =
 {
+    [DX_ACOS ] = {"g", "R", sm6_parser_emit_dx_unary},
+    [DX_ASIN ] = {"g", "R", sm6_parser_emit_dx_unary},
+    [DX_ATAN ] = {"g", "R", sm6_parser_emit_dx_unary},
     [DX_BFREV ] = {"m", "R", sm6_parser_emit_dx_unary},
     [DX_BUFFER_LOAD ] = {"o", "Hii", sm6_parser_emit_dx_buffer_load},
     [DX_CBUFFER_LOAD_LEGACY ] = {"o", "Hi", sm6_parser_emit_dx_cbuffer_load},
diff --git a/libs/vkd3d-shader/vkd3d_shader_private.h b/libs/vkd3d-shader/vkd3d_shader_private.h
index e5f706e95..36519cdc3 100644
--- a/libs/vkd3d-shader/vkd3d_shader_private.h
+++ b/libs/vkd3d-shader/vkd3d_shader_private.h
@@ -225,8 +225,11 @@ enum vkd3d_shader_error
 enum vkd3d_shader_opcode
 {
     VKD3DSIH_ABS,
+    VKD3DSIH_ACOS,
     VKD3DSIH_ADD,
     VKD3DSIH_AND,
+    VKD3DSIH_ASIN,
+    VKD3DSIH_ATAN,
     VKD3DSIH_ATOMIC_AND,
     VKD3DSIH_ATOMIC_CMP_STORE,
     VKD3DSIH_ATOMIC_IADD,
From: Conor McCarthy <cmccarthy@codeweavers.com>

---
 libs/vkd3d-shader/spirv.c           |  6 +++
 tests/hlsl/inverse-trig.shader_test | 60 ++++++++++++++---------------
 2 files changed, 36 insertions(+), 30 deletions(-)
diff --git a/libs/vkd3d-shader/spirv.c b/libs/vkd3d-shader/spirv.c
index 298ad31d9..9f2649f28 100644
--- a/libs/vkd3d-shader/spirv.c
+++ b/libs/vkd3d-shader/spirv.c
@@ -6969,6 +6969,9 @@ static enum GLSLstd450 spirv_compiler_map_ext_glsl_instruction(
     }
     glsl_insts[] =
     {
+        {VKD3DSIH_ACOS, GLSLstd450Acos},
+        {VKD3DSIH_ASIN, GLSLstd450Asin},
+        {VKD3DSIH_ATAN, GLSLstd450Atan},
         {VKD3DSIH_DFMA, GLSLstd450Fma},
         {VKD3DSIH_DMAX, GLSLstd450NMax},
         {VKD3DSIH_DMIN, GLSLstd450NMin},
@@ -9553,6 +9556,9 @@ static int spirv_compiler_handle_instruction(struct spirv_compiler *compiler,
         case VKD3DSIH_ISFINITE:
             spirv_compiler_emit_isfinite(compiler, instruction);
             break;
+        case VKD3DSIH_ACOS:
+        case VKD3DSIH_ASIN:
+        case VKD3DSIH_ATAN:
         case VKD3DSIH_DFMA:
         case VKD3DSIH_DMAX:
         case VKD3DSIH_DMIN:
diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test
index 05941b14d..7175ab19a 100644
--- a/tests/hlsl/inverse-trig.shader_test
+++ b/tests/hlsl/inverse-trig.shader_test
@@ -13,23 +13,23 @@ float4 main() : sv_target

 [test]
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.14159274, 0.0, 0.0, 0.0) 128

 uniform 0 float4 -0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (2.094441441, 0.0, 0.0, 0.0) 128

 uniform 0 float4 0.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.57072878, 0.0, 0.0, 0.0) 1024

 uniform 0 float4 0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.04715133, 0.0, 0.0, 0.0) 256

 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 128

 [pixel shader todo]
@@ -44,7 +44,7 @@ float4 main() : sv_target

 [test]
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-31416.0, 0.0, 0.0, 0.0)

 [require]
@@ -72,15 +72,15 @@ shader model >= 6.0
 % RADV are a bit lower than these, hence the large max ulp difference.
 [test]
 uniform 0 float4 -0.5 0.0 0.0 0.0
-todo draw quad
+draw quad
 probe all rgba (-10472.0, 0.0, 0.0, 0.0) 4096

 uniform 0 float4 0.0 0.0 0.0 0.0
-todo draw quad
+draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0)

 uniform 0 float4 0.5 0.0 0.0 0.0
-todo draw quad
+draw quad
 probe all rgba (10472.0, 0.0, 0.0, 0.0) 4096

 [require]
@@ -88,7 +88,7 @@ probe all rgba (10472.0, 0.0, 0.0, 0.0) 4096

 [test]
 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (31416.0, 0.0, 0.0, 0.0)


@@ -102,23 +102,23 @@ float4 main() : sv_target

 [test]
 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-0.785409629, 0.0, 0.0, 0.0) 512

 uniform 0 float4 -0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-0.4636476, 0.0, 0.0, 0.0) 256

 uniform 0 float4 0.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 256

 uniform 0 float4 0.5 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.4636476, 0.0, 0.0, 0.0) 256

 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.785409629, 0.0, 0.0, 0.0) 512

 [pixel shader todo]
@@ -133,65 +133,65 @@ float4 main() : sv_target
 [test]
 % Non-degenerate cases
 uniform 0 float4 1.0 1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.785385, 0.0, 0.0, 0.0) 512

 uniform 0 float4 5.0 -5.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (2.356194, 0.0, 0.0, 0.0) 256

 uniform 0 float4 -3.0 -3.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-2.356194, 0.0, 0.0, 0.0) 256

 uniform 0 float4 1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

 uniform 0 float4 -1.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-1.570796, 0.0, 0.0, 0.0) 256

 uniform 0 float4 0.0 1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.0, 0.0, 0.0, 0.0) 256

 uniform 0 float4 0.0 -1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256

 % Degenerate cases
 uniform 0 float4 0.00001 0.00002 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (0.463647, 0.0, 0.0, 0.0) 256

 uniform 0 float4 0.00001 -0.00002 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (2.677945, 0.0, 0.0, 0.0) 256

 uniform 0 float4 -0.00001 100000.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (-0.000000000099986595, 0.0, 0.0, 0.0) 2048

 uniform 0 float4 10000000.0 0.00000001 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

 % Negative zero behavior should be to treat it the
 % same as normal zero.
 uniform 0 float4 1000000000.0 0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

 uniform 0 float4 1000000000.0 -0.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (1.570796, 0.0, 0.0, 0.0) 256

 uniform 0 float4 0.0 -1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256

 uniform 0 float4 -0.0 -1.0 0.0 0.0
-todo draw quad
+todo(sm<6) draw quad
 probe all rgba (3.1415927, 0.0, 0.0, 0.0) 256
Giovanni Mascellani (@giomasce) commented about tests/hlsl/inverse-trig.shader_test:
+probe all rgba (0.0, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.4636476, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.785409629, 0.0, 0.0, 0.0) 512
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    // The argument order is (y,x), and test case inputs are (y,x) also.
I don't find this comment particularly helpful, but it's not hurting either.
Giovanni Mascellani (@giomasce) commented about tests/hlsl/inverse-trig.shader_test:
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    return float4(acos(a.x), 0.0, 0.0, 0.0);
+}
+
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (3.14159274, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (2.094441441, 0.0, 0.0, 0.0) 128
Could you raise this to 256, so it works on my NVIDIA laptop too?
Giovanni Mascellani (@giomasce) commented about tests/hlsl/inverse-trig.shader_test:
+[test]
+uniform 0 float4 -1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (3.14159274, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 -0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (2.094441441, 0.0, 0.0, 0.0) 128
+
+uniform 0 float4 0.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.57072878, 0.0, 0.0, 0.0) 1024
+
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.04715133, 0.0, 0.0, 0.0) 256
And similarly this to 512?
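To make the tolerance numbers concrete, here is a minimal sketch of how a ULP distance between two floats can be computed in C. It is not the shader runner's actual comparison code; the function names are made up for this example, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that grows monotonically with the
 * float's value, so that adjacent representable floats differ by exactly 1.
 * Both +0.0 and -0.0 map to 0. NaNs are not handled in this sketch. */
static int64_t float_to_ordered(float f)
{
    int32_t bits;

    memcpy(&bits, &f, sizeof(bits));
    return bits < 0 ? -(int64_t)(bits & 0x7fffffff) : bits;
}

/* Number of representable floats between a and b. */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);

    return d < 0 ? -d : d;
}

/* Non-zero if actual is within max_ulps of expected. */
static int within_ulps(float expected, float actual, int64_t max_ulps)
{
    return ulp_distance(expected, actual) <= max_ulps;
}
```

As a rough check of the figures in this thread: the mathematically exact acos(-0.5), about 2.0943951, lies roughly 194 ULPs from the expected 2.094441441, which is outside a tolerance of 128 but inside 256. Likewise, the exact acos(0.5), about 1.0471976, lies roughly 388 ULPs from the expected 1.04715133, which is outside 256 but inside 512.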
Giovanni Mascellani (@giomasce) commented about tests/hlsl/inverse-trig.shader_test:
+uniform 0 float4 0.5 0.0 0.0 0.0
+todo draw quad
+probe all rgba (1.04715133, 0.0, 0.0, 0.0) 256
+
+uniform 0 float4 1.0 0.0 0.0 0.0
+todo draw quad
+probe all rgba (0.0, 0.0, 0.0, 0.0) 128
+
+[pixel shader todo]
+uniform float4 a;
+
+float4 main() : sv_target
+{
+    float4 result = float4(asin(a.x), 0.0, 0.0, 0.0);
+    // Quantize to cover implementation variations.
+    return round(result * 20000.0);
Why do you use two different approaches for arccosine (just setting the ULP tolerance) and arcsine (quantizing, and handling SM1-5 and SM6 differently)? Not that I have a big problem with it, mostly curious.
This merge request was approved by Giovanni Mascellani.
Patch 2/4 introduces a "new blank line at EOF." whitespace error. I think/hope Alexandre's scripts will take care of that though.
This merge request was approved by Henri Verbeet.