Implements asin, acos, atan, and atan2.
Also includes some tests in a new test file.
One possible problem here is that I'm not sure how to test what Microsoft's atan and atan2 outputs are in boundary cases like atan2(1, 0). I've made the test suites adhere with the calculator program I've been using (Qalculate, which I assume is using libc's atan2).
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55154
-- v5: vkd3d-shader/hlsl: Implement atan2. vkd3d-shader/hlsl: Implement atan. vkd3d-shader/hlsl: Implement acos and asin trig intrinsics.
From: Petrichor Park ppark@codeweavers.com
Also includes some tests in a new test file.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55154 --- Makefile.am | 1 + libs/vkd3d-shader/hlsl.y | 62 +++++++++++++++++++++++++++++ tests/hlsl/inverse-trig.shader_test | 30 ++++++++++++++ 3 files changed, 93 insertions(+) create mode 100644 tests/hlsl/inverse-trig.shader_test
diff --git a/Makefile.am b/Makefile.am index 5fbac802b..b6d3b9639 100644 --- a/Makefile.am +++ b/Makefile.am @@ -112,6 +112,7 @@ vkd3d_shader_tests = \ tests/hlsl/initializer-struct.shader_test \ tests/hlsl/intrinsic-override.shader_test \ tests/hlsl/invalid.shader_test \ + tests/hlsl/inverse-trig.shader_test \ tests/hlsl/is-front-face.shader_test \ tests/hlsl/ldexp.shader_test \ tests/hlsl/length.shader_test \ diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index 71c0ae46b..f21aba6ec 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2605,6 +2605,60 @@ static bool intrinsic_abs(struct hlsl_ctx *ctx, return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_ABS, params->args[0], loc); }
+static bool write_acos_or_asin(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc, bool asin_mode) +{ + struct hlsl_ir_function_decl *func; + struct hlsl_type *type; + char *body; + + static const char template[] = + "%s %s(%s x)\n" + "{\n" + " %s abs_arg = abs(x);\n" + " %s correction = sqrt(1.0 - abs_arg);\n" + " %s result = correction * (((-0.019963376f\n" + " * abs_arg + 0.07612093f)\n" + " * abs_arg - 0.21274031f)\n" + " * abs_arg + 1.5707964f\n);" + /* Piecewise, (x >= 0) ? result : pi - result. */ + " %s mask = x >= 0.0;\n" + " result = mask * result\n" + " + (1.0 - mask) * (3.14159265 - result);\n" + " return %s;" + "}"; + static const char fn_name_acos[] = "acos"; + static const char fn_name_asin[] = "asin"; + static const char return_stmt_acos[] = "result"; + static const char return_stmt_asin[] = "-result + (3.14159265 / 2.0)"; + + const char *fn_name = asin_mode + ? fn_name_asin + : fn_name_acos; + + type = params->args[0]->data_type; + type = hlsl_get_numeric_type(ctx, type->class, HLSL_TYPE_FLOAT, type->dimx, type->dimy); + + if (!(body = hlsl_sprintf_alloc(ctx, template, + type->name, fn_name, type->name, + type->name, type->name, + type->name, type->name, + (asin_mode ? return_stmt_asin : return_stmt_acos)))) + return false; + func = hlsl_compile_internal_function(ctx, fn_name, body); + vkd3d_free(body); + if (!func) + return false; + + return add_user_call(ctx, func, params, loc); +} + +static bool intrinsic_acos(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc) +{ + return write_acos_or_asin(ctx, params, loc, false); +} + static bool intrinsic_all(struct hlsl_ctx *ctx, const struct parse_initializer *params, const struct vkd3d_shader_location *loc) { @@ -2682,6 +2736,12 @@ static bool intrinsic_any(struct hlsl_ctx *ctx, return false; }
+static bool intrinsic_asin(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc) +{ + return write_acos_or_asin(ctx, params, loc, true); +} + /* Find the type corresponding to the given source type, with the same * dimensions but a different base type. */ static struct hlsl_type *convert_numeric_type(const struct hlsl_ctx *ctx, @@ -3776,9 +3836,11 @@ intrinsic_functions[] = /* Note: these entries should be kept in alphabetical order. */ {"D3DCOLORtoUBYTE4", 1, true, intrinsic_d3dcolor_to_ubyte4}, {"abs", 1, true, intrinsic_abs}, + {"acos", 1, true, intrinsic_acos}, {"all", 1, true, intrinsic_all}, {"any", 1, true, intrinsic_any}, {"asfloat", 1, true, intrinsic_asfloat}, + {"asin", 1, true, intrinsic_asin}, {"asuint", -1, true, intrinsic_asuint}, {"clamp", 3, true, intrinsic_clamp}, {"clip", 1, true, intrinsic_clip}, diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test new file mode 100644 index 000000000..7be7c5e7d --- /dev/null +++ b/tests/hlsl/inverse-trig.shader_test @@ -0,0 +1,30 @@ +[pixel shader] +uniform float4 a; + +float4 main() : sv_target +{ + // Add 10 to avoid epsilon problems near 0 + // Note that this means -8.5odd means -pi/2 + return float4(asin(a.x), acos(a.y), 0.0, 0) + 10.0; +} + +[test] +uniform 0 float4 -1.0 -1.0 -1.0 0.0 +draw quad +probe all rgba (8.429203673, 13.141592654, 10.0, 10.0) 256 + +uniform 0 float4 -0.5 -0.5 -0.5 0.0 +draw quad +probe all rgba (9.476401224, 12.09439510, 10.0, 10.0) 256 + +uniform 0 float4 0.0 0.0 0.0 0.0 +draw quad +probe all rgba (10.0, 11.570796327, 10.0, 10.0) 256 + +uniform 0 float4 0.5 0.5 0.5 0.0 +draw quad +probe all rgba (10.52359878, 11.04719755, 10.0, 10.0) 256 + +uniform 0 float4 1.0 1.0 1.0 0.0 +draw quad +probe all rgba (11.570796327, 10.0, 10.0, 10.0) 256
From: Petrichor Park ppark@codeweavers.com
--- libs/vkd3d-shader/hlsl.y | 48 +++++++++++++++++++++++++++++ tests/hlsl/inverse-trig.shader_test | 10 +++--- 2 files changed, 53 insertions(+), 5 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index f21aba6ec..3511c4a68 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2742,6 +2742,53 @@ static bool intrinsic_asin(struct hlsl_ctx *ctx, return write_acos_or_asin(ctx, params, loc, true); }
+static bool intrinsic_atan(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc) +{ + struct hlsl_ir_function_decl *func; + struct hlsl_type *type; + char *body; + + static const char template[] = + "%s atan(%s x)\n" + "{\n" + " %s abs_arg = abs(x);\n" + " %s input = (abs_arg < 1.0) ? x : 1.0 / x;\n" + /* HLSL pow(x, y) always returns NaN if x<0. So write out the powers longhand. */ + " %s x2 = input * input;\n" + " %s x3 = input * x2;\n" + " %s x5 = x3 * x2;\n" + " %s x7 = x5 * x2;\n" + " %s x9 = x7 * x2;\n" + " %s poly_approx = \n" + " 0.020265052 * x9\n" + " + -0.0840074 * x7\n" + " + 0.17941668 * x5\n" + " + -0.3301321 * x3\n" + " + 0.99985593 * input;\n" + " return (abs_arg < 1.0)\n" + " ? poly_approx\n" + " : (((x < 0.0 ? -1 : 1) * 3.14159265 / 2.0) - poly_approx);\n" + "}"; + + if (!(type = elementwise_intrinsic_get_common_type(ctx, params, loc))) + return false; + type = hlsl_get_numeric_type(ctx, type->class, HLSL_TYPE_FLOAT, type->dimx, type->dimy); + + + if (!(body = hlsl_sprintf_alloc(ctx, template, type->name, type->name, + type->name, type->name, + type->name, type->name, type->name, type->name, type->name, + type->name))) + return false; + func = hlsl_compile_internal_function(ctx, "atan", body); + vkd3d_free(body); + if (!func) + return false; + + return add_user_call(ctx, func, params, loc); +} + /* Find the type corresponding to the given source type, with the same * dimensions but a different base type. */ static struct hlsl_type *convert_numeric_type(const struct hlsl_ctx *ctx, @@ -3842,6 +3889,7 @@ intrinsic_functions[] = {"asfloat", 1, true, intrinsic_asfloat}, {"asin", 1, true, intrinsic_asin}, {"asuint", -1, true, intrinsic_asuint}, + {"atan", 1, true, intrinsic_atan}, {"clamp", 3, true, intrinsic_clamp}, {"clip", 1, true, intrinsic_clip}, {"cos", 1, true, intrinsic_cos}, diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test index 7be7c5e7d..8820ec002 100644 --- a/tests/hlsl/inverse-trig.shader_test +++ b/tests/hlsl/inverse-trig.shader_test @@ -5,17 +5,17 @@ float4 main() : sv_target { // Add 10 to avoid epsilon problems near 0 // Note that this means -8.5odd means -pi/2 - return float4(asin(a.x), acos(a.y), 0.0, 0) + 10.0; + return float4(asin(a.x), acos(a.y), atan(a.z), 0) + 10.0; }
[test] uniform 0 float4 -1.0 -1.0 -1.0 0.0 draw quad -probe all rgba (8.429203673, 13.141592654, 10.0, 10.0) 256 +probe all rgba (8.429203673, 13.141592654, 9.214601837, 10.0) 256
uniform 0 float4 -0.5 -0.5 -0.5 0.0 draw quad -probe all rgba (9.476401224, 12.09439510, 10.0, 10.0) 256 +probe all rgba (9.476401224, 12.09439510, 9.536352391, 10.0) 256
uniform 0 float4 0.0 0.0 0.0 0.0 draw quad @@ -23,8 +23,8 @@ probe all rgba (10.0, 11.570796327, 10.0, 10.0) 256
uniform 0 float4 0.5 0.5 0.5 0.0 draw quad -probe all rgba (10.52359878, 11.04719755, 10.0, 10.0) 256 +probe all rgba (10.52359878, 11.04719755, 10.46364761, 10.0) 256
uniform 0 float4 1.0 1.0 1.0 0.0 draw quad -probe all rgba (11.570796327, 10.0, 10.0, 10.0) 256 +probe all rgba (11.570796327, 10.0, 10.7853981634, 10.0) 256
From: Petrichor Park ppark@codeweavers.com
--- libs/vkd3d-shader/hlsl.y | 37 +++++++++++++++++++++++++ tests/hlsl/inverse-trig.shader_test | 43 +++++++++++++++++++++++++++++ 2 files changed, 80 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index 3511c4a68..dceec5ef5 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2789,6 +2789,42 @@ static bool intrinsic_atan(struct hlsl_ctx *ctx, return add_user_call(ctx, func, params, loc); }
+static bool intrinsic_atan2(struct hlsl_ctx *ctx, + const struct parse_initializer *params, const struct vkd3d_shader_location *loc) +{ + struct hlsl_ir_function_decl *func; + struct hlsl_type *type; + char *body; + + static const char template[] = + "%s atan2(%s y, %s x)\n" + "{\n" + " %s pi = %s(3.1415926);\n" + " %s offset = x >= 0.0\n" + " ? 0.0\n" + " : y >= 0.0\n" + " ? pi\n" + " : -pi;\n" + " return atan(y/x) + offset;\n" + "}"; + + if (!(type = elementwise_intrinsic_get_common_type(ctx, params, loc))) + return false; + type = hlsl_get_numeric_type(ctx, type->class, HLSL_TYPE_FLOAT, type->dimx, type->dimy); + + if (!(body = hlsl_sprintf_alloc(ctx, template, + type->name, type->name, type->name, + type->name, type->name, type->name))) + return false; + func = hlsl_compile_internal_function(ctx, "atan2", body); + vkd3d_free(body); + if (!func) + return false; + + return add_user_call(ctx, func, params, loc); +} + + /* Find the type corresponding to the given source type, with the same * dimensions but a different base type. */ static struct hlsl_type *convert_numeric_type(const struct hlsl_ctx *ctx, @@ -3890,6 +3926,7 @@ intrinsic_functions[] = {"asin", 1, true, intrinsic_asin}, {"asuint", -1, true, intrinsic_asuint}, {"atan", 1, true, intrinsic_atan}, + {"atan2", 2, true, intrinsic_atan2}, {"clamp", 3, true, intrinsic_clamp}, {"clip", 1, true, intrinsic_clip}, {"cos", 1, true, intrinsic_cos}, diff --git a/tests/hlsl/inverse-trig.shader_test b/tests/hlsl/inverse-trig.shader_test index 8820ec002..493019984 100644 --- a/tests/hlsl/inverse-trig.shader_test +++ b/tests/hlsl/inverse-trig.shader_test @@ -28,3 +28,46 @@ probe all rgba (10.52359878, 11.04719755, 10.46364761, 10.0) 256 uniform 0 float4 1.0 1.0 1.0 0.0 draw quad probe all rgba (11.570796327, 10.0, 10.7853981634, 10.0) 256 + + +[pixel shader] +uniform float4 a; + +float4 main() : sv_target +{ + // Return things in terms of pi to make the numbers in the + // test cases more understandable. + // Also, because the argument order is (y,x) the numbers are + // passed in "backwards" here, so they're the right way in the + // test cases. + return float4(atan2(a.x, a.y), 0.0, 0.0, 0.0) / 3.14159265; +} + +[test] +uniform 0 float4 1.0 1.0 0.0 0.0 +draw quad +probe all rgba (0.25, 0.0, 0.0, 0.0) 256 + +uniform 0 float4 5.0 -5.0 0.0 0.0 +draw quad +probe all rgba (0.75, 0.0, 0.0, 0.0) 256 + +uniform 0 float4 -3.0 -3.0 0.0 0.0 +draw quad +probe all rgba (-0.75, 0.0, 0.0, 0.0) 256 + +uniform 0 float4 1.0 0.0 0.0 0.0 +draw quad +probe all rgba (0.5, 0.0, 0.0, 0.0) 256 + +uniform 0 float4 -1.0 0.0 0.0 0.0 +draw quad +probe all rgba (-0.5, 0.0, 0.0, 0.0) 256 + +uniform 0 float4 0.0 1.0 0.0 0.0 +draw quad +probe all rgba (0.0, 0.0, 0.0, 0.0) 256 + +uniform 0 float4 0.0 -1.0 0.0 0.0 +draw quad +probe all rgba (1.0, 0.0, 0.0, 0.0) 256
Further decompilation work has shown that the native compiler shouldn't actually have asin(0)=0, because it uses two different values for pi/2. But constant folding will still replace a literal asin(0) with 0.
I'm not really sure what to do here, given that Microsoft's code is so obviously wrong. I also don't know where to get a real machine to test it on ...
My instinct is to just implement *something* reasonably correct at this point, given how long I've been hung up on edge cases. I think Wine bug 55154 will be a lot better served by having *some* version of asin that works, even if it's not bit identical to Windows.
Any thoughts?
On Tue Nov 21 13:32:34 2023 +0000, Petrichor Park wrote:
Further decompilation work has shown that the native compiler shouldn't actually have asin(0)=0, because it uses two different values for pi/2. But constant folding will still replace a literal asin(0) with 0. I'm not really sure what to do here, given that Microsoft's code is so obviously wrong. I also don't know where to get a real machine to test it on ... My instinct is to just implement *something* reasonably correct at this point, given how long I've been hung up on edge cases. I think Wine bug 55154 will be a lot better served by having *some* version of asin that works, even if it's not bit identical to Windows. Any thoughts?
Do you know any application which specifically requires `asin(0) == 0` precisely on constant folding? If not, I'd just ignore it for the time being. Implement whatever formula native is emitting, that's what shaders expect.
On Tue Nov 21 13:32:33 2023 +0000, Giovanni Mascellani wrote:
Do you know any application which specifically requires `asin(0) == 0` precisely on constant folding? If not, I'd just ignore it for the time being. Implement whatever formula native is emitting, that's what shaders expect.
OK
On Wed Nov 22 00:16:23 2023 +0000, Petrichor Park wrote:
OK
If it consistently gives 0 for asin(0), please add a todo test for it, so it's not forgotten.