This patch series includes an implementation of the long-pending `transpose` intrinsic and the `smoothstep` intrinsic.
While implementing `smoothstep` I realized that some intrinsics have different rules for the allowed data types than expressions do:
- Vectors and matrices cannot be mixed as arguments, regardless of their dimensions, even if they have the same number of components.
- Any combination of matrices is allowed, even when neither matrix fits inside the other, e.g.:
`float2x3` is compatible with `float3x2`, resulting in `float2x2`.
The common data type takes the minimum of each dimension.
This is the case for `max`, `pow`, `ldexp`, `clamp` and `smoothstep`, which suggests that it holds for all intrinsics where the operation is applied element-wise. This series corrects the argument conversion for those intrinsics accordingly.
A minor fix in `pow`'s type conversion is also included.
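As a rough illustration of the shape rule described above, here is a standalone sketch. It is not the code from this series: `enum shape_kind`, `struct shape` and `elementwise_common_shape()` are hypothetical names, and `dimx`/`dimy` are assumed to mean columns/rows, so that `float2x3` has `dimx = 3` and `dimy = 2`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the compiler's type classes. */
enum shape_kind { SHAPE_SCALAR, SHAPE_VECTOR, SHAPE_MATRIX };

struct shape
{
    enum shape_kind kind;
    unsigned int dimx, dimy; /* Columns and rows; both 1 for scalars. */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Computes the common shape of the arguments of an element-wise intrinsic.
 * Returns false when vectors and matrices are mixed. */
static bool elementwise_common_shape(const struct shape *args, unsigned int count, struct shape *out)
{
    bool vectors = false, matrices = false;
    unsigned int dimx = 4, dimy = 4, i;

    assert(args && out);

    for (i = 0; i < count; ++i)
    {
        if (args[i].kind == SHAPE_VECTOR)
        {
            vectors = true;
            dimx = min_u(dimx, args[i].dimx);
        }
        else if (args[i].kind == SHAPE_MATRIX)
        {
            matrices = true;
            dimx = min_u(dimx, args[i].dimx);
            dimy = min_u(dimy, args[i].dimy);
        }
    }

    if (vectors && matrices)
        return false;

    if (matrices)
    {
        out->kind = SHAPE_MATRIX;
        out->dimx = dimx;
        out->dimy = dimy;
    }
    else if (vectors)
    {
        out->kind = SHAPE_VECTOR;
        out->dimx = dimx;
        out->dimy = 1;
    }
    else
    {
        out->kind = SHAPE_SCALAR;
        out->dimx = out->dimy = 1;
    }
    return true;
}
```

Under these assumptions, `float2x3` combined with `float3x2` yields a 2x2 matrix shape, while `float2x2` combined with `float4` is rejected.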
-- 
v2: vkd3d-shader/hlsl: Use add_unary_arithmetic_expr() in intrinsic_pow().
    vkd3d-shader/hlsl: Convert elementwise intrinsics args to the proper common type.
    tests: Test for common type conversion for element-wise intrinsics.
    vkd3d-shader/hlsl: Support smoothstep() intrinsic.
From: Francisco Casas fcasas@codeweavers.com
--- Makefile.am | 1 + libs/vkd3d-shader/hlsl.y | 59 +++++++++++++++++++++++++ tests/hlsl-transpose.shader_test | 75 ++++++++++++++++++++++++++++++++ 3 files changed, 135 insertions(+) create mode 100644 tests/hlsl-transpose.shader_test
diff --git a/Makefile.am b/Makefile.am index 85cd4642..d1f6ec6b 100644 --- a/Makefile.am +++ b/Makefile.am @@ -117,6 +117,7 @@ vkd3d_shader_tests = \ tests/hlsl-struct-array.shader_test \ tests/hlsl-struct-assignment.shader_test \ tests/hlsl-struct-semantics.shader_test \ + tests/hlsl-transpose.shader_test \ tests/hlsl-vector-indexing.shader_test \ tests/hlsl-vector-indexing-uniform.shader_test \ tests/logic-operations.shader_test \ diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index eedc85bd..b834b230 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2596,6 +2596,64 @@ static bool intrinsic_saturate(struct hlsl_ctx *ctx, return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, arg, loc); }
+static bool intrinsic_transpose(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    struct hlsl_ir_node *arg = params->args[0];
+    struct hlsl_type *arg_type = arg->data_type;
+    struct hlsl_deref var_deref;
+    struct hlsl_type *mat_type;
+    struct hlsl_ir_load *load;
+    struct hlsl_ir_var *var;
+    unsigned int i, j;
+
+    if (arg_type->type != HLSL_CLASS_SCALAR && arg_type->type != HLSL_CLASS_MATRIX)
+    {
+        struct vkd3d_string_buffer *string;
+
+        if ((string = hlsl_type_to_string(ctx, arg_type)))
+            hlsl_error(ctx, &arg->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TYPE,
+                    "Wrong type for argument 1 of transpose(): expected a matrix or scalar type, but got '%s'.\n",
+                    string->buffer);
+        hlsl_release_string_buffer(ctx, string);
+        return false;
+    }
+
+    if (arg_type->type == HLSL_CLASS_SCALAR)
+    {
+        list_add_tail(params->instrs, &arg->entry);
+        return true;
+    }
+
+    mat_type = hlsl_get_matrix_type(ctx, arg_type->base_type, arg_type->dimy, arg_type->dimx);
+
+    if (!(var = hlsl_new_synthetic_var(ctx, "transpose", mat_type, loc)))
+        return NULL;
+    hlsl_init_simple_deref_from_var(&var_deref, var);
+
+    for (i = 0; i < arg_type->dimx; ++i)
+    {
+        for (j = 0; j < arg_type->dimy; ++j)
+        {
+            struct hlsl_ir_store *store;
+            struct hlsl_block block;
+
+            if (!(load = add_load_component(ctx, params->instrs, arg, j * arg->data_type->dimx + i, loc)))
+                return false;
+
+            if (!(store = hlsl_new_store_component(ctx, &block, &var_deref, i * var->data_type->dimx + j, &load->node)))
+                return false;
+            list_move_tail(params->instrs, &block.instrs);
+        }
+    }
+
+    if (!(load = hlsl_new_var_load(ctx, var, *loc)))
+        return false;
+    list_add_tail(params->instrs, &load->node.entry);
+
+    return true;
+}
+
 static const struct intrinsic_function
 {
     const char *name;
@@ -2623,6 +2681,7 @@ intrinsic_functions[] =
     {"pow", 2, true, intrinsic_pow},
     {"round", 1, true, intrinsic_round},
     {"saturate", 1, true, intrinsic_saturate},
+    {"transpose", 1, true, intrinsic_transpose},
 };
static int intrinsic_function_name_compare(const void *a, const void *b) diff --git a/tests/hlsl-transpose.shader_test b/tests/hlsl-transpose.shader_test new file mode 100644 index 00000000..83852fa1 --- /dev/null +++ b/tests/hlsl-transpose.shader_test @@ -0,0 +1,75 @@ +[pixel shader] +float4 main() : sv_target +{ + return transpose(5); +} + +[test] +draw quad +probe all rgba (5.0, 5.0, 5.0, 5.0) + + +[pixel shader] +float4 main() : sv_target +{ + float1x1 x = 5; + + return transpose(x); +} + +[test] +draw quad +probe all rgba (5.0, 5.0, 5.0, 5.0) + + +[pixel shader fail] +float4 main() : sv_target +{ + float4 x = float4(1, 2, 3, 4); + + return transpose(x); +} + +[pixel shader] +float4 main() : sv_target +{ + float1x4 x = float1x4(1.0, 2.0, 3.0, 4.0); + + return transpose(x); +} + +[test] +draw quad +probe all rgba (1.0, 2.0, 3.0, 4.0) + + +[pixel shader] +float4 main() : sv_target +{ + float4x3 m = float4x3(1.0, 2.0, 3.0, + 4.0, 5.0, 6.0, + 7.0, 8.0, 9.0, + 10.0, 11.0, 12.0); + + return transpose(m)[1]; +} + +[test] +draw quad +probe all rgba (2.0, 5.0, 8.0, 11.0) + + +[pixel shader] +float4 main() : sv_target +{ + row_major float4x3 m = float4x3(1.0, 2.0, 3.0, + 4.0, 5.0, 6.0, + 7.0, 8.0, 9.0, + 10.0, 11.0, 12.0); + + return transpose(m)[1]; +} + +[test] +draw quad +probe all rgba (2.0, 5.0, 8.0, 11.0)
From: Francisco Casas fcasas@codeweavers.com
--- Makefile.am | 1 + libs/vkd3d-shader/hlsl.y | 77 ++++++++++++++ tests/hlsl-smoothstep.shader_test | 169 ++++++++++++++++++++++++++++++ 3 files changed, 247 insertions(+) create mode 100644 tests/hlsl-smoothstep.shader_test
diff --git a/Makefile.am b/Makefile.am index d1f6ec6b..57cb76ed 100644 --- a/Makefile.am +++ b/Makefile.am @@ -111,6 +111,7 @@ vkd3d_shader_tests = \ tests/hlsl-return-void.shader_test \ tests/hlsl-shape.shader_test \ tests/hlsl-single-numeric-initializer.shader_test \ + tests/hlsl-smoothstep.shader_test \ tests/hlsl-state-block-syntax.shader_test \ tests/hlsl-static-initializer.shader_test \ tests/hlsl-storage-qualifiers.shader_test \ diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index b834b230..7ac1dd7e 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2596,6 +2596,82 @@ static bool intrinsic_saturate(struct hlsl_ctx *ctx, return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, arg, loc); }
+/* smoothstep(a, b, x) = p^2 (3 - 2p), where p = saturate((x - a)/(b - a)) */
+static bool intrinsic_smoothstep(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    struct hlsl_ir_node *min_arg, *max_arg, *x_arg, *p, *p_num, *p_denom, *res;
+    struct hlsl_ir_constant *one, *minus_two, *three;
+    enum hlsl_type_class common_class;
+    struct hlsl_type *common_type;
+    unsigned int dimx, dimy;
+
+    min_arg = params->args[0];
+    max_arg = params->args[1];
+    x_arg = params->args[2];
+
+    if (!expr_common_shape(ctx, min_arg->data_type, max_arg->data_type, loc, &common_class, &dimx, &dimy))
+        return NULL;
+    common_type = hlsl_get_numeric_type(ctx, common_class, HLSL_TYPE_FLOAT, dimx, dimy);
+
+    if (!expr_common_shape(ctx, common_type, x_arg->data_type, loc, &common_class, &dimx, &dimy))
+        return NULL;
+    common_type = hlsl_get_numeric_type(ctx, common_class, HLSL_TYPE_FLOAT, dimx, dimy);
+
+    if (!(min_arg = add_implicit_conversion(ctx, params->instrs, min_arg, common_type, loc)))
+        return NULL;
+
+    if (!(max_arg = add_implicit_conversion(ctx, params->instrs, max_arg, common_type, loc)))
+        return NULL;
+
+    if (!(x_arg = add_implicit_conversion(ctx, params->instrs, x_arg, common_type, loc)))
+        return NULL;
+
+    if (!(min_arg = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_NEG, min_arg, loc)))
+        return false;
+
+    if (!(p_num = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, x_arg, min_arg, loc)))
+        return false;
+
+    if (!(p_denom = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, max_arg, min_arg, loc)))
+        return false;
+
+    if (!(one = hlsl_new_float_constant(ctx, 1.0, loc)))
+        return false;
+    list_add_tail(params->instrs, &one->node.entry);
+
+    if (!(p_denom = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_DIV, &one->node, p_denom, loc)))
+        return false;
+
+    if (!(p = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p_num, p_denom, loc)))
+        return false;
+
+    if (!(p = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, p, loc)))
+        return false;
+
+    if (!(minus_two = hlsl_new_float_constant(ctx, -2.0, loc)))
+        return false;
+    list_add_tail(params->instrs, &minus_two->node.entry);
+
+    if (!(three = hlsl_new_float_constant(ctx, 3.0, loc)))
+        return false;
+    list_add_tail(params->instrs, &three->node.entry);
+
+    if (!(res = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, &minus_two->node, p, loc)))
+        return false;
+
+    if (!(res = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, &three->node, res, loc)))
+        return false;
+
+    if (!(p = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p, p, loc)))
+        return false;
+
+    if (!(res = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p, res, loc)))
+        return false;
+
+    return true;
+}
+
 static bool intrinsic_transpose(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -2681,6 +2757,7 @@ intrinsic_functions[] =
     {"pow", 2, true, intrinsic_pow},
     {"round", 1, true, intrinsic_round},
     {"saturate", 1, true, intrinsic_saturate},
+    {"smoothstep", 3, true, intrinsic_smoothstep},
     {"transpose", 1, true, intrinsic_transpose},
 };
diff --git a/tests/hlsl-smoothstep.shader_test b/tests/hlsl-smoothstep.shader_test new file mode 100644 index 00000000..ad53b50a --- /dev/null +++ b/tests/hlsl-smoothstep.shader_test @@ -0,0 +1,169 @@ + + +[pixel shader] +float4 main() : sv_target +{ + float4 a = {1, -1, -1, 10}; + float4 b = {2, 1, 1, 20}; + float4 x = {0.3, 0.4, 2, 15.4}; + + return smoothstep(a, b, x); +} + +[test] +draw quad +probe all rgba (0, 0.784, 1.0, 0.559872) 1 + + +[pixel shader] +float4 main() : sv_target +{ + float a = 1; + float b = 2; + float4 x = {0.9, 1.2, 1.8, 2.1}; + + return smoothstep(a, b, x); +} + +[test] +draw quad +probe all rgba (0, 0.104, 0.896, 1.000000) 5 + + +[pixel shader] +float4 main() : sv_target +{ + float4 a = {1, 10, 100, 1000}; + float4 b = {2, 20, 200, 2000}; + float x = 14; + + return smoothstep(a, b, x); +} + +[test] +draw quad +probe all rgba (1.0, 0.352, 0, 0) 1 + + +[pixel shader] +float4 main() : sv_target +{ + float2 a = {1, 10}; + float3 b = {2, 20, 200}; + float4 x = {1.4, 14, 140, 1400}; + + float2 res = smoothstep(a, b, x); + return float4(res, 0, 0); +} + +[test] +draw quad +probe all rgba (0.352, 0.352, 0, 0) 1 + + +[pixel shader] +float4 main() : sv_target +{ + float3 a = {1, 10, 100}; + float2 b = {2, 20}; + float4 x = {1.4, 14, 140, 1400}; + + float2 res = smoothstep(a, b, x); + return float4(res, 0, 0); +} + +[test] +draw quad +probe all rgba (0.352, 0.352, 0, 0) 1 + + +[pixel shader] +float4 main() : sv_target +{ + float4 a = {1, 10, 100, 1000}; + float4 b = {2, 20, 200, 2000}; + float2 x = {14, 140}; + + float2 res = smoothstep(a, b, x); + return float4(res, 0, 0); +} + +[test] +draw quad +probe all rgba (1.0, 1.0, 0, 0) 1 + + +[pixel shader todo] +float4 main() : sv_target +{ + float2x3 a = {1, 1, 1, 1, 1, 1}; + float3x2 b = {2, 2, 2, 2, 2, 2}; + float4x2 x = {1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8}; + + float2x2 r = smoothstep(a, b, x); + return r; +} + +[test] +todo draw quad +todo probe all rgba (0.028, 0.104, 0.216, 0.352) 1 + + 
+[pixel shader] +// 4 division by zero warnings. +float4 main() : sv_target +{ + float4 a = {0, 0, 0, 0}; + float4 b = {-1, -1, 0, 0}; + float4 x = {0, -0.25, 0, 1}; + + return smoothstep(a, b, x); +} + +[test] +draw quad +probe all rgba (0, 0.156250, 0, 1.000000) + + +[pixel shader] +float4 main() : sv_target +{ + float4x1 a = {0.0, 0.0, 0.0, 0.0}; + float b = 1.0; + float3x1 x = {0.5, 0.5, 0.5}; + + float3x1 r = smoothstep(a, b, x); + return float4(r, 0); +} + +[test] +draw quad +probe all rgba (0.5, 0.5, 0.5, 0.0) + + +[pixel shader todo] +float4 main() : sv_target +{ + float4x1 a = {0.0, 0.0, 0.0, 0.0}; + float2x2 b = {1.0, 1.0, 1.0, 1.0}; + float3x1 x = {0.5, 0.5, 0.5}; + + float2x1 r = smoothstep(a, b, x); + return float4(r, r); +} + +[test] +todo draw quad +todo probe all rgba (0.5, 0.5, 0.5, 0.5) + + +[pixel shader fail todo] +float4 main() : sv_target +{ + float2x2 a = {0.0, 0.0, 0.0, 0.0}; + float4 b = 1.0; + float2x2 x = {0.5, 0.5, 0.5, 0.5}; + + smoothstep(a, b, x); + return 0; +}
From: Francisco Casas fcasas@codeweavers.com
Some intrinsics have different rules for the allowed data types than expressions do:
- Vectors and matrices cannot be mixed as arguments, regardless of their dimensions, even if they have the same number of components.
- Any combination of matrices is allowed, even when neither matrix fits inside the other, e.g. float2x3 is compatible with float3x2, resulting in float2x2. The common data type takes the minimum of each dimension.
This is the case for max, pow, ldexp, clamp and smoothstep, which suggests that it holds for all intrinsics where the operation is applied element-wise.
Tests for mul() are also added as a counter-example where the operation is not element-wise. --- tests/hlsl-clamp.shader_test | 28 ++++++++++++++++++++++++++++ tests/hlsl-ldexp.shader_test | 26 ++++++++++++++++++++++++++ tests/hlsl-lerp.shader_test | 28 ++++++++++++++++++++++++++++ tests/hlsl-mul.shader_test | 30 ++++++++++++++++++++++++++++++ tests/max.shader_test | 27 +++++++++++++++++++++++++++ tests/pow.shader_test | 26 ++++++++++++++++++++++++++ 6 files changed, 165 insertions(+)
diff --git a/tests/hlsl-clamp.shader_test b/tests/hlsl-clamp.shader_test index 8e26270c..cc198735 100644 --- a/tests/hlsl-clamp.shader_test +++ b/tests/hlsl-clamp.shader_test @@ -8,3 +8,31 @@ float4 main(uniform float3 u) : sv_target uniform 0 float4 -0.3 -0.1 0.7 0.0 draw quad probe all rgba (-0.1, 0.7, -0.3, 0.3) + + +[pixel shader todo] +float4 main() : sv_target +{ + float3x2 a = {6, 5, 4, 3, 2, 1}; + float2x3 b = {1, 2, 3, 4.2, 5.2, 6.2}; + float3x4 c = 5.5; + + float2x2 r = clamp(a, b, c); + return float4(r); +} + +[test] +todo draw quad +todo probe all rgba (5.5, 5.0, 4.2, 5.2) + + +[pixel shader fail todo] +float4 main() : sv_target +{ + float2x2 a = {3.1, 3.1, 3.1, 3.1}; + float2x2 b = {1, 2, 3, 4}; + float4 c = {5.5, 4.5, 3.5, 2.5}; + + clamp(a, b, c); + return 0; +} diff --git a/tests/hlsl-ldexp.shader_test b/tests/hlsl-ldexp.shader_test index 0873fc9e..bea97953 100644 --- a/tests/hlsl-ldexp.shader_test +++ b/tests/hlsl-ldexp.shader_test @@ -30,3 +30,29 @@ uniform 0 int4 2 3 4 5 uniform 4 int4 0 -10 10 100 draw quad probe all rgba (2.0, 0.00292968750, 4096.0, 6.33825300e+030) + + +[pixel shader todo] +float4 main() : sv_target +{ + float2x3 a = {1, 2, 3, 4, 5, 6}; + float3x2 b = {6, 5, 4, 3, 2, 1}; + + float2x2 r = ldexp(a, b); + return float4(r); +} + +[test] +todo draw quad +todo probe all rgba (64.0, 64.0, 64.0, 40.0) + + +[pixel shader fail todo] +float4 main() : sv_target +{ + float2x2 a = {1, 2, 3, 4}; + float1 b = {2}; + + ldexp(a, b); + return 0; +} diff --git a/tests/hlsl-lerp.shader_test b/tests/hlsl-lerp.shader_test index 3f93b02d..3cd10ec1 100644 --- a/tests/hlsl-lerp.shader_test +++ b/tests/hlsl-lerp.shader_test @@ -34,3 +34,31 @@ uniform 4 int4 0 -10 10 1000000 uniform 8 int4 0 1 -1 1000000 draw quad probe all rgba (2.0, -10.0, -2.0, 1e12) + + +[pixel shader todo] +float4 main() : sv_target +{ + float3x2 a = {6, 5, 4, 3, 2, 1}; + float2x3 b = {1, 2, 3, 4.2, 5.2, 6.2}; + float3x4 c = 2.4; + + float2x2 r = lerp(a, b, c); + return float4(r); 
+} + +[test] +todo draw quad +todo probe all rgba (-6.0, -2.2, 4.48, 8.28) + + +[pixel shader fail todo] +float4 main() : sv_target +{ + float2x2 a = {0, 1, 2, 3}; + float2x2 b = {1, 2, 3, 4}; + float4 c = {0.5, 0.5, 0.5, 0.5}; + + lerp(a, b, c); + return 0; +} diff --git a/tests/hlsl-mul.shader_test b/tests/hlsl-mul.shader_test index 7b453187..cb104a9e 100644 --- a/tests/hlsl-mul.shader_test +++ b/tests/hlsl-mul.shader_test @@ -288,3 +288,33 @@ float4 main(float4 pos : sv_position) : sv_target [test] draw quad probe all rgba (78.0, 96.0, 114.0, 0.0) + + +[pixel shader] +float4 main() : sv_target +{ + float2x3 a = float2x3(1, 2, 3, 4, 5, 6); + float3x2 b = float3x2(6, 5, 4, 3, 2, 1); + + float2x2 r = mul(a, b); + return float4(r); +} + +[test] +draw quad +probe all rgba (20.0, 14.0, 56.0, 41.0) + + +[pixel shader] +float4 main() : sv_target +{ + float2x2 a = float2x2(1, 2, 3, 4); + float2 b = float2(1, 2); + + float2 r = mul(a, b); + return float4(r, 0, 0); +} + +[test] +draw quad +probe all rgba (5.0, 11.0, 0.0, 0.0) diff --git a/tests/max.shader_test b/tests/max.shader_test index 50083f33..7a917ec5 100644 --- a/tests/max.shader_test +++ b/tests/max.shader_test @@ -9,6 +9,7 @@ uniform 0 float4 0.7 -0.1 0.0 0.0 draw quad probe all rgba (0.7, 2.1, 2.0, -1.0)
+ [pixel shader] float4 main(uniform float4 u) : sv_target { @@ -21,3 +22,29 @@ float4 main(uniform float4 u) : sv_target uniform 0 float4 0.7 -0.1 0.4 0.8 draw quad probe all rgba (0.7, 0.8, 0.7, 0.2) + + +[pixel shader todo] +float4 main() : sv_target +{ + float2x3 a = {1, 2, 3, 4, 5, 6}; + float3x2 b = {6, 5, 4, 3, 2, 1}; + + float2x2 r = max(a, b); + return float4(r); +} + +[test] +todo draw quad +todo probe all rgba (6.0, 5.0, 4.0, 5.0) + + +[pixel shader fail todo] +float4 main() : sv_target +{ + float2x2 a = {1, 2, 3, 4}; + float4 b = {4, 3, 2, 1}; + + max(a, b); + return 0; +} diff --git a/tests/pow.shader_test b/tests/pow.shader_test index 6470494e..6f2b2741 100644 --- a/tests/pow.shader_test +++ b/tests/pow.shader_test @@ -8,3 +8,29 @@ float4 main(uniform float4 u) : sv_target uniform 0 float4 0.4 0.8 2.5 2.0 draw quad probe all rgba (0.512, 0.101192884, 0.64, 0.25) 4 + + +[pixel shader todo] +float4 main() : sv_target +{ + float2x3 a = {1, 2, 3, 4, 5, 6}; + float3x2 b = {6, 5, 4, 3, 2, 1}; + + float2x2 r = pow(a, b); + return float4(r); +} + +[test] +todo draw quad +todo probe all rgba (1.0, 32.0, 256.0, 125.0) + + +[pixel shader fail todo] +float4 main() : sv_target +{ + float2x2 a = {1, 2, 3, 4}; + float4 b = {1, 2, 3, 4}; + + pow(a, b); + return 0; +}
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl.y | 80 +++++++++++++++++++++++++++++++ tests/hlsl-clamp.shader_test | 8 ++-- tests/hlsl-ldexp.shader_test | 8 ++-- tests/hlsl-lerp.shader_test | 8 ++-- tests/hlsl-smoothstep.shader_test | 14 +++--- tests/max.shader_test | 8 ++-- tests/pow.shader_test | 2 +- 7 files changed, 104 insertions(+), 24 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index 7ac1dd7e..a56833c6 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2225,6 +2225,65 @@ static struct hlsl_ir_node *intrinsic_float_convert_arg(struct hlsl_ctx *ctx, return add_implicit_conversion(ctx, params->instrs, arg, type, loc); }
+static bool elementwise_intrinsic_convert_args(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    enum hlsl_base_type base = params->args[0]->data_type->base_type;
+    bool vectors = false, matrices = false;
+    unsigned int dimx = 4, dimy = 4;
+    struct hlsl_type *common_type;
+    unsigned int i;
+
+    for (i = 0; i < params->args_count; ++i)
+    {
+        struct hlsl_type *arg_type = params->args[i]->data_type;
+
+        base = expr_common_base_type(base, arg_type->base_type);
+
+        if (arg_type->type == HLSL_CLASS_VECTOR)
+        {
+            vectors = true;
+            dimx = min(dimx, arg_type->dimx);
+        }
+        else if (arg_type->type == HLSL_CLASS_MATRIX)
+        {
+            matrices = true;
+            dimx = min(dimx, arg_type->dimx);
+            dimy = min(dimy, arg_type->dimy);
+        }
+    }
+
+    if (matrices && vectors)
+    {
+        hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TYPE,
+                "Cannot use both matrices and vectors in an elementwise intrinsic.");
+        return false;
+    }
+    else if (matrices)
+    {
+        common_type = hlsl_get_matrix_type(ctx, base, dimx, dimy);
+    }
+    else if (vectors)
+    {
+        common_type = hlsl_get_vector_type(ctx, base, dimx);
+    }
+    else
+    {
+        common_type = hlsl_get_scalar_type(ctx, base);
+    }
+
+    for (i = 0; i < params->args_count; ++i)
+    {
+        struct hlsl_ir_node *new_arg;
+
+        if (!(new_arg = add_implicit_conversion(ctx, params->instrs, params->args[i], common_type, loc)))
+            return NULL;
+        params->args[i] = new_arg;
+    }
+
+    return true;
+}
+
 static bool intrinsic_abs(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -2280,6 +2339,9 @@ static bool intrinsic_clamp(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *max;
 
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     if (!(max = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MAX, params->args[0], params->args[1], loc)))
         return false;
 
@@ -2361,6 +2423,9 @@ static bool intrinsic_ldexp(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *arg;
 
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[1], loc)))
         return false;
 
@@ -2400,6 +2465,9 @@ static bool intrinsic_lerp(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *arg, *neg, *add, *mul;
 
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc)))
         return false;
 
@@ -2418,12 +2486,18 @@ static bool intrinsic_lerp(struct hlsl_ctx *ctx,
 static bool intrinsic_max(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     return !!add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MAX, params->args[0], params->args[1], loc);
 }
 
 static bool intrinsic_min(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     return !!add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MIN, params->args[0], params->args[1], loc);
 }
 
@@ -2558,6 +2632,9 @@ static bool intrinsic_pow(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *log, *exp, *arg, *mul;
 
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc)))
         return false;
 
@@ -2606,6 +2683,9 @@ static bool intrinsic_smoothstep(struct hlsl_ctx *ctx,
     struct hlsl_type *common_type;
     unsigned int dimx, dimy;
 
+    if (!elementwise_intrinsic_convert_args(ctx, params, loc))
+        return false;
+
     min_arg = params->args[0];
     max_arg = params->args[1];
     x_arg = params->args[2];
diff --git a/tests/hlsl-clamp.shader_test b/tests/hlsl-clamp.shader_test
index cc198735..1320c3dd 100644
--- a/tests/hlsl-clamp.shader_test
+++ b/tests/hlsl-clamp.shader_test
@@ -10,7 +10,7 @@ draw quad
 probe all rgba (-0.1, 0.7, -0.3, 0.3)
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float3x2 a = {6, 5, 4, 3, 2, 1}; @@ -22,11 +22,11 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (5.5, 5.0, 4.2, 5.2) +draw quad +probe all rgba (5.5, 5.0, 4.2, 5.2)
-[pixel shader fail todo] +[pixel shader fail] float4 main() : sv_target { float2x2 a = {3.1, 3.1, 3.1, 3.1}; diff --git a/tests/hlsl-ldexp.shader_test b/tests/hlsl-ldexp.shader_test index bea97953..92988d37 100644 --- a/tests/hlsl-ldexp.shader_test +++ b/tests/hlsl-ldexp.shader_test @@ -32,7 +32,7 @@ draw quad probe all rgba (2.0, 0.00292968750, 4096.0, 6.33825300e+030)
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float2x3 a = {1, 2, 3, 4, 5, 6}; @@ -43,11 +43,11 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (64.0, 64.0, 64.0, 40.0) +draw quad +probe all rgba (64.0, 64.0, 64.0, 40.0)
-[pixel shader fail todo] +[pixel shader fail] float4 main() : sv_target { float2x2 a = {1, 2, 3, 4}; diff --git a/tests/hlsl-lerp.shader_test b/tests/hlsl-lerp.shader_test index 3cd10ec1..15e90cef 100644 --- a/tests/hlsl-lerp.shader_test +++ b/tests/hlsl-lerp.shader_test @@ -36,7 +36,7 @@ draw quad probe all rgba (2.0, -10.0, -2.0, 1e12)
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float3x2 a = {6, 5, 4, 3, 2, 1}; @@ -48,11 +48,11 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (-6.0, -2.2, 4.48, 8.28) +draw quad +probe all rgba (-6.0, -2.2, 4.48, 8.28) 1
-[pixel shader fail todo] +[pixel shader fail] float4 main() : sv_target { float2x2 a = {0, 1, 2, 3}; diff --git a/tests/hlsl-smoothstep.shader_test b/tests/hlsl-smoothstep.shader_test index ad53b50a..6dadfa8e 100644 --- a/tests/hlsl-smoothstep.shader_test +++ b/tests/hlsl-smoothstep.shader_test @@ -93,7 +93,7 @@ draw quad probe all rgba (1.0, 1.0, 0, 0) 1
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float2x3 a = {1, 1, 1, 1, 1, 1}; @@ -105,8 +105,8 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (0.028, 0.104, 0.216, 0.352) 1 +draw quad +probe all rgba (0.028, 0.104, 0.216, 0.352) 6
[pixel shader] @@ -141,7 +141,7 @@ draw quad probe all rgba (0.5, 0.5, 0.5, 0.0)
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float4x1 a = {0.0, 0.0, 0.0, 0.0}; @@ -153,11 +153,11 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (0.5, 0.5, 0.5, 0.5) +draw quad +probe all rgba (0.5, 0.5, 0.5, 0.5)
-[pixel shader fail todo] +[pixel shader fail] float4 main() : sv_target { float2x2 a = {0.0, 0.0, 0.0, 0.0}; diff --git a/tests/max.shader_test b/tests/max.shader_test index 7a917ec5..3a5c3125 100644 --- a/tests/max.shader_test +++ b/tests/max.shader_test @@ -24,7 +24,7 @@ draw quad probe all rgba (0.7, 0.8, 0.7, 0.2)
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float2x3 a = {1, 2, 3, 4, 5, 6}; @@ -35,11 +35,11 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (6.0, 5.0, 4.0, 5.0) +draw quad +probe all rgba (6.0, 5.0, 4.0, 5.0)
-[pixel shader fail todo] +[pixel shader fail] float4 main() : sv_target { float2x2 a = {1, 2, 3, 4}; diff --git a/tests/pow.shader_test b/tests/pow.shader_test index 6f2b2741..0c1d7de3 100644 --- a/tests/pow.shader_test +++ b/tests/pow.shader_test @@ -25,7 +25,7 @@ todo draw quad todo probe all rgba (1.0, 32.0, 256.0, 125.0)
-[pixel shader fail todo] +[pixel shader fail] float4 main() : sv_target { float2x2 a = {1, 2, 3, 4};
From: Francisco Casas fcasas@codeweavers.com
Using add_unary_arithmetic_expr() instead of hlsl_new_unary_expr() allows the intrinsic to work with matrices.
Otherwise we get:
E5017: Aborting due to not yet implemented feature: Copying from unsupported node type.
because an HLSL_IR_EXPR reaches split_matrix_copies(). --- libs/vkd3d-shader/hlsl.y | 7 +++---- tests/pow.shader_test | 6 +++--- 2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y index a56833c6..0b60b9b9 100644 --- a/libs/vkd3d-shader/hlsl.y +++ b/libs/vkd3d-shader/hlsl.y @@ -2638,16 +2638,15 @@ static bool intrinsic_pow(struct hlsl_ctx *ctx, if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc))) return false;
- if (!(log = hlsl_new_unary_expr(ctx, HLSL_OP1_LOG2, arg, *loc))) + if (!(log = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_LOG2, arg, loc))) return false; - list_add_tail(params->instrs, &log->entry);
if (!(mul = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, params->args[1], log, loc))) return false;
- if (!(exp = hlsl_new_unary_expr(ctx, HLSL_OP1_EXP2, mul, *loc))) + if (!(exp = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_EXP2, mul, loc))) return false; - list_add_tail(params->instrs, &exp->entry); + return true; }
diff --git a/tests/pow.shader_test b/tests/pow.shader_test index 0c1d7de3..1bb3bd94 100644 --- a/tests/pow.shader_test +++ b/tests/pow.shader_test @@ -10,7 +10,7 @@ draw quad probe all rgba (0.512, 0.101192884, 0.64, 0.25) 4
-[pixel shader todo] +[pixel shader] float4 main() : sv_target { float2x3 a = {1, 2, 3, 4, 5, 6}; @@ -21,8 +21,8 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (1.0, 32.0, 256.0, 125.0) +draw quad +probe all rgba (1.0, 32.0, 256.0, 125.0) 2
[pixel shader fail]
On Mon Nov 28 18:02:45 2022 +0000, Francisco Casas wrote:
hmm, that name may be a little misleading because this function doesn't retrieve a type, but instead **converts** the parameters to the common type. Though I agree that a shorter name would be better.
Oh, you're right, I wasn't reading carefully enough. But yeah, elementwise_intrinsic_convert_args() seems reasonable enough to me.
On Mon Nov 28 17:56:57 2022 +0000, Francisco Casas wrote:
Mathematically yes, but the native compiler does a `1/(·)` first, and then uses a `mul_sat`. So it made sense to me to leave it as DIV + MUL + SAT, just in case we ever want to merge the MUL and the SAT.
Interesting. sm5 won't optimize a*(1/b) to a/b, or vice versa. I don't know why, but I guess if there's a difference in precision we should probably emulate native...? @Mystral, do you have thoughts on this?
Note that _sat is just a modifier which saturates the destination after performing whatever opcode. I.e. in theory this could be a single div_sat. No idea why native doesn't do that.
This merge request was approved by Zebediah Figura.
Giovanni Mascellani (@giomasce) commented about tests/hlsl-smoothstep.shader_test:
+[pixel shader]
+// 4 division by zero warnings.
+float4 main() : sv_target
+{
+    float4 a = {0, 0, 0, 0};
+    float4 b = {-1, -1, 0, 0};
+    float4 x = {0, -0.25, 0, 1};
+
+    return smoothstep(a, b, x);
+}
+
+[test]
+draw quad
+probe all rgba (0, 0.156250, 0, 1.000000)
With my iGPU (`Driver name: Intel open-source Mesa driver, driver info: Mesa 22.3.0.`) this fails with:
```
shader_runner:551:Section [test], line 123: Test failed: Got {0.00000000e+00, 1.56250000e-01, 0.00000000e+00, 0.00000000e+00}, expected {0.00000000e+00, 1.56250000e-01, 0.00000000e+00, 1.00000000e+00} at (0, 0).
```
It's working with any other implementation I've access to (including NVidia proprietary, RADV and llvmpipe), so I guess it's a bug in the Intel driver. It's probably not enough to delay this patch, I'll try to debug it better as soon as I have some time. Just wanted to let you know.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl.y:
        return NULL;

    if (!(x_arg = add_implicit_conversion(ctx, params->instrs, x_arg, common_type, loc)))
        return NULL;
Shouldn't you remove this part now? `elementwise_intrinsic_convert_args()` should have already taken care of everything, or am I missing something?
On Mon Dec 12 16:36:55 2022 +0000, Giovanni Mascellani wrote:
Shouldn't you remove this part now? `elementwise_intrinsic_convert_args()` should have already taken care of everything, or am I missing something?
`elementwise_intrinsic_convert_args()` converts all arguments to the same common type by:
* Ensuring that matrices and vectors are not used as arguments at the same time.
* Truncating vector (or matrix) arguments to the common shape.
* Converting the base type of every argument to the common base type.
It does this without knowing which intrinsic it is being used in.
So this common type may have base type `int` if all the provided arguments have base type `int`.
I did it this way because, as I understand it, not every intrinsic function returns a `float` when all arguments are `int`, so it is the responsibility of each function to do this conversion.
This can often be done by converting the base type of a single argument to `float` and letting the uses of `add_binary_arithmetic_expr` propagate the conversion. It gets a little more complicated with intrinsics that take 3 arguments, such as `smoothstep()`, though.
It seems that indeed I can remove the calls to `expr_common_shape()` and some of the calls to `add_implicit_conversion()` here, or add a `bool convert_to_float` argument to `elementwise_intrinsic_convert_args()`. I will give it more thought, maybe I can write a shader to test where in the implementation of `smoothstep()` the conversion of `int` args to float takes place.
On Mon Dec 12 16:36:54 2022 +0000, Giovanni Mascellani wrote:
With my iGPU (`Driver name: Intel open-source Mesa driver, driver info: Mesa 22.3.0.`) this fails with:
shader_runner:551:Section [test], line 123: Test failed: Got {0.00000000e+00, 1.56250000e-01, 0.00000000e+00, 0.00000000e+00}, expected {0.00000000e+00, 1.56250000e-01, 0.00000000e+00, 1.00000000e+00} at (0, 0).
It's working with any other implementation I've access to (including NVidia proprietary, RADV and llvmpipe), so I guess it's a bug in the Intel driver. It's probably not enough to delay this patch, I'll try to debug it better as soon as I have some time. Just wanted to let you know.
Hmm, it seems that division by zero is indeed implementation-dependent.
https://stackoverflow.com/questions/5802351/what-happens-when-you-divide-by-...
I'd better just remove this test, lest we need to keep track of flaky tests as well.
On Tue Dec 13 19:03:45 2022 +0000, Francisco Casas wrote:
`elementwise_intrinsic_convert_args()` converts all arguments to the same common type by:
- Ensuring that matrices and vectors are not used as arguments at the same time.
- Truncating vector (or matrix) arguments to the common shape.
- Converting the base type of every argument to the common base type.
It does this without knowing which intrinsic it is being used in. So this common type may have base type `int` if all the provided arguments have base type `int`. I did it this way because, as I understand it, not every intrinsic function returns a `float` when all arguments are `int`, so it is the responsibility of each function to do this conversion. This can often be done by converting the base type of a single argument to `float` and letting the uses of `add_binary_arithmetic_expr` propagate the conversion. It gets a little more complicated with intrinsics that take 3 arguments, such as `smoothstep()`, though. It seems that indeed I can remove the calls to `expr_common_shape()` and some of the calls to `add_implicit_conversion()` here, or add a `bool convert_to_float` argument to `elementwise_intrinsic_convert_args()`. I will give it more thought, maybe I can write a shader to test where in the implementation of `smoothstep()` the conversion of `int` args to float takes place.
In the end I went for adding a `bool convert_to_float` argument to `elementwise_intrinsic_convert_args()`.
Also, I checked the native compiler's output on all the intrinsics that use this function, while passing only uniform int args, to see if there is an `itof` that converts all arguments to `float`, to know when to set this `convert_to_float` to true.
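A reduced sketch of how such a flag could affect the common base type is shown below. It is illustrative only: `enum base_type` and `common_base_type()` are hypothetical, only `int` and `float` are modelled, and the real promotion rules in `expr_common_base_type()` are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of base types. */
enum base_type { BASE_INT, BASE_FLOAT };

/* Returns the base type every argument would be converted to. When
 * convert_to_float is set, the result is float even if all args are int. */
static enum base_type common_base_type(const enum base_type *args, unsigned int count, bool convert_to_float)
{
    enum base_type base = BASE_INT;
    unsigned int i;

    assert(args);

    if (convert_to_float)
        return BASE_FLOAT;

    for (i = 0; i < count; ++i)
    {
        if (args[i] == BASE_FLOAT)
            base = BASE_FLOAT;
    }
    return base;
}
```

With all-`int` arguments the common base type stays `int` unless the caller asks for the conversion, which is the case the `itof` check against the native compiler is meant to settle per intrinsic.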