This patch series implements the long-pending `transpose` intrinsic, as well as the `smoothstep` intrinsic.
While implementing `smoothstep` I realized that some intrinsics follow different rules for the allowed data types than expressions do:
- Mixing vectors and matrices is not allowed, regardless of their dimensions, even if they have the same number of components.
- Any combination of matrices is allowed, even when neither matrix fits inside the other. For example, `float2x3` is compatible with `float3x2`, and the result is `float2x2`: the common data type takes the minimum of each dimension.

This is the case for `max`, `pow`, `ldexp`, `clamp` and `smoothstep`, which suggests that it holds for all intrinsics where the operation is applied element-wise. The series changes these intrinsics to follow these rules.
A minor fix in `pow`'s type conversion is also included.
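The shape rule described above can be illustrated with a small standalone C sketch. It is not the vkd3d-shader implementation: `struct shape` and `elementwise_common_shape()` are hypothetical stand-ins that only model the dimension logic (minimum on each dimension, scalars do not constrain the result, vectors mixed with matrices are an error) and ignore base-type promotion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the shape of an HLSL numeric type. */
enum shape_class { SHAPE_SCALAR, SHAPE_VECTOR, SHAPE_MATRIX };

struct shape
{
    enum shape_class class;
    unsigned int rows; /* 1 for scalars and vectors */
    unsigned int cols; /* 1 for scalars */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Computes the common shape of the arguments of an element-wise intrinsic.
 * Returns false when vectors and matrices are mixed. */
static bool elementwise_common_shape(const struct shape *args, unsigned int count, struct shape *out)
{
    bool vectors = false, matrices = false;
    unsigned int rows = 4, cols = 4, i;

    for (i = 0; i < count; ++i)
    {
        if (args[i].class == SHAPE_VECTOR)
        {
            vectors = true;
            cols = min_u(cols, args[i].cols);
        }
        else if (args[i].class == SHAPE_MATRIX)
        {
            matrices = true;
            rows = min_u(rows, args[i].rows);
            cols = min_u(cols, args[i].cols);
        }
    }

    if (vectors && matrices)
        return false;

    if (matrices)
        *out = (struct shape){SHAPE_MATRIX, rows, cols};
    else if (vectors)
        *out = (struct shape){SHAPE_VECTOR, 1, cols};
    else
        *out = (struct shape){SHAPE_SCALAR, 1, 1};
    return true;
}
```

For example, a 2x3 and a 3x2 matrix give a 2x2 matrix, while a 2x2 matrix together with a 4-component vector is rejected even though both have four components.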
From: Francisco Casas fcasas@codeweavers.com
---
 Makefile.am                      |  1 +
 libs/vkd3d-shader/hlsl.y         | 59 +++++++++++++++++++++++++
 tests/hlsl-transpose.shader_test | 75 ++++++++++++++++++++++++++++++++
 3 files changed, 135 insertions(+)
 create mode 100644 tests/hlsl-transpose.shader_test

diff --git a/Makefile.am b/Makefile.am
index 85cd4642..d1f6ec6b 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -117,6 +117,7 @@ vkd3d_shader_tests = \
 	tests/hlsl-struct-array.shader_test \
 	tests/hlsl-struct-assignment.shader_test \
 	tests/hlsl-struct-semantics.shader_test \
+	tests/hlsl-transpose.shader_test \
 	tests/hlsl-vector-indexing.shader_test \
 	tests/hlsl-vector-indexing-uniform.shader_test \
 	tests/logic-operations.shader_test \
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index eedc85bd..b834b230 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2596,6 +2596,64 @@ static bool intrinsic_saturate(struct hlsl_ctx *ctx,
     return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, arg, loc);
 }
 
+static bool intrinsic_transpose(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    struct hlsl_ir_node *arg = params->args[0];
+    struct hlsl_type *arg_type = arg->data_type;
+    struct hlsl_deref var_deref;
+    struct hlsl_type *mat_type;
+    struct hlsl_ir_load *load;
+    struct hlsl_ir_var *var;
+    unsigned int i, j;
+
+    if (arg_type->type != HLSL_CLASS_SCALAR && arg_type->type != HLSL_CLASS_MATRIX)
+    {
+        struct vkd3d_string_buffer *string;
+
+        if ((string = hlsl_type_to_string(ctx, arg_type)))
+            hlsl_error(ctx, &arg->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TYPE,
+                    "Wrong type for argument 1 of transpose(): expected a matrix or scalar type, but got '%s'.\n",
+                    string->buffer);
+        hlsl_release_string_buffer(ctx, string);
+        return false;
+    }
+
+    if (arg_type->type == HLSL_CLASS_SCALAR)
+    {
+        list_add_tail(params->instrs, &arg->entry);
+        return true;
+    }
+
+    mat_type = hlsl_get_matrix_type(ctx, arg_type->base_type, arg_type->dimy, arg_type->dimx);
+
+    if (!(var = hlsl_new_synthetic_var(ctx, "transpose", mat_type, loc)))
+        return NULL;
+    hlsl_init_simple_deref_from_var(&var_deref, var);
+
+    for (i = 0; i < arg_type->dimx; ++i)
+    {
+        for (j = 0; j < arg_type->dimy; ++j)
+        {
+            struct hlsl_ir_store *store;
+            struct hlsl_block block;
+
+            if (!(load = add_load_component(ctx, params->instrs, arg, j * arg->data_type->dimx + i, loc)))
+                return false;
+
+            if (!(store = hlsl_new_store_component(ctx, &block, &var_deref, i * var->data_type->dimx + j, &load->node)))
+                return false;
+            list_move_tail(params->instrs, &block.instrs);
+        }
+    }
+
+    if (!(load = hlsl_new_var_load(ctx, var, *loc)))
+        return false;
+    list_add_tail(params->instrs, &load->node.entry);
+
+    return true;
+}
+
 static const struct intrinsic_function
 {
     const char *name;
@@ -2623,6 +2681,7 @@ intrinsic_functions[] =
     {"pow", 2, true, intrinsic_pow},
     {"round", 1, true, intrinsic_round},
     {"saturate", 1, true, intrinsic_saturate},
+    {"transpose", 1, true, intrinsic_transpose},
 };
 
 static int intrinsic_function_name_compare(const void *a, const void *b)
diff --git a/tests/hlsl-transpose.shader_test b/tests/hlsl-transpose.shader_test
new file mode 100644
index 00000000..83852fa1
--- /dev/null
+++ b/tests/hlsl-transpose.shader_test
@@ -0,0 +1,75 @@
+[pixel shader]
+float4 main() : sv_target
+{
+    return transpose(5);
+}
+
+[test]
+draw quad
+probe all rgba (5.0, 5.0, 5.0, 5.0)
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float1x1 x = 5;
+
+    return transpose(x);
+}
+
+[test]
+draw quad
+probe all rgba (5.0, 5.0, 5.0, 5.0)
+
+
+[pixel shader fail]
+float4 main() : sv_target
+{
+    float4 x = float4(1, 2, 3, 4);
+
+    return transpose(x);
+}
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float1x4 x = float1x4(1.0, 2.0, 3.0, 4.0);
+
+    return transpose(x);
+}
+
+[test]
+draw quad
+probe all rgba (1.0, 2.0, 3.0, 4.0)
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float4x3 m = float4x3(1.0, 2.0, 3.0,
+                          4.0, 5.0, 6.0,
+                          7.0, 8.0, 9.0,
+                          10.0, 11.0, 12.0);
+
+    return transpose(m)[1];
+}
+
+[test]
+draw quad
+probe all rgba (2.0, 5.0, 8.0, 11.0)
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    row_major float4x3 m = float4x3(1.0, 2.0, 3.0,
+                                    4.0, 5.0, 6.0,
+                                    7.0, 8.0, 9.0,
+                                    10.0, 11.0, 12.0);
+
+    return transpose(m)[1];
+}
+
+[test]
+draw quad
+probe all rgba (2.0, 5.0, 8.0, 11.0)
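The component-index arithmetic in `intrinsic_transpose()` can be checked outside the compiler. The sketch below uses a hypothetical standalone helper, `transpose_flat()`, which is not vkd3d-shader code: it stores a rows x cols matrix as a flat array in logical row order, reads component `j * cols + i` (row j, column i) of the source and writes it to component `i * rows + j` of the cols x rows result, mirroring the indices passed to `add_load_component()` and `hlsl_new_store_component()` in the patch. Logical component indices do not depend on whether the matrix is packed row-major or column-major, which is presumably why both `float4x3` tests expect the same result.

```c
#include <assert.h>

/* Hypothetical helper: transposes a rows x cols matrix stored in logical
 * row order in src into the cols x rows matrix dst, using the same index
 * formulas as the patch (load j * cols + i, store i * rows + j). */
static void transpose_flat(const float *src, float *dst, unsigned int rows, unsigned int cols)
{
    unsigned int i, j;

    for (i = 0; i < cols; ++i)      /* i: column of src, row of dst */
    {
        for (j = 0; j < rows; ++j)  /* j: row of src, column of dst */
            dst[i * rows + j] = src[j * cols + i];
    }
}
```

With the 4x3 matrix from the test, row 1 of the transposed 3x4 matrix is (2, 5, 8, 11).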
From: Francisco Casas fcasas@codeweavers.com
---
 Makefile.am                       |   1 +
 libs/vkd3d-shader/hlsl.y          |  77 ++++++++++++++
 tests/hlsl-smoothstep.shader_test | 169 ++++++++++++++++++++++++++++++
 3 files changed, 247 insertions(+)
 create mode 100644 tests/hlsl-smoothstep.shader_test

diff --git a/Makefile.am b/Makefile.am
index d1f6ec6b..da7fd71f 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -114,6 +114,7 @@ vkd3d_shader_tests = \
 	tests/hlsl-state-block-syntax.shader_test \
 	tests/hlsl-static-initializer.shader_test \
 	tests/hlsl-storage-qualifiers.shader_test \
+	tests/hlsl-smoothstep.shader_test \
 	tests/hlsl-struct-array.shader_test \
 	tests/hlsl-struct-assignment.shader_test \
 	tests/hlsl-struct-semantics.shader_test \
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index b834b230..4a551724 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2596,6 +2596,82 @@ static bool intrinsic_saturate(struct hlsl_ctx *ctx,
     return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, arg, loc);
 }
 
+/* smoothstep(a, b, x) = saturate(p^2 (3 - 2p)), where p = (x - a)/(b - a) */
+static bool intrinsic_smoothstep(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    struct hlsl_ir_node *min_arg, *max_arg, *x_arg, *p, *p_num, *p_denom, *res;
+    struct hlsl_ir_constant *one, *minus_two, *three;
+    enum hlsl_type_class common_class;
+    struct hlsl_type *common_type;
+    unsigned int dimx, dimy;
+
+    min_arg = params->args[0];
+    max_arg = params->args[1];
+    x_arg = params->args[2];
+
+    if (!expr_common_shape(ctx, min_arg->data_type, max_arg->data_type, loc, &common_class, &dimx, &dimy))
+        return NULL;
+    common_type = hlsl_get_numeric_type(ctx, common_class, HLSL_TYPE_FLOAT, dimx, dimy);
+
+    if (!expr_common_shape(ctx, common_type, x_arg->data_type, loc, &common_class, &dimx, &dimy))
+        return NULL;
+    common_type = hlsl_get_numeric_type(ctx, common_class, HLSL_TYPE_FLOAT, dimx, dimy);
+
+    if (!(min_arg = add_implicit_conversion(ctx, params->instrs, min_arg, common_type, loc)))
+        return NULL;
+
+    if (!(max_arg = add_implicit_conversion(ctx, params->instrs, max_arg, common_type, loc)))
+        return NULL;
+
+    if (!(x_arg = add_implicit_conversion(ctx, params->instrs, x_arg, common_type, loc)))
+        return NULL;
+
+    if (!(min_arg = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_NEG, min_arg, loc)))
+        return false;
+
+    if (!(p_num = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, x_arg, min_arg, loc)))
+        return false;
+
+    if (!(p_denom = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, max_arg, min_arg, loc)))
+        return false;
+
+    if (!(one = hlsl_new_float_constant(ctx, 1.0, loc)))
+        return false;
+    list_add_tail(params->instrs, &one->node.entry);
+
+    if (!(p_denom = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_DIV, &one->node, p_denom, loc)))
+        return false;
+
+    if (!(p = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p_num, p_denom, loc)))
+        return false;
+
+    if (!(p = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, p, loc)))
+        return false;
+
+    if (!(minus_two = hlsl_new_float_constant(ctx, -2.0, loc)))
+        return false;
+    list_add_tail(params->instrs, &minus_two->node.entry);
+
+    if (!(three = hlsl_new_float_constant(ctx, 3.0, loc)))
+        return false;
+    list_add_tail(params->instrs, &three->node.entry);
+
+    if (!(res = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, &minus_two->node, p, loc)))
+        return false;
+
+    if (!(res = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, &three->node, res, loc)))
+        return false;
+
+    if (!(p = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p, p, loc)))
+        return false;
+
+    if (!(res = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p, res, loc)))
+        return false;
+
+    return true;
+}
+
 static bool intrinsic_transpose(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -2681,6 +2757,7 @@ intrinsic_functions[] =
     {"pow", 2, true, intrinsic_pow},
     {"round", 1, true, intrinsic_round},
     {"saturate", 1, true, intrinsic_saturate},
+    {"smoothstep", 3, true, intrinsic_smoothstep},
     {"transpose", 1, true, intrinsic_transpose},
 };
 
diff --git a/tests/hlsl-smoothstep.shader_test b/tests/hlsl-smoothstep.shader_test
new file mode 100644
index 00000000..ad53b50a
--- /dev/null
+++ b/tests/hlsl-smoothstep.shader_test
@@ -0,0 +1,169 @@
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float4 a = {1, -1, -1, 10};
+    float4 b = {2, 1, 1, 20};
+    float4 x = {0.3, 0.4, 2, 15.4};
+
+    return smoothstep(a, b, x);
+}
+
+[test]
+draw quad
+probe all rgba (0, 0.784, 1.0, 0.559872) 1
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float a = 1;
+    float b = 2;
+    float4 x = {0.9, 1.2, 1.8, 2.1};
+
+    return smoothstep(a, b, x);
+}
+
+[test]
+draw quad
+probe all rgba (0, 0.104, 0.896, 1.000000) 5
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float4 a = {1, 10, 100, 1000};
+    float4 b = {2, 20, 200, 2000};
+    float x = 14;
+
+    return smoothstep(a, b, x);
+}
+
+[test]
+draw quad
+probe all rgba (1.0, 0.352, 0, 0) 1
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float2 a = {1, 10};
+    float3 b = {2, 20, 200};
+    float4 x = {1.4, 14, 140, 1400};
+
+    float2 res = smoothstep(a, b, x);
+    return float4(res, 0, 0);
+}
+
+[test]
+draw quad
+probe all rgba (0.352, 0.352, 0, 0) 1
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float3 a = {1, 10, 100};
+    float2 b = {2, 20};
+    float4 x = {1.4, 14, 140, 1400};
+
+    float2 res = smoothstep(a, b, x);
+    return float4(res, 0, 0);
+}
+
+[test]
+draw quad
+probe all rgba (0.352, 0.352, 0, 0) 1
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float4 a = {1, 10, 100, 1000};
+    float4 b = {2, 20, 200, 2000};
+    float2 x = {14, 140};
+
+    float2 res = smoothstep(a, b, x);
+    return float4(res, 0, 0);
+}
+
+[test]
+draw quad
+probe all rgba (1.0, 1.0, 0, 0) 1
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float2x3 a = {1, 1, 1, 1, 1, 1};
+    float3x2 b = {2, 2, 2, 2, 2, 2};
+    float4x2 x = {1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8};
+
+    float2x2 r = smoothstep(a, b, x);
+    return r;
+}
+
+[test]
+todo draw quad
+todo probe all rgba (0.028, 0.104, 0.216, 0.352) 1
+
+
+[pixel shader]
+// 4 division by zero warnings.
+float4 main() : sv_target
+{
+    float4 a = {0, 0, 0, 0};
+    float4 b = {-1, -1, 0, 0};
+    float4 x = {0, -0.25, 0, 1};
+
+    return smoothstep(a, b, x);
+}
+
+[test]
+draw quad
+probe all rgba (0, 0.156250, 0, 1.000000)
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float4x1 a = {0.0, 0.0, 0.0, 0.0};
+    float b = 1.0;
+    float3x1 x = {0.5, 0.5, 0.5};
+
+    float3x1 r = smoothstep(a, b, x);
+    return float4(r, 0);
+}
+
+[test]
+draw quad
+probe all rgba (0.5, 0.5, 0.5, 0.0)
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float4x1 a = {0.0, 0.0, 0.0, 0.0};
+    float2x2 b = {1.0, 1.0, 1.0, 1.0};
+    float3x1 x = {0.5, 0.5, 0.5};
+
+    float2x1 r = smoothstep(a, b, x);
+    return float4(r, r);
+}
+
+[test]
+todo draw quad
+todo probe all rgba (0.5, 0.5, 0.5, 0.5)
+
+
+[pixel shader fail todo]
+float4 main() : sv_target
+{
+    float2x2 a = {0.0, 0.0, 0.0, 0.0};
+    float4 b = 1.0;
+    float2x2 x = {0.5, 0.5, 0.5, 0.5};
+
+    smoothstep(a, b, x);
+    return 0;
+}
From: Francisco Casas fcasas@codeweavers.com
Some intrinsics follow different rules for the allowed data types than expressions do:
- Mixing vectors and matrices is not allowed, regardless of their dimensions, even if they have the same number of components.
- Any combination of matrices is allowed, even when neither matrix fits inside the other. For example, float2x3 is compatible with float3x2, resulting in float2x2. The common data type takes the minimum of each dimension.

This is the case for max, pow, ldexp, clamp and smoothstep, which suggests that it holds for all intrinsics where the operation is applied element-wise.
Tests for mul() are also added as a counter-example where the operation is not element-wise.
---
 tests/hlsl-clamp.shader_test | 28 ++++++++++++++++++++++++++++
 tests/hlsl-ldexp.shader_test | 26 ++++++++++++++++++++++++++
 tests/hlsl-lerp.shader_test  | 28 ++++++++++++++++++++++++++++
 tests/hlsl-mul.shader_test   | 30 ++++++++++++++++++++++++++++++
 tests/max.shader_test        | 27 +++++++++++++++++++++++++++
 tests/pow.shader_test        | 26 ++++++++++++++++++++++++++
 6 files changed, 165 insertions(+)

diff --git a/tests/hlsl-clamp.shader_test b/tests/hlsl-clamp.shader_test
index 8e26270c..cc198735 100644
--- a/tests/hlsl-clamp.shader_test
+++ b/tests/hlsl-clamp.shader_test
@@ -8,3 +8,31 @@ float4 main(uniform float3 u) : sv_target
 uniform 0 float4 -0.3 -0.1 0.7 0.0
 draw quad
 probe all rgba (-0.1, 0.7, -0.3, 0.3)
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float3x2 a = {6, 5, 4, 3, 2, 1};
+    float2x3 b = {1, 2, 3, 4.2, 5.2, 6.2};
+    float3x4 c = 5.5;
+
+    float2x2 r = clamp(a, b, c);
+    return float4(r);
+}
+
+[test]
+todo draw quad
+todo probe all rgba (5.5, 5.0, 4.2, 5.2)
+
+
+[pixel shader fail todo]
+float4 main() : sv_target
+{
+    float2x2 a = {3.1, 3.1, 3.1, 3.1};
+    float2x2 b = {1, 2, 3, 4};
+    float4 c = {5.5, 4.5, 3.5, 2.5};
+
+    clamp(a, b, c);
+    return 0;
+}
diff --git a/tests/hlsl-ldexp.shader_test b/tests/hlsl-ldexp.shader_test
index 0873fc9e..bea97953 100644
--- a/tests/hlsl-ldexp.shader_test
+++ b/tests/hlsl-ldexp.shader_test
@@ -30,3 +30,29 @@ uniform 0 int4 2 3 4 5
 uniform 4 int4 0 -10 10 100
 draw quad
 probe all rgba (2.0, 0.00292968750, 4096.0, 6.33825300e+030)
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float2x3 a = {1, 2, 3, 4, 5, 6};
+    float3x2 b = {6, 5, 4, 3, 2, 1};
+
+    float2x2 r = ldexp(a, b);
+    return float4(r);
+}
+
+[test]
+todo draw quad
+todo probe all rgba (64.0, 64.0, 64.0, 40.0)
+
+
+[pixel shader fail todo]
+float4 main() : sv_target
+{
+    float2x2 a = {1, 2, 3, 4};
+    float1 b = {2};
+
+    ldexp(a, b);
+    return 0;
+}
diff --git a/tests/hlsl-lerp.shader_test b/tests/hlsl-lerp.shader_test
index 3f93b02d..3cd10ec1 100644
--- a/tests/hlsl-lerp.shader_test
+++ b/tests/hlsl-lerp.shader_test
@@ -34,3 +34,31 @@ uniform 4 int4 0 -10 10 1000000
 uniform 8 int4 0 1 -1 1000000
 draw quad
 probe all rgba (2.0, -10.0, -2.0, 1e12)
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float3x2 a = {6, 5, 4, 3, 2, 1};
+    float2x3 b = {1, 2, 3, 4.2, 5.2, 6.2};
+    float3x4 c = 2.4;
+
+    float2x2 r = lerp(a, b, c);
+    return float4(r);
+}
+
+[test]
+todo draw quad
+todo probe all rgba (-6.0, -2.2, 4.48, 8.28)
+
+
+[pixel shader fail todo]
+float4 main() : sv_target
+{
+    float2x2 a = {0, 1, 2, 3};
+    float2x2 b = {1, 2, 3, 4};
+    float4 c = {0.5, 0.5, 0.5, 0.5};
+
+    lerp(a, b, c);
+    return 0;
+}
diff --git a/tests/hlsl-mul.shader_test b/tests/hlsl-mul.shader_test
index 7b453187..cb104a9e 100644
--- a/tests/hlsl-mul.shader_test
+++ b/tests/hlsl-mul.shader_test
@@ -288,3 +288,33 @@ float4 main(float4 pos : sv_position) : sv_target
 [test]
 draw quad
 probe all rgba (78.0, 96.0, 114.0, 0.0)
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float2x3 a = float2x3(1, 2, 3, 4, 5, 6);
+    float3x2 b = float3x2(6, 5, 4, 3, 2, 1);
+
+    float2x2 r = mul(a, b);
+    return float4(r);
+}
+
+[test]
+draw quad
+probe all rgba (20.0, 14.0, 56.0, 41.0)
+
+
+[pixel shader]
+float4 main() : sv_target
+{
+    float2x2 a = float2x2(1, 2, 3, 4);
+    float2 b = float2(1, 2);
+
+    float2 r = mul(a, b);
+    return float4(r, 0, 0);
+}
+
+[test]
+draw quad
+probe all rgba (5.0, 11.0, 0.0, 0.0)
diff --git a/tests/max.shader_test b/tests/max.shader_test
index 50083f33..7a917ec5 100644
--- a/tests/max.shader_test
+++ b/tests/max.shader_test
@@ -9,6 +9,7 @@ uniform 0 float4 0.7 -0.1 0.0 0.0
 draw quad
 probe all rgba (0.7, 2.1, 2.0, -1.0)
 
+
 [pixel shader]
 float4 main(uniform float4 u) : sv_target
 {
@@ -21,3 +22,29 @@ float4 main(uniform float4 u) : sv_target
 uniform 0 float4 0.7 -0.1 0.4 0.8
 draw quad
 probe all rgba (0.7, 0.8, 0.7, 0.2)
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float2x3 a = {1, 2, 3, 4, 5, 6};
+    float3x2 b = {6, 5, 4, 3, 2, 1};
+
+    float2x2 r = max(a, b);
+    return float4(r);
+}
+
+[test]
+todo draw quad
+todo probe all rgba (6.0, 5.0, 4.0, 5.0)
+
+
+[pixel shader fail todo]
+float4 main() : sv_target
+{
+    float2x2 a = {1, 2, 3, 4};
+    float4 b = {4, 3, 2, 1};
+
+    max(a, b);
+    return 0;
+}
diff --git a/tests/pow.shader_test b/tests/pow.shader_test
index 6470494e..6f2b2741 100644
--- a/tests/pow.shader_test
+++ b/tests/pow.shader_test
@@ -8,3 +8,29 @@ float4 main(uniform float4 u) : sv_target
 uniform 0 float4 0.4 0.8 2.5 2.0
 draw quad
 probe all rgba (0.512, 0.101192884, 0.64, 0.25) 4
+
+
+[pixel shader todo]
+float4 main() : sv_target
+{
+    float2x3 a = {1, 2, 3, 4, 5, 6};
+    float3x2 b = {6, 5, 4, 3, 2, 1};
+
+    float2x2 r = pow(a, b);
+    return float4(r);
+}
+
+[test]
+todo draw quad
+todo probe all rgba (1.0, 32.0, 256.0, 125.0)
+
+
+[pixel shader fail todo]
+float4 main() : sv_target
+{
+    float2x2 a = {1, 2, 3, 4};
+    float4 b = {1, 2, 3, 4};
+
+    pow(a, b);
+    return 0;
+}
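The expected values of the new matrix tests can be reproduced by hand. The sketch below assumes, as the expected values indicate, that converting a matrix to a smaller matrix keeps its upper-left components: both `float2x3 a` and `float3x2 b` are cut down to 2x2 before `max()` is applied per component, whereas `mul()` computes a real matrix product of the full 2x3 and 3x2 operands. The helpers `truncate_to_2x2()`, `max_2x2()` and `mul_2x3_3x2()` are hypothetical and only model the arithmetic, not vkd3d-shader's IR.

```c
#include <assert.h>

/* Hypothetical helper: copies the upper-left 2x2 block of a matrix stored
 * in logical row order with `cols` columns (at least 2x2 assumed). */
static void truncate_to_2x2(const float *m, unsigned int cols, float out[4])
{
    out[0] = m[0];
    out[1] = m[1];
    out[2] = m[cols];
    out[3] = m[cols + 1];
}

/* Element-wise max of two 2x2 matrices, as max() does after conversion. */
static void max_2x2(const float a[4], const float b[4], float out[4])
{
    unsigned int i;

    for (i = 0; i < 4; ++i)
        out[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Matrix product of a 2x3 and a 3x2 matrix, both in logical row order. */
static void mul_2x3_3x2(const float a[6], const float b[6], float out[4])
{
    unsigned int r, c, k;

    for (r = 0; r < 2; ++r)
    {
        for (c = 0; c < 2; ++c)
        {
            out[r * 2 + c] = 0.0f;
            for (k = 0; k < 3; ++k)
                out[r * 2 + c] += a[r * 3 + k] * b[k * 2 + c];
        }
    }
}
```

With `a = {1, 2, 3, 4, 5, 6}` and `b = {6, 5, 4, 3, 2, 1}`, `max_2x2()` on the truncated operands gives (6, 5, 4, 5), as in `max.shader_test`, while `mul_2x3_3x2()` gives (20, 14, 56, 41), as in `hlsl-mul.shader_test`.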
From: Francisco Casas fcasas@codeweavers.com
---
 libs/vkd3d-shader/hlsl.y          | 80 +++++++++++++++++++++++++++++++
 tests/hlsl-clamp.shader_test      |  8 ++--
 tests/hlsl-ldexp.shader_test      |  8 ++--
 tests/hlsl-lerp.shader_test       |  8 ++--
 tests/hlsl-smoothstep.shader_test | 14 +++---
 tests/max.shader_test             |  8 ++--
 tests/pow.shader_test             |  2 +-
 7 files changed, 104 insertions(+), 24 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 4a551724..910261ba 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2225,6 +2225,65 @@ static struct hlsl_ir_node *intrinsic_float_convert_arg(struct hlsl_ctx *ctx,
     return add_implicit_conversion(ctx, params->instrs, arg, type, loc);
 }
 
+static bool elementwise_intrinsic_convert_args_to_common_type(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    enum hlsl_base_type base = params->args[0]->data_type->base_type;
+    bool vectors = false, matrices = false;
+    unsigned int dimx = 4, dimy = 4;
+    struct hlsl_type *common_type;
+    unsigned int i;
+
+    for (i = 0; i < params->args_count; ++i)
+    {
+        struct hlsl_type *arg_type = params->args[i]->data_type;
+
+        base = expr_common_base_type(base, arg_type->base_type);
+
+        if (arg_type->type == HLSL_CLASS_VECTOR)
+        {
+            vectors = true;
+            dimx = min(dimx, arg_type->dimx);
+        }
+        else if (arg_type->type == HLSL_CLASS_MATRIX)
+        {
+            matrices = true;
+            dimx = min(dimx, arg_type->dimx);
+            dimy = min(dimy, arg_type->dimy);
+        }
+    }
+
+    if (matrices && vectors)
+    {
+        hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TYPE,
+                "Cannot use both matrices and vectors in an elementwise intrinsic.");
+        return false;
+    }
+    else if (matrices)
+    {
+        common_type = hlsl_get_matrix_type(ctx, base, dimx, dimy);
+    }
+    else if (vectors)
+    {
+        common_type = hlsl_get_vector_type(ctx, base, dimx);
+    }
+    else
+    {
+        common_type = hlsl_get_scalar_type(ctx, base);
+    }
+
+    for (i = 0; i < params->args_count; ++i)
+    {
+        struct hlsl_ir_node *new_arg;
+
+        if (!(new_arg = add_implicit_conversion(ctx, params->instrs, params->args[i], common_type, loc)))
+            return NULL;
+        params->args[i] = new_arg;
+    }
+
+    return true;
+}
+
 static bool intrinsic_abs(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -2280,6 +2339,9 @@ static bool intrinsic_clamp(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *max;
 
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     if (!(max = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MAX, params->args[0], params->args[1], loc)))
         return false;
 
@@ -2361,6 +2423,9 @@ static bool intrinsic_ldexp(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *arg;
 
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[1], loc)))
         return false;
 
@@ -2400,6 +2465,9 @@ static bool intrinsic_lerp(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *arg, *neg, *add, *mul;
 
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc)))
         return false;
 
@@ -2418,12 +2486,18 @@ static bool intrinsic_lerp(struct hlsl_ctx *ctx,
 static bool intrinsic_max(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     return !!add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MAX, params->args[0], params->args[1], loc);
 }
 
 static bool intrinsic_min(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     return !!add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MIN, params->args[0], params->args[1], loc);
 }
 
@@ -2558,6 +2632,9 @@ static bool intrinsic_pow(struct hlsl_ctx *ctx,
 {
     struct hlsl_ir_node *log, *exp, *arg, *mul;
 
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc)))
         return false;
 
@@ -2606,6 +2683,9 @@ static bool intrinsic_smoothstep(struct hlsl_ctx *ctx,
     struct hlsl_type *common_type;
     unsigned int dimx, dimy;
 
+    if (!elementwise_intrinsic_convert_args_to_common_type(ctx, params, loc))
+        return false;
+
     min_arg = params->args[0];
     max_arg = params->args[1];
     x_arg = params->args[2];
diff --git a/tests/hlsl-clamp.shader_test b/tests/hlsl-clamp.shader_test
index cc198735..1320c3dd 100644
--- a/tests/hlsl-clamp.shader_test
+++ b/tests/hlsl-clamp.shader_test
@@ -10,7 +10,7 @@ draw quad
 probe all rgba (-0.1, 0.7, -0.3, 0.3)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float3x2 a = {6, 5, 4, 3, 2, 1};
@@ -22,11 +22,11 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (5.5, 5.0, 4.2, 5.2)
+draw quad
+probe all rgba (5.5, 5.0, 4.2, 5.2)
 
 
-[pixel shader fail todo]
+[pixel shader fail]
 float4 main() : sv_target
 {
     float2x2 a = {3.1, 3.1, 3.1, 3.1};
diff --git a/tests/hlsl-ldexp.shader_test b/tests/hlsl-ldexp.shader_test
index bea97953..92988d37 100644
--- a/tests/hlsl-ldexp.shader_test
+++ b/tests/hlsl-ldexp.shader_test
@@ -32,7 +32,7 @@ draw quad
 probe all rgba (2.0, 0.00292968750, 4096.0, 6.33825300e+030)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float2x3 a = {1, 2, 3, 4, 5, 6};
@@ -43,11 +43,11 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (64.0, 64.0, 64.0, 40.0)
+draw quad
+probe all rgba (64.0, 64.0, 64.0, 40.0)
 
 
-[pixel shader fail todo]
+[pixel shader fail]
 float4 main() : sv_target
 {
     float2x2 a = {1, 2, 3, 4};
diff --git a/tests/hlsl-lerp.shader_test b/tests/hlsl-lerp.shader_test
index 3cd10ec1..15e90cef 100644
--- a/tests/hlsl-lerp.shader_test
+++ b/tests/hlsl-lerp.shader_test
@@ -36,7 +36,7 @@ draw quad
 probe all rgba (2.0, -10.0, -2.0, 1e12)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float3x2 a = {6, 5, 4, 3, 2, 1};
@@ -48,11 +48,11 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (-6.0, -2.2, 4.48, 8.28)
+draw quad
+probe all rgba (-6.0, -2.2, 4.48, 8.28) 1
 
 
-[pixel shader fail todo]
+[pixel shader fail]
 float4 main() : sv_target
 {
     float2x2 a = {0, 1, 2, 3};
diff --git a/tests/hlsl-smoothstep.shader_test b/tests/hlsl-smoothstep.shader_test
index ad53b50a..6dadfa8e 100644
--- a/tests/hlsl-smoothstep.shader_test
+++ b/tests/hlsl-smoothstep.shader_test
@@ -93,7 +93,7 @@ draw quad
 probe all rgba (1.0, 1.0, 0, 0) 1
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float2x3 a = {1, 1, 1, 1, 1, 1};
@@ -105,8 +105,8 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (0.028, 0.104, 0.216, 0.352) 1
+draw quad
+probe all rgba (0.028, 0.104, 0.216, 0.352) 6
 
 
 [pixel shader]
@@ -141,7 +141,7 @@ draw quad
 probe all rgba (0.5, 0.5, 0.5, 0.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float4x1 a = {0.0, 0.0, 0.0, 0.0};
@@ -153,11 +153,11 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (0.5, 0.5, 0.5, 0.5)
+draw quad
+probe all rgba (0.5, 0.5, 0.5, 0.5)
 
 
-[pixel shader fail todo]
+[pixel shader fail]
 float4 main() : sv_target
 {
     float2x2 a = {0.0, 0.0, 0.0, 0.0};
diff --git a/tests/max.shader_test b/tests/max.shader_test
index 7a917ec5..3a5c3125 100644
--- a/tests/max.shader_test
+++ b/tests/max.shader_test
@@ -24,7 +24,7 @@ draw quad
 probe all rgba (0.7, 0.8, 0.7, 0.2)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float2x3 a = {1, 2, 3, 4, 5, 6};
@@ -35,11 +35,11 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (6.0, 5.0, 4.0, 5.0)
+draw quad
+probe all rgba (6.0, 5.0, 4.0, 5.0)
 
 
-[pixel shader fail todo]
+[pixel shader fail]
 float4 main() : sv_target
 {
     float2x2 a = {1, 2, 3, 4};
diff --git a/tests/pow.shader_test b/tests/pow.shader_test
index 6f2b2741..0c1d7de3 100644
--- a/tests/pow.shader_test
+++ b/tests/pow.shader_test
@@ -25,7 +25,7 @@ todo draw quad
 todo probe all rgba (1.0, 32.0, 256.0, 125.0)
 
 
-[pixel shader fail todo]
+[pixel shader fail]
 float4 main() : sv_target
 {
     float2x2 a = {1, 2, 3, 4};
From: Francisco Casas fcasas@codeweavers.com
Using add_unary_arithmetic_expr() instead of hlsl_new_unary_expr() allows the intrinsic to work with matrices.
Otherwise we get:
E5017: Aborting due to not yet implemented feature: Copying from unsupported node type.
because an HLSL_IR_EXPR reaches split_matrix_copies().
---
 libs/vkd3d-shader/hlsl.y | 7 +++----
 tests/pow.shader_test    | 6 +++---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 910261ba..a17ee643 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2638,16 +2638,15 @@ static bool intrinsic_pow(struct hlsl_ctx *ctx,
     if (!(arg = intrinsic_float_convert_arg(ctx, params, params->args[0], loc)))
         return false;
 
-    if (!(log = hlsl_new_unary_expr(ctx, HLSL_OP1_LOG2, arg, *loc)))
+    if (!(log = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_LOG2, arg, loc)))
         return false;
-    list_add_tail(params->instrs, &log->entry);
 
     if (!(mul = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, params->args[1], log, loc)))
         return false;
 
-    if (!(exp = hlsl_new_unary_expr(ctx, HLSL_OP1_EXP2, mul, *loc)))
+    if (!(exp = add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_EXP2, mul, loc)))
         return false;
-    list_add_tail(params->instrs, &exp->entry);
+
     return true;
 }
 
diff --git a/tests/pow.shader_test b/tests/pow.shader_test
index 0c1d7de3..1bb3bd94 100644
--- a/tests/pow.shader_test
+++ b/tests/pow.shader_test
@@ -10,7 +10,7 @@ draw quad
 probe all rgba (0.512, 0.101192884, 0.64, 0.25) 4
 
 
-[pixel shader todo]
+[pixel shader]
 float4 main() : sv_target
 {
     float2x3 a = {1, 2, 3, 4, 5, 6};
@@ -21,8 +21,8 @@ float4 main() : sv_target
 }
 
 [test]
-todo draw quad
-todo probe all rgba (1.0, 32.0, 256.0, 125.0)
+draw quad
+probe all rgba (1.0, 32.0, 256.0, 125.0) 2
 
 
 [pixel shader fail]
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl.y:
return !!add_unary_arithmetic_expr(ctx, params->instrs, HLSL_OP1_SAT, arg, loc);
}
+/* smoothstep(a, b, x) = saturate(p^2 (3 - 2p)), where p = (x - a)/(b - a) */
+static bool intrinsic_smoothstep(struct hlsl_ctx *ctx,
const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
What you have here is instead "p^2 (3 - 2p), where p = saturate((x - a)/(b - a))". I haven't checked the math, but the discrepancy should probably be fixed.
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl.y:
- if (!(p_num = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, x_arg, min_arg, loc)))
return false;
- if (!(p_denom = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_ADD, max_arg, min_arg, loc)))
return false;
- if (!(one = hlsl_new_float_constant(ctx, 1.0, loc)))
return false;
- list_add_tail(params->instrs, &one->node.entry);
- if (!(p_denom = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_DIV, &one->node, p_denom, loc)))
return false;
- if (!(p = add_binary_arithmetic_expr(ctx, params->instrs, HLSL_OP2_MUL, p_num, p_denom, loc)))
return false;
This seems redundant, shouldn't this just be a single div?
Zebediah Figura (@zfigura) commented about Makefile.am:
tests/hlsl-state-block-syntax.shader_test \
tests/hlsl-static-initializer.shader_test \
tests/hlsl-storage-qualifiers.shader_test \
- tests/hlsl-smoothstep.shader_test \
This is out of alphabetical order.
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl.y:
return add_implicit_conversion(ctx, params->instrs, arg, type, loc);
}
+static bool elementwise_intrinsic_convert_args_to_common_type(struct hlsl_ctx *ctx,
const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
This name could maybe be shortened, just "elementwise_intrinsic_get_type"?
On Sun Nov 27 04:45:37 2022 +0000, Zebediah Figura wrote:
What you have here is instead "p^2 (3 - 2p), where p = saturate((x - a)/(b - a))". I haven't checked the math, but the discrepancy should probably be fixed.
Thanks for catching that. The comment is wrong.
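For reference, the formula the code actually computes (as pointed out in the review comment) can be written as follows, with a worked check against the second component of the first smoothstep test (a = -1, b = 1, x = 0.4):

```latex
p = \operatorname{saturate}\!\left(\frac{x - a}{b - a}\right),
\qquad
\operatorname{smoothstep}(a, b, x) = p^2 \, (3 - 2p)

% a = -1,\ b = 1,\ x = 0.4:
p = \frac{0.4 - (-1)}{1 - (-1)} = 0.7,
\qquad
0.7^2 \cdot (3 - 2 \cdot 0.7) = 0.49 \cdot 1.6 = 0.784
```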
On Sun Nov 27 04:48:32 2022 +0000, Zebediah Figura wrote:
This is out of alphabetical order.
Thanks! (I wonder why I make this mistake so often...)
On Sun Nov 27 04:47:26 2022 +0000, Zebediah Figura wrote:
This seems redundant, shouldn't this just be a single div?
Mathematically yes, but the native compiler does a `1/(·)` first, and then uses a `mul_sat`. So it made sense to me to leave it as DIV + MUL + SAT, just in case we ever want to merge the MUL and the SAT.
For the record, the following shader:

```hlsl
uniform float A, B, X;

float4 main() : sv_target
{
    return smoothstep(A, B, X);
}
```

results in the following instructions in the native compiler:

```hlsl
// With
// A = cb0[0].x
// B = cb0[0].y
// X = cb0[0].z

// r0.x = B - A
// r0.y = X - A
add r0.xy, -cb0[0].xxxx, cb0[0].yzyy

// r0.x = 1 / r0.x
div r0.x, l(1.000000, 1.000000, 1.000000, 1.000000), r0.x

// r0.x = saturate(r0.x * r0.y)
mul_sat r0.x, r0.x, r0.y

// r0.y = 3 - 2 * r0.x
mad r0.y, r0.x, l(-2.000000), l(3.000000)

// r0.x = r0.x * r0.x
mul r0.x, r0.x, r0.x

// result = r0.x * r0.y
mul o0.xyzw, r0.xxxx, r0.yyyy
```
which is what I try to replicate (we don't have `mul_sat` yet).
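A scalar C model of that instruction sequence can be used to check the expected values in `hlsl-smoothstep.shader_test`. It is only a sketch: `saturate()` and `smoothstep_model()` are hypothetical names, not vkd3d-shader API, and the model assumes that saturating NaN yields 0 (which is what the division-by-zero test expects when `a == b == x`) and IEEE float semantics without fast-math.

```c
#include <assert.h>

/* Clamps v to [0, 1]. NaN fails the first comparison and becomes 0. */
static float saturate(float v)
{
    if (!(v > 0.0f))
        return 0.0f;
    if (v > 1.0f)
        return 1.0f;
    return v;
}

/* Mirrors the native sequence: add, div (1 / x), mul_sat, mad, mul, mul. */
static float smoothstep_model(float a, float b, float x)
{
    float denom = b - a;            /* add: r0.x = B - A */
    float num = x - a;              /* add: r0.y = X - A */
    float inv = 1.0f / denom;       /* div: r0.x = 1 / r0.x */
    float p = saturate(inv * num);  /* mul_sat */
    float t = p * -2.0f + 3.0f;     /* mad: 3 - 2p */

    return (p * p) * t;             /* mul, mul */
}
```

Keeping the reciprocal and the multiplication separate, rather than a single division, follows the native output and leaves room to fuse the MUL and SAT later, as described above.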
On Sun Nov 27 04:51:21 2022 +0000, Zebediah Figura wrote:
This name could maybe be shortened, just "elementwise_intrinsic_get_type"?
hmm, that name may be a little misleading because this function doesn't retrieve a type, but instead **converts** the parameters to the common type.
Though I agree that a shorter name would be better.