Indexing with non-constant offsets requires relative addressing in SM4. In assembly, this is written as in the following example:
```
x1[r1.x + 3]
```
This first part of the patch series includes only support for indexing vectors with non-constant indexes.
The following patches are in https://gitlab.winehq.org/fcasas/vkd3d/-/commits/nonconst-offsets-3.
---
Non-constant indexing of vectors cannot be implemented with relative addressing in SM4, because relative addressing can only select whole registers, not individual register components.
Mathematical operations must be used instead.
For floats, the native compiler seems to index an identity matrix to get the i-th column, and then computes the dot product of that column and the vector. For ints, bit operations seem to be performed instead.
While probably less efficient, this implementation complies with the type checking at the IR level and when writing the bytecode.
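For reference, the float case is roughly equivalent to the following source-level sketch (illustrative only; the actual lowering operates on the IR, and the variable names here are made up). The one-hot mask plays the role of the identity-matrix column mentioned above:

```
uniform float i;

float4 main() : sv_target
{
    float4 arr = float4(11.0, 12.0, 13.0, 14.0);

    /* arr[i] becomes a dot product with a one-hot selection mask:
     * 1.0 in the component where the index matches, 0.0 elsewhere. */
    float4 mask = (float4)(i.xxxx == float4(0, 1, 2, 3));
    float value = dot(arr, mask);

    return float4(value, value, value, value);
}
```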
--
v3: vkd3d-shader/hlsl: Support non-constant vector indexing.
    vkd3d-shader/hlsl: Lower dot for non-float types.
From: Francisco Casas <fcasas@codeweavers.com>
---
 Makefile.am                        |  1 +
 tests/array-index-expr.shader_test | 47 ++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)
 create mode 100644 tests/array-index-expr.shader_test
diff --git a/Makefile.am b/Makefile.am
index e06d0eeb..b5f7159e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -50,6 +50,7 @@ vkd3d_shader_tests = \
 	tests/arithmetic-int.shader_test \
 	tests/arithmetic-int-uniform.shader_test \
 	tests/arithmetic-uint.shader_test \
+	tests/array-index-expr.shader_test \
 	tests/array-parameters.shader_test \
 	tests/asfloat.shader_test \
 	tests/asuint.shader_test \
diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test
new file mode 100644
index 00000000..231915a3
--- /dev/null
+++ b/tests/array-index-expr.shader_test
@@ -0,0 +1,47 @@
+[pixel shader todo]
+uniform float i;
+
+float4 main() : SV_TARGET
+{
+    float4 arr = float4(11.0, 12.0, 13.0, 14.0);
+    return arr[i];
+}
+
+[test]
+uniform 0 float 0
+todo draw quad
+todo probe all rgba (11.0, 11.0, 11.0, 11.0)
+uniform 0 float 1
+todo draw quad
+todo probe all rgba (12.0, 12.0, 12.0, 12.0)
+uniform 0 float 2
+todo draw quad
+todo probe all rgba (13.0, 13.0, 13.0, 13.0)
+uniform 0 float 3
+todo draw quad
+todo probe all rgba (14.0, 14.0, 14.0, 14.0)
+
+
+[pixel shader todo]
+uniform float i;
+
+float4 main() : SV_TARGET
+{
+    int4 arr_i = int4(21, 22, 23, 24);
+    bool4 arr_b = bool4(true, false, true, false);
+    return float4(arr_i[i], arr_b[i], arr_i[3 - i], arr_b[3 - i]);
+}
+
+[test]
+uniform 0 float 0
+todo draw quad
+todo probe all rgba (21.0, 1.0, 24.0, 0.0)
+uniform 0 float 1
+todo draw quad
+todo probe all rgba (22.0, 0.0, 23.0, 1.0)
+uniform 0 float 2
+todo draw quad
+todo probe all rgba (23.0, 1.0, 22.0, 0.0)
+uniform 0 float 3
+todo draw quad
+todo probe all rgba (24.0, 0.0, 21.0, 1.0)
From: Francisco Casas <fcasas@codeweavers.com>
---
 tests/array-index-expr.shader_test | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test
index 231915a3..35a13f7f 100644
--- a/tests/array-index-expr.shader_test
+++ b/tests/array-index-expr.shader_test
@@ -45,3 +45,28 @@ todo probe all rgba (23.0, 1.0, 22.0, 0.0)
 uniform 0 float 3
 todo draw quad
 todo probe all rgba (24.0, 0.0, 21.0, 1.0)
+
+
+[pixel shader todo]
+uniform float2 i;
+
+float4 main() : sv_target
+{
+    float4 f[3] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
+
+    return f[i.x + i.y];
+}
+
+[test]
+uniform 0 float4 0 0 0 0
+todo draw quad
+todo probe all rgba (1.0, 2.0, 3.0, 4.0)
+uniform 0 float4 1 0 0 0
+todo draw quad
+todo probe all rgba (5.0, 6.0, 7.0, 8.0)
+uniform 0 float4 0 1 0 0
+todo draw quad
+todo probe all rgba (5.0, 6.0, 7.0, 8.0)
+uniform 0 float4 1 1 0 0
+todo draw quad
+todo probe all rgba (9.0, 10.0, 11.0, 12.0)
From: Francisco Casas <fcasas@codeweavers.com>
---
 libs/vkd3d-shader/hlsl_codegen.c | 77 ++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 33 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 638dab6e..6a99c33e 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -123,15 +123,14 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st
 }
 
 /* TODO: remove when no longer needed, only used for transform_deref_paths_into_offsets() */
-static void replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_deref *deref,
+static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_deref *deref,
         struct hlsl_ir_node *instr)
 {
     const struct hlsl_type *type;
     struct hlsl_ir_node *offset;
     struct hlsl_block block;
 
-    if (!deref->var)
-        return;
+    assert(deref->var);
 
     /* register offsets shouldn't be used before this point is reached. */
     assert(!deref->offset.node);
@@ -143,45 +142,19 @@ static void replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der
     if (type->class == HLSL_CLASS_STRUCT || type->class == HLSL_CLASS_ARRAY)
     {
         hlsl_cleanup_deref(deref);
-        return;
+        return true;
     }
 
     deref->offset_regset = hlsl_type_get_regset(type);
 
     if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &instr->loc)))
-        return;
+        return false;
     list_move_before(&instr->entry, &block.instrs);
 
     hlsl_cleanup_deref(deref);
     hlsl_src_from_node(&deref->offset, offset);
-}
-
-/* TODO: remove when no longer needed. */
-static bool transform_deref_paths_into_offsets(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
-{
-    switch(instr->type)
-    {
-        case HLSL_IR_LOAD:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_load(instr)->src, instr);
-            return true;
-
-        case HLSL_IR_STORE:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_store(instr)->lhs, instr);
-            return true;
-
-        case HLSL_IR_RESOURCE_LOAD:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_resource_load(instr)->resource, instr);
-            replace_deref_path_with_offset(ctx, &hlsl_ir_resource_load(instr)->sampler, instr);
-            return true;
-
-        case HLSL_IR_RESOURCE_STORE:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_resource_store(instr)->resource, instr);
-            return true;
-
-        default:
-            return false;
-    }
-    return false;
+    return true;
 }
 
 /* Split uniforms into two variables representing the constant and temp
@@ -600,6 +573,44 @@ bool hlsl_transform_ir(struct hlsl_ctx *ctx, bool (*func)(struct hlsl_ctx *ctx,
     return progress;
 }
 
+static bool transform_instr_derefs(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    bool res;
+    bool (*func)(struct hlsl_ctx *ctx, struct hlsl_deref *, struct hlsl_ir_node *) = context;
+
+    switch(instr->type)
+    {
+        case HLSL_IR_LOAD:
+            res = func(ctx, &hlsl_ir_load(instr)->src, instr);
+            return res;
+
+        case HLSL_IR_STORE:
+            res = func(ctx, &hlsl_ir_store(instr)->lhs, instr);
+            return res;
+
+        case HLSL_IR_RESOURCE_LOAD:
+            res = func(ctx, &hlsl_ir_resource_load(instr)->resource, instr);
+            if (hlsl_ir_resource_load(instr)->sampler.var)
+                res |= func(ctx, &hlsl_ir_resource_load(instr)->sampler, instr);
+            return res;
+
+        case HLSL_IR_RESOURCE_STORE:
+            res = func(ctx, &hlsl_ir_resource_store(instr)->resource, instr);
+            return res;
+
+        default:
+            return false;
+    }
+    return false;
+}
+
+static bool transform_derefs(struct hlsl_ctx *ctx,
+        bool (*func)(struct hlsl_ctx *ctx, struct hlsl_deref *, struct hlsl_ir_node *),
+        struct hlsl_block *block)
+{
+    return hlsl_transform_ir(ctx, transform_instr_derefs, block, func);
+}
+
 struct recursive_call_ctx
 {
     const struct hlsl_ir_function_decl **backtrace;
@@ -3944,7 +3955,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     hlsl_transform_ir(ctx, track_object_components_usage, body, NULL);
 
     /* TODO: move forward, remove when no longer needed */
-    hlsl_transform_ir(ctx, transform_deref_paths_into_offsets, body, NULL);
+    transform_derefs(ctx, replace_deref_path_with_offset, body);
     while (hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL));
 
     do
From: Francisco Casas <fcasas@codeweavers.com>
---
 libs/vkd3d-shader/hlsl_codegen.c | 51 ++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 6a99c33e..50bb5358 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2389,6 +2389,56 @@ static bool lower_int_abs(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void
     return true;
 }
 
+static bool lower_int_dot(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    struct hlsl_ir_node *arg1, *arg2, *mult, *comps[4] = {0}, *res;
+    struct hlsl_type *type = instr->data_type;
+    struct hlsl_ir_expr *expr;
+    unsigned int i, dimx;
+
+    if (instr->type != HLSL_IR_EXPR)
+        return false;
+    expr = hlsl_ir_expr(instr);
+
+    if (expr->op != HLSL_OP2_DOT)
+        return false;
+
+    if (type->base_type == HLSL_TYPE_BOOL || type->base_type == HLSL_TYPE_INT
+            || type->base_type == HLSL_TYPE_UINT)
+    {
+        arg1 = expr->operands[0].node;
+        arg2 = expr->operands[1].node;
+        assert(arg1->data_type->dimx == arg2->data_type->dimx);
+        dimx = arg1->data_type->dimx;
+
+        if (!(mult = hlsl_new_binary_expr(ctx, HLSL_OP2_MUL, arg1, arg2)))
+            return false;
+        list_add_before(&instr->entry, &mult->entry);
+
+        for (i = 0; i < dimx; ++i)
+        {
+            unsigned int s = hlsl_swizzle_from_writemask(1 << i);
+
+            if (!(comps[i] = hlsl_new_swizzle(ctx, s, 1, mult, &instr->loc)))
+                return false;
+            list_add_before(&instr->entry, &comps[i]->entry);
+        }
+
+        res = comps[0];
+        for (i = 1; i < dimx; ++i)
+        {
+            if (!(res = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, res, comps[i])))
+                return false;
+            list_add_before(&instr->entry, &res->entry);
+        }
+
+        hlsl_replace_node(instr, res);
+        return true;
+    }
+
+    return false;
+}
+
 static bool lower_float_modulus(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
 {
     struct hlsl_ir_node *arg1, *arg2, *mul1, *neg1, *ge, *neg2, *div, *mul2, *frc, *cond;
@@ -3924,6 +3974,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
 
     hlsl_transform_ir(ctx, lower_narrowing_casts, body, NULL);
     hlsl_transform_ir(ctx, lower_casts_to_bool, body, NULL);
+    hlsl_transform_ir(ctx, lower_int_dot, body, NULL);
     hlsl_transform_ir(ctx, lower_int_division, body, NULL);
     hlsl_transform_ir(ctx, lower_int_modulus, body, NULL);
     hlsl_transform_ir(ctx, lower_int_abs, body, NULL);
From: Francisco Casas <fcasas@codeweavers.com>
Non-constant vector indexing cannot be solved with relative addressing of register indices, because relative addressing cannot operate at the level of register components.
Mathematical operations must be used instead.
---
 libs/vkd3d-shader/hlsl.c                     | 13 +++
 libs/vkd3d-shader/hlsl.h                     |  2 +
 libs/vkd3d-shader/hlsl_codegen.c             | 85 ++++++++++++++++++-
 tests/array-index-expr.shader_test           | 36 ++++----
 tests/expr-indexing.shader_test              | 22 ++---
 tests/hlsl-matrix-indexing.shader_test       |  6 +-
 .../hlsl-vector-indexing-uniform.shader_test |  4 +-
 7 files changed, 132 insertions(+), 36 deletions(-)
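For the bool case there is no arithmetic to build the dot product with, so the per-component multiplies and adds are replaced by logical ANDs and ORs. At the source level this is roughly equivalent to the following sketch (illustrative only; the names are made up and the actual pass works on the IR):

```
uniform float i;

float4 main() : sv_target
{
    bool4 arr_b = bool4(true, false, true, false);

    /* The one-hot mask stays a bool4, and the "dot product" becomes
     * an OR-reduction of per-component ANDs. */
    bool4 mask = (i.xxxx == float4(0, 1, 2, 3));
    bool selected = (arr_b.x && mask.x) || (arr_b.y && mask.y)
            || (arr_b.z && mask.z) || (arr_b.w && mask.w);

    return float4(selected, selected, selected, selected);
}
```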
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c
index 617aef30..6da7d1ca 100644
--- a/libs/vkd3d-shader/hlsl.c
+++ b/libs/vkd3d-shader/hlsl.c
@@ -1287,6 +1287,19 @@ struct hlsl_ir_load *hlsl_new_load_index(struct hlsl_ctx *ctx, const struct hlsl
     return load;
 }
 
+struct hlsl_ir_load *hlsl_new_load_parent(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
+        const struct vkd3d_shader_location *loc)
+{
+    /* This deref can only exist temporarily because it is not the real owner of its members. */
+    struct hlsl_deref tmp_deref;
+
+    assert(deref->path_len >= 1);
+
+    tmp_deref = *deref;
+    tmp_deref.path_len = deref->path_len - 1;
+    return hlsl_new_load_index(ctx, &tmp_deref, NULL, loc);
+}
+
 struct hlsl_ir_load *hlsl_new_var_load(struct hlsl_ctx *ctx, struct hlsl_ir_var *var,
         const struct vkd3d_shader_location *loc)
 {
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h
index 6a4e314d..8eeecc13 100644
--- a/libs/vkd3d-shader/hlsl.h
+++ b/libs/vkd3d-shader/hlsl.h
@@ -1119,6 +1119,8 @@ struct hlsl_ir_load *hlsl_new_var_load(struct hlsl_ctx *ctx, struct hlsl_ir_var
         const struct vkd3d_shader_location *loc);
 struct hlsl_ir_load *hlsl_new_load_index(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
         struct hlsl_ir_node *idx, const struct vkd3d_shader_location *loc);
+struct hlsl_ir_load *hlsl_new_load_parent(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
+        const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_load_component(struct hlsl_ctx *ctx, struct hlsl_block *block,
     const struct hlsl_deref *deref, unsigned int comp, const struct vkd3d_shader_location *loc);
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 50bb5358..9a2cb80b 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -1992,6 +1992,81 @@ static bool remove_trivial_swizzles(struct hlsl_ctx *ctx, struct hlsl_ir_node *i
     return true;
 }
 
+static bool lower_nonconstant_vector_derefs(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    struct hlsl_ir_node *idx;
+    struct hlsl_deref *deref;
+    struct hlsl_type *type;
+    unsigned int i;
+
+    if (instr->type != HLSL_IR_LOAD)
+        return false;
+
+    deref = &hlsl_ir_load(instr)->src;
+    assert(deref->var);
+
+    if (deref->path_len == 0)
+        return false;
+
+    type = deref->var->data_type;
+    for (i = 0; i < deref->path_len - 1; ++i)
+        type = hlsl_get_element_type_from_path_index(ctx, type, deref->path[i].node);
+
+    idx = deref->path[deref->path_len - 1].node;
+
+    if (type->class == HLSL_CLASS_VECTOR && idx->type != HLSL_IR_CONSTANT)
+    {
+        struct hlsl_ir_node *eq, *swizzle, *dot, *operands[HLSL_MAX_OPERANDS] = {0};
+        struct hlsl_ir_load *vector_load;
+        struct hlsl_ir_constant *c;
+        enum hlsl_ir_expr_op op;
+
+        if (!(vector_load = hlsl_new_load_parent(ctx, deref, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &vector_load->node.entry);
+
+        if (!(swizzle = hlsl_new_swizzle(ctx, HLSL_SWIZZLE(X, X, X, X), type->dimx, idx, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &swizzle->entry);
+
+        if (!(c = hlsl_new_constant(ctx, hlsl_get_vector_type(ctx, HLSL_TYPE_UINT, type->dimx), &instr->loc)))
+            return false;
+        c->value.u[0].u = 0;
+        c->value.u[1].u = 1;
+        c->value.u[2].u = 2;
+        c->value.u[3].u = 3;
+        list_add_before(&instr->entry, &c->node.entry);
+
+        operands[0] = swizzle;
+        operands[1] = &c->node;
+        if (!(eq = hlsl_new_expr(ctx, HLSL_OP2_EQUAL, operands,
+                hlsl_get_vector_type(ctx, HLSL_TYPE_BOOL, type->dimx), &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &eq->entry);
+
+        if (!(eq = hlsl_new_cast(ctx, eq, type, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &eq->entry);
+
+        op = HLSL_OP2_DOT;
+        if (type->dimx == 1)
+            op = type->base_type == HLSL_TYPE_BOOL ? HLSL_OP2_LOGIC_AND : HLSL_OP2_MUL;
+
+        /* Note: We may be creating a DOT for bool vectors here, which we need to lower to
+         * LOGIC_OR + LOGIC_AND. */
+        operands[0] = &vector_load->node;
+        operands[1] = eq;
+        if (!(dot = hlsl_new_expr(ctx, op, operands, instr->data_type, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &dot->entry);
+        hlsl_replace_node(instr, dot);
+
+        return true;
+    }
+
+    return false;
+}
+
 /* Lower DIV to RCP + MUL. */
 static bool lower_division(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
 {
@@ -2395,6 +2470,7 @@ static bool lower_int_dot(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void
     struct hlsl_type *type = instr->data_type;
     struct hlsl_ir_expr *expr;
     unsigned int i, dimx;
+    bool is_bool;
 
     if (instr->type != HLSL_IR_EXPR)
        return false;
@@ -2410,8 +2486,9 @@ static bool lower_int_dot(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void
         arg2 = expr->operands[1].node;
         assert(arg1->data_type->dimx == arg2->data_type->dimx);
         dimx = arg1->data_type->dimx;
+        is_bool = type->base_type == HLSL_TYPE_BOOL;
 
-        if (!(mult = hlsl_new_binary_expr(ctx, HLSL_OP2_MUL, arg1, arg2)))
+        if (!(mult = hlsl_new_binary_expr(ctx, is_bool ? HLSL_OP2_LOGIC_AND : HLSL_OP2_MUL, arg1, arg2)))
             return false;
         list_add_before(&instr->entry, &mult->entry);
 
@@ -2427,7 +2504,7 @@ static bool lower_int_dot(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void
         res = comps[0];
         for (i = 1; i < dimx; ++i)
         {
-            if (!(res = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, res, comps[i])))
+            if (!(res = hlsl_new_binary_expr(ctx, is_bool ? HLSL_OP2_LOGIC_OR : HLSL_OP2_ADD, res, comps[i])))
                 return false;
             list_add_before(&instr->entry, &res->entry);
         }
@@ -3989,6 +4066,10 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     }
     while (progress);
 
+    hlsl_transform_ir(ctx, lower_nonconstant_vector_derefs, body, NULL);
+    hlsl_transform_ir(ctx, lower_casts_to_bool, body, NULL);
+    hlsl_transform_ir(ctx, lower_int_dot, body, NULL);
+
     if (profile->major_version < 4)
     {
         hlsl_transform_ir(ctx, lower_division, body, NULL);
diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test
index 35a13f7f..e058d984 100644
--- a/tests/array-index-expr.shader_test
+++ b/tests/array-index-expr.shader_test
@@ -1,4 +1,4 @@
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 
 float4 main() : SV_TARGET
@@ -9,20 +9,20 @@ float4 main() : SV_TARGET
 
 [test]
 uniform 0 float 0
-todo draw quad
-todo probe all rgba (11.0, 11.0, 11.0, 11.0)
+draw quad
+probe all rgba (11.0, 11.0, 11.0, 11.0)
 uniform 0 float 1
-todo draw quad
-todo probe all rgba (12.0, 12.0, 12.0, 12.0)
+draw quad
+probe all rgba (12.0, 12.0, 12.0, 12.0)
 uniform 0 float 2
-todo draw quad
-todo probe all rgba (13.0, 13.0, 13.0, 13.0)
+draw quad
+probe all rgba (13.0, 13.0, 13.0, 13.0)
 uniform 0 float 3
-todo draw quad
-todo probe all rgba (14.0, 14.0, 14.0, 14.0)
+draw quad
+probe all rgba (14.0, 14.0, 14.0, 14.0)
 
 
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 
 float4 main() : SV_TARGET
@@ -34,17 +34,17 @@ float4 main() : SV_TARGET
 
 [test]
 uniform 0 float 0
-todo draw quad
-todo probe all rgba (21.0, 1.0, 24.0, 0.0)
+draw quad
+probe all rgba (21.0, 1.0, 24.0, 0.0)
 uniform 0 float 1
-todo draw quad
-todo probe all rgba (22.0, 0.0, 23.0, 1.0)
+draw quad
+probe all rgba (22.0, 0.0, 23.0, 1.0)
 uniform 0 float 2
-todo draw quad
-todo probe all rgba (23.0, 1.0, 22.0, 0.0)
+draw quad
+probe all rgba (23.0, 1.0, 22.0, 0.0)
 uniform 0 float 3
-todo draw quad
-todo probe all rgba (24.0, 0.0, 21.0, 1.0)
+draw quad
+probe all rgba (24.0, 0.0, 21.0, 1.0)
 
 
 [pixel shader todo]
diff --git a/tests/expr-indexing.shader_test b/tests/expr-indexing.shader_test
index 83a63d67..3dcc5727 100644
--- a/tests/expr-indexing.shader_test
+++ b/tests/expr-indexing.shader_test
@@ -13,7 +13,7 @@ draw quad
 probe all rgba (8.0, 8.0, 8.0, 8.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 a, b;
 float i;
 
@@ -26,8 +26,8 @@ float4 main() : sv_target
 uniform 0 float4 1.0 2.0 3.0 4.0
 uniform 4 float4 5.0 6.0 7.0 8.0
 uniform 8 float 2
-todo draw quad
-todo probe all rgba (10.0, 10.0, 10.0, 10.0)
+draw quad
+probe all rgba (10.0, 10.0, 10.0, 10.0)
 
 
 [pixel shader]
@@ -44,7 +44,7 @@ draw quad
 probe all rgba (3.0, 3.0, 3.0, 3.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 a;
 float i;
 
@@ -56,11 +56,11 @@ float4 main() : sv_target
 [test]
 uniform 0 float4 1.0 2.0 3.0 4.0
 uniform 4 float 0
-todo draw quad
-todo probe all rgba (4.0, 4.0, 4.0, 4.0)
+draw quad
+probe all rgba (4.0, 4.0, 4.0, 4.0)
 uniform 4 float 2
-todo draw quad
-todo probe all rgba (1.0, 1.0, 1.0, 1.0)
+draw quad
+probe all rgba (1.0, 1.0, 1.0, 1.0)
 
 
 [pixel shader]
@@ -82,7 +82,7 @@ draw quad
 probe all rgba (4.0, 4.0, 4.0, 4.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 a;
 float i;
 
@@ -99,5 +99,5 @@ float4 main() : sv_target
 [test]
 uniform 0 float4 1.0 2.0 3.0 4.0
 uniform 4 float 1
-todo draw quad
-todo probe all rgba (2.0, 2.0, 2.0, 2.0)
+draw quad
+probe all rgba (2.0, 2.0, 2.0, 2.0)
diff --git a/tests/hlsl-matrix-indexing.shader_test b/tests/hlsl-matrix-indexing.shader_test
index c3d08296..a57d8fb8 100644
--- a/tests/hlsl-matrix-indexing.shader_test
+++ b/tests/hlsl-matrix-indexing.shader_test
@@ -108,7 +108,7 @@ draw quad
 probe all rgba (3.0, 4.0, 50.0, 60.0)
 
 
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 
 float4 main() : sv_target
@@ -120,8 +120,8 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float 2
-todo draw quad
-todo probe all rgba (8, 9, 10, 11)
+draw quad
+probe all rgba (8, 9, 10, 11)
 
 
 [pixel shader todo]
diff --git a/tests/hlsl-vector-indexing-uniform.shader_test b/tests/hlsl-vector-indexing-uniform.shader_test
index 968f570b..e5ffbdd0 100644
--- a/tests/hlsl-vector-indexing-uniform.shader_test
+++ b/tests/hlsl-vector-indexing-uniform.shader_test
@@ -1,6 +1,6 @@
 % Use a uniform to prevent the compiler from optimizing.
 
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 float4 main() : SV_TARGET
 {
@@ -12,5 +12,5 @@ float4 main() : SV_TARGET
 
 [test]
 uniform 0 float 2
-todo draw quad
+draw quad
 probe all rgba (0.5, 0.3, 0.8, 0.2)
the only place in the whole codebase where HLSL_OP2_DOT for bools can exist.
Though note that this is not by design. If we find something else that sensibly generates a `DOT` of `bool`s, then we can use it there too.
At any rate, I fully understand that my way of reasoning might sound a bit peculiar here, and definitely the SLOC difference isn't that large, so no hard feelings if you and Zeb decide in the end that I can keep my algebraic mutterings somewhere else.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
+        list_add_before(&instr->entry, &c->node.entry);
+
+        operands[0] = swizzle;
+        operands[1] = &c->node;
+        if (!(eq = hlsl_new_expr(ctx, HLSL_OP2_EQUAL, operands,
+                hlsl_get_vector_type(ctx, HLSL_TYPE_BOOL, type->dimx), &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &eq->entry);
+
+        if (!(eq = hlsl_new_cast(ctx, eq, type, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &eq->entry);
+
+        op = HLSL_OP2_DOT;
+        if (type->dimx == 1)
+            op = type->base_type == HLSL_TYPE_BOOL ? HLSL_OP2_LOGIC_AND : HLSL_OP2_MUL;
TBH I wouldn't even dislike deciding that `DOT` is defined for scalars, with the obvious semantics. Which might even be my fault if it's not. But no need to further overload this MR with that.
I just remembered that we probably want to run this pass after folding constant expressions, to avoid adding instructions that we can't fold, e.g. just because of an unfolded cast in the index. That is, unless we make the passes smart enough to fold `x + 0 = x` and `x * 0 = 0`.
I came to think the same while reviewing. Even more: it might not be computationally optimal, but in the end the compiler structure might be simpler if we just lower *all* vector indexing operations to a dot product and then let constant folding and other passes recover the indexing operations that can be made constant.
Just saying, though, I'd first have this MR accepted and then this can be revisited later.
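To illustrate the idea with a sketch (assuming the folding passes learn the `x * 0 == 0` and `x + 0 == x` rules mentioned above; the names are made up):

```
uniform float4 arr;

float4 main() : sv_target
{
    float i = 2;    /* a constant index */

    /* Lowered form of arr[i]: */
    float value = dot(arr, (float4)(i.xxxx == float4(0, 1, 2, 3)));

    /* Constant folding could then reduce it step by step:
     *     dot(arr, float4(0, 0, 1, 0))
     *  -> arr.x * 0 + arr.y * 0 + arr.z * 1 + arr.w * 0
     *  -> arr.z
     * i.e. the original constant indexing is recovered. */
    return float4(value, value, value, value);
}
```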
This merge request was approved by Giovanni Mascellani.
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl_codegen.c:
         arg2 = expr->operands[1].node;
         assert(arg1->data_type->dimx == arg2->data_type->dimx);
         dimx = arg1->data_type->dimx;
+        is_bool = type->base_type == HLSL_TYPE_BOOL;
 
-        if (!(mult = hlsl_new_binary_expr(ctx, HLSL_OP2_MUL, arg1, arg2)))
+        if (!(mult = hlsl_new_binary_expr(ctx, is_bool ? HLSL_OP2_LOGIC_AND : HLSL_OP2_MUL, arg1, arg2)))
This seems like it should be part of whatever patch introduces lower_int_dot().
On Fri May 26 18:38:22 2023 +0000, Zebediah Figura wrote:
> This seems like it should be part of whatever patch introduces lower_int_dot().
I put these changes in the "Support non-constant vector indexing." patch since, before that, DOT operations for bool vectors cannot be formed in any way.
On Fri May 26 19:55:33 2023 +0000, Francisco Casas wrote:
> I put these changes in the "Support non-constant vector indexing." patch since, before that, DOT operations for bool vectors cannot be formed in any way.
In that case lower_int_dot() shouldn't pretend to handle BOOL when it's introduced.
On Fri May 26 20:47:19 2023 +0000, Zebediah Figura wrote:
> In that case lower_int_dot() shouldn't pretend to handle BOOL when it's introduced.
Ah, I see now that I forgot to also postpone adding the `type->base_type == HLSL_TYPE_BOOL` check in the conditional.