Indexing with non-constant offsets requires relative addressing in SM4. In assembly, this is written as in the following example:
```
x1[r1.x + 3]
```
This first part of the patch series only includes support for indexing vectors with non-constant indices.
The following patches can be found in https://gitlab.winehq.org/fcasas/vkd3d/-/commits/nonconst-offsets-3.
---
Non-constant indexing of vectors cannot be implemented with relative addressing in SM4, because relative addressing can only be applied to whole registers, not to individual register components.
Mathematical operations must be used instead.
For floats, the native compiler seems to index an identity matrix to get the i-th column, and then computes the dot product between that column and the vector. For ints, bit operations seem to be performed.
While probably less efficient, this implementation complies with the type-checking at the IR level and when writing bytecode.
-- v2:
vkd3d-shader/hlsl: Support non-constant vector indexing.
vkd3d-shader/hlsl: Lower dot for integral types.
vkd3d-shader/hlsl: Introduce transform_derefs().
tests: Test indexing temps.
tests: Test non-constant vector indexing.
From: Francisco Casas fcasas@codeweavers.com
---
 Makefile.am                        |  1 +
 tests/array-index-expr.shader_test | 47 ++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)
 create mode 100644 tests/array-index-expr.shader_test
diff --git a/Makefile.am b/Makefile.am
index e06d0eeb..b5f7159e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -50,6 +50,7 @@ vkd3d_shader_tests = \
 	tests/arithmetic-int.shader_test \
 	tests/arithmetic-int-uniform.shader_test \
 	tests/arithmetic-uint.shader_test \
+	tests/array-index-expr.shader_test \
 	tests/array-parameters.shader_test \
 	tests/asfloat.shader_test \
 	tests/asuint.shader_test \
diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test
new file mode 100644
index 00000000..231915a3
--- /dev/null
+++ b/tests/array-index-expr.shader_test
@@ -0,0 +1,47 @@
+[pixel shader todo]
+uniform float i;
+
+float4 main() : SV_TARGET
+{
+    float4 arr = float4(11.0, 12.0, 13.0, 14.0);
+    return arr[i];
+}
+
+[test]
+uniform 0 float 0
+todo draw quad
+todo probe all rgba (11.0, 11.0, 11.0, 11.0)
+uniform 0 float 1
+todo draw quad
+todo probe all rgba (12.0, 12.0, 12.0, 12.0)
+uniform 0 float 2
+todo draw quad
+todo probe all rgba (13.0, 13.0, 13.0, 13.0)
+uniform 0 float 3
+todo draw quad
+todo probe all rgba (14.0, 14.0, 14.0, 14.0)
+
+
+[pixel shader todo]
+uniform float i;
+
+float4 main() : SV_TARGET
+{
+    int4 arr_i = int4(21, 22, 23, 24);
+    bool4 arr_b = bool4(true, false, true, false);
+    return float4(arr_i[i], arr_b[i], arr_i[3 - i], arr_b[3 - i]);
+}
+
+[test]
+uniform 0 float 0
+todo draw quad
+todo probe all rgba (21.0, 1.0, 24.0, 0.0)
+uniform 0 float 1
+todo draw quad
+todo probe all rgba (22.0, 0.0, 23.0, 1.0)
+uniform 0 float 2
+todo draw quad
+todo probe all rgba (23.0, 1.0, 22.0, 0.0)
+uniform 0 float 3
+todo draw quad
+todo probe all rgba (24.0, 0.0, 21.0, 1.0)
From: Francisco Casas fcasas@codeweavers.com
---
 tests/array-index-expr.shader_test | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test
index 231915a3..35a13f7f 100644
--- a/tests/array-index-expr.shader_test
+++ b/tests/array-index-expr.shader_test
@@ -45,3 +45,28 @@ todo probe all rgba (23.0, 1.0, 22.0, 0.0)
 uniform 0 float 3
 todo draw quad
 todo probe all rgba (24.0, 0.0, 21.0, 1.0)
+
+
+[pixel shader todo]
+uniform float2 i;
+
+float4 main() : sv_target
+{
+    float4 f[3] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
+
+    return f[i.x + i.y];
+}
+
+[test]
+uniform 0 float4 0 0 0 0
+todo draw quad
+todo probe all rgba (1.0, 2.0, 3.0, 4.0)
+uniform 0 float4 1 0 0 0
+todo draw quad
+todo probe all rgba (5.0, 6.0, 7.0, 8.0)
+uniform 0 float4 0 1 0 0
+todo draw quad
+todo probe all rgba (5.0, 6.0, 7.0, 8.0)
+uniform 0 float4 1 1 0 0
+todo draw quad
+todo probe all rgba (9.0, 10.0, 11.0, 12.0)
From: Francisco Casas fcasas@codeweavers.com
---
 libs/vkd3d-shader/hlsl_codegen.c | 77 ++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 33 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 638dab6e..6a99c33e 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -123,15 +123,14 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st
 }
 
 /* TODO: remove when no longer needed, only used for transform_deref_paths_into_offsets() */
-static void replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_deref *deref,
+static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_deref *deref,
         struct hlsl_ir_node *instr)
 {
     const struct hlsl_type *type;
     struct hlsl_ir_node *offset;
     struct hlsl_block block;
 
-    if (!deref->var)
-        return;
+    assert(deref->var);
 
     /* register offsets shouldn't be used before this point is reached. */
     assert(!deref->offset.node);
@@ -143,45 +142,19 @@ static void replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der
     if (type->class == HLSL_CLASS_STRUCT || type->class == HLSL_CLASS_ARRAY)
     {
         hlsl_cleanup_deref(deref);
-        return;
+        return true;
     }
 
     deref->offset_regset = hlsl_type_get_regset(type);
 
     if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &instr->loc)))
-        return;
+        return false;
     list_move_before(&instr->entry, &block.instrs);
 
     hlsl_cleanup_deref(deref);
     hlsl_src_from_node(&deref->offset, offset);
-}
-
-/* TODO: remove when no longer needed. */
-static bool transform_deref_paths_into_offsets(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
-{
-    switch(instr->type)
-    {
-        case HLSL_IR_LOAD:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_load(instr)->src, instr);
-            return true;
-
-        case HLSL_IR_STORE:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_store(instr)->lhs, instr);
-            return true;
-
-        case HLSL_IR_RESOURCE_LOAD:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_resource_load(instr)->resource, instr);
-            replace_deref_path_with_offset(ctx, &hlsl_ir_resource_load(instr)->sampler, instr);
-            return true;
-
-        case HLSL_IR_RESOURCE_STORE:
-            replace_deref_path_with_offset(ctx, &hlsl_ir_resource_store(instr)->resource, instr);
-            return true;
-
-        default:
-            return false;
-    }
-    return false;
+    return true;
 }
 
 /* Split uniforms into two variables representing the constant and temp
@@ -600,6 +573,44 @@ bool hlsl_transform_ir(struct hlsl_ctx *ctx, bool (*func)(struct hlsl_ctx *ctx,
     return progress;
 }
 
+static bool transform_instr_derefs(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    bool res;
+    bool (*func)(struct hlsl_ctx *ctx, struct hlsl_deref *, struct hlsl_ir_node *) = context;
+
+    switch(instr->type)
+    {
+        case HLSL_IR_LOAD:
+            res = func(ctx, &hlsl_ir_load(instr)->src, instr);
+            return res;
+
+        case HLSL_IR_STORE:
+            res = func(ctx, &hlsl_ir_store(instr)->lhs, instr);
+            return res;
+
+        case HLSL_IR_RESOURCE_LOAD:
+            res = func(ctx, &hlsl_ir_resource_load(instr)->resource, instr);
+            if (hlsl_ir_resource_load(instr)->sampler.var)
+                res |= func(ctx, &hlsl_ir_resource_load(instr)->sampler, instr);
+            return res;
+
+        case HLSL_IR_RESOURCE_STORE:
+            res = func(ctx, &hlsl_ir_resource_store(instr)->resource, instr);
+            return res;
+
+        default:
+            return false;
+    }
+    return false;
+}
+
+static bool transform_derefs(struct hlsl_ctx *ctx,
+        bool (*func)(struct hlsl_ctx *ctx, struct hlsl_deref *, struct hlsl_ir_node *),
+        struct hlsl_block *block)
+{
+    return hlsl_transform_ir(ctx, transform_instr_derefs, block, func);
+}
+
 struct recursive_call_ctx
 {
     const struct hlsl_ir_function_decl **backtrace;
@@ -3944,7 +3955,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     hlsl_transform_ir(ctx, track_object_components_usage, body, NULL);
 
     /* TODO: move forward, remove when no longer needed */
-    hlsl_transform_ir(ctx, transform_deref_paths_into_offsets, body, NULL);
+    transform_derefs(ctx, replace_deref_path_with_offset, body);
     while (hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL));
 
 do
do
From: Francisco Casas fcasas@codeweavers.com
---
 libs/vkd3d-shader/hlsl_codegen.c | 47 ++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 6a99c33e..c6cd759f 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2389,6 +2389,52 @@ static bool lower_int_abs(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void
     return true;
 }
 
+static bool lower_int_dot(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    struct hlsl_ir_node *arg1, *arg2, *mult, *comps[4] = {0}, *res;
+    struct hlsl_type *type = instr->data_type;
+    struct hlsl_ir_expr *expr;
+    unsigned int i, dimx;
+
+    if (instr->type != HLSL_IR_EXPR)
+        return false;
+    expr = hlsl_ir_expr(instr);
+
+    if (expr->op != HLSL_OP2_DOT)
+        return false;
+    if (type->base_type == HLSL_TYPE_FLOAT)
+        return false;
+
+    arg1 = expr->operands[0].node;
+    arg2 = expr->operands[1].node;
+    assert(arg1->data_type->dimx == arg2->data_type->dimx);
+    dimx = arg1->data_type->dimx;
+
+    if (!(mult = hlsl_new_binary_expr(ctx, HLSL_OP2_MUL, arg1, arg2)))
+        return false;
+    list_add_before(&instr->entry, &mult->entry);
+
+    for (i = 0; i < dimx; ++i)
+    {
+        unsigned int s = hlsl_swizzle_from_writemask(1 << i);
+
+        if (!(comps[i] = hlsl_new_swizzle(ctx, s, 1, mult, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &comps[i]->entry);
+    }
+
+    res = comps[0];
+    for (i = 1; i < dimx; ++i)
+    {
+        if (!(res = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, res, comps[i])))
+            return false;
+        list_add_before(&instr->entry, &res->entry);
+    }
+
+    hlsl_replace_node(instr, res);
+    return true;
+}
+
 static bool lower_float_modulus(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
 {
     struct hlsl_ir_node *arg1, *arg2, *mul1, *neg1, *ge, *neg2, *div, *mul2, *frc, *cond;
@@ -3924,6 +3970,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
 
     hlsl_transform_ir(ctx, lower_narrowing_casts, body, NULL);
     hlsl_transform_ir(ctx, lower_casts_to_bool, body, NULL);
+    hlsl_transform_ir(ctx, lower_int_dot, body, NULL);
     hlsl_transform_ir(ctx, lower_int_division, body, NULL);
     hlsl_transform_ir(ctx, lower_int_modulus, body, NULL);
     hlsl_transform_ir(ctx, lower_int_abs, body, NULL);
From: Francisco Casas fcasas@codeweavers.com
Non-constant vector indexing cannot be solved with relative addressing in the register indices, because this indexing cannot be performed at the level of register components, only on whole registers.
Mathematical operations must be used instead.
---
 libs/vkd3d-shader/hlsl.c                       |  13 +++
 libs/vkd3d-shader/hlsl.h                       |   2 +
 libs/vkd3d-shader/hlsl_codegen.c               | 101 ++++++++++++++++++
 tests/array-index-expr.shader_test             |  36 +++----
 tests/expr-indexing.shader_test                |  22 ++--
 tests/hlsl-matrix-indexing.shader_test         |   6 +-
 .../hlsl-vector-indexing-uniform.shader_test   |   4 +-
 7 files changed, 150 insertions(+), 34 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c
index 617aef30..1cdc358d 100644
--- a/libs/vkd3d-shader/hlsl.c
+++ b/libs/vkd3d-shader/hlsl.c
@@ -1287,6 +1287,19 @@ struct hlsl_ir_load *hlsl_new_load_index(struct hlsl_ctx *ctx, const struct hlsl
     return load;
 }
 
+struct hlsl_ir_load *hlsl_new_load_partial_path(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
+        unsigned int length, const struct vkd3d_shader_location *loc)
+{
+    /* This deref can only exist temporarily because it is not the real owner of its members. */
+    struct hlsl_deref tmp_deref;
+
+    assert(length <= deref->path_len);
+
+    tmp_deref = *deref;
+    tmp_deref.path_len = length;
+    return hlsl_new_load_index(ctx, &tmp_deref, NULL, loc);
+}
+
 struct hlsl_ir_load *hlsl_new_var_load(struct hlsl_ctx *ctx, struct hlsl_ir_var *var,
         const struct vkd3d_shader_location *loc)
 {
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h
index 6a4e314d..a59fa36d 100644
--- a/libs/vkd3d-shader/hlsl.h
+++ b/libs/vkd3d-shader/hlsl.h
@@ -1119,6 +1119,8 @@ struct hlsl_ir_load *hlsl_new_var_load(struct hlsl_ctx *ctx, struct hlsl_ir_var
         const struct vkd3d_shader_location *loc);
 struct hlsl_ir_load *hlsl_new_load_index(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
         struct hlsl_ir_node *idx, const struct vkd3d_shader_location *loc);
+struct hlsl_ir_load *hlsl_new_load_partial_path(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
+        unsigned int length, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_load_component(struct hlsl_ctx *ctx, struct hlsl_block *block,
         const struct hlsl_deref *deref, unsigned int comp, const struct vkd3d_shader_location *loc);
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index c6cd759f..c8580086 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -1992,6 +1992,106 @@ static bool remove_trivial_swizzles(struct hlsl_ctx *ctx, struct hlsl_ir_node *i
     return true;
 }
 
+static bool lower_nonconstant_vector_derefs(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    struct hlsl_ir_node *idx;
+    struct hlsl_deref *deref;
+    struct hlsl_type *type;
+    unsigned int i;
+
+    if (instr->type != HLSL_IR_LOAD)
+        return false;
+
+    deref = &hlsl_ir_load(instr)->src;
+    assert(deref->var);
+
+    if (deref->path_len == 0)
+        return false;
+
+    type = deref->var->data_type;
+    for (i = 0; i < deref->path_len - 1; ++i)
+        type = hlsl_get_element_type_from_path_index(ctx, type, deref->path[i].node);
+
+    idx = deref->path[deref->path_len - 1].node;
+
+    if (type->class == HLSL_CLASS_VECTOR && idx->type != HLSL_IR_CONSTANT)
+    {
+        struct hlsl_ir_node *eq, *swizzle, *operands[HLSL_MAX_OPERANDS] = {0};
+        struct hlsl_ir_load *vector_load;
+        struct hlsl_ir_constant *c;
+
+        if (!(vector_load = hlsl_new_load_partial_path(ctx, deref, deref->path_len - 1, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &vector_load->node.entry);
+
+        if (!(swizzle = hlsl_new_swizzle(ctx, HLSL_SWIZZLE(X, X, X, X), type->dimx, idx, &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &swizzle->entry);
+
+        if (!(c = hlsl_new_constant(ctx, hlsl_get_vector_type(ctx, HLSL_TYPE_UINT, type->dimx), &instr->loc)))
+            return false;
+        c->value.u[0].u = 0;
+        c->value.u[1].u = 1;
+        c->value.u[2].u = 2;
+        c->value.u[3].u = 3;
+        list_add_before(&instr->entry, &c->node.entry);
+
+        operands[0] = swizzle;
+        operands[1] = &c->node;
+        if (!(eq = hlsl_new_expr(ctx, HLSL_OP2_EQUAL, operands,
+                hlsl_get_vector_type(ctx, HLSL_TYPE_BOOL, type->dimx), &instr->loc)))
+            return false;
+        list_add_before(&instr->entry, &eq->entry);
+
+        if (type->base_type == HLSL_TYPE_BOOL)
+        {
+            struct hlsl_ir_node *and, *res, *comps[4] = {0};
+
+            if (!(and = hlsl_new_binary_expr(ctx, HLSL_OP2_LOGIC_AND, &vector_load->node, eq)))
+                return false;
+            list_add_before(&instr->entry, &and->entry);
+
+            for (i = 0; i < type->dimx; ++i)
+            {
+                unsigned int s = hlsl_swizzle_from_writemask(1 << i);
+
+                if (!(comps[i] = hlsl_new_swizzle(ctx, s, 1, and, &instr->loc)))
+                    return false;
+                list_add_before(&instr->entry, &comps[i]->entry);
+            }
+
+            res = comps[0];
+            for (i = 1; i < type->dimx; ++i)
+            {
+                if (!(res = hlsl_new_binary_expr(ctx, HLSL_OP2_LOGIC_OR, res, comps[i])))
+                    return false;
+                list_add_before(&instr->entry, &res->entry);
+            }
+            hlsl_replace_node(instr, res);
+        }
+        else
+        {
+            struct hlsl_ir_node *dot;
+
+            if (!(eq = hlsl_new_cast(ctx, eq, type, &instr->loc)))
+                return false;
+            list_add_before(&instr->entry, &eq->entry);
+
+            operands[0] = &vector_load->node;
+            operands[1] = eq;
+            if (!(dot = hlsl_new_expr(ctx, type->dimx == 1 ? HLSL_OP2_MUL : HLSL_OP2_DOT, operands,
+                    instr->data_type, &instr->loc)))
+                return false;
+            list_add_before(&instr->entry, &dot->entry);
+            hlsl_replace_node(instr, dot);
+        }
+
+        return true;
+    }
+
+    return false;
+}
+
 /* Lower DIV to RCP + MUL. */
 static bool lower_division(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
 {
@@ -3968,6 +4068,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     while (progress);
     hlsl_transform_ir(ctx, split_matrix_copies, body, NULL);
+    hlsl_transform_ir(ctx, lower_nonconstant_vector_derefs, body, NULL);
     hlsl_transform_ir(ctx, lower_narrowing_casts, body, NULL);
     hlsl_transform_ir(ctx, lower_casts_to_bool, body, NULL);
     hlsl_transform_ir(ctx, lower_int_dot, body, NULL);
diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test
index 35a13f7f..e058d984 100644
--- a/tests/array-index-expr.shader_test
+++ b/tests/array-index-expr.shader_test
@@ -1,4 +1,4 @@
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 
 float4 main() : SV_TARGET
@@ -9,20 +9,20 @@ float4 main() : SV_TARGET
 
 [test]
 uniform 0 float 0
-todo draw quad
-todo probe all rgba (11.0, 11.0, 11.0, 11.0)
+draw quad
+probe all rgba (11.0, 11.0, 11.0, 11.0)
 uniform 0 float 1
-todo draw quad
-todo probe all rgba (12.0, 12.0, 12.0, 12.0)
+draw quad
+probe all rgba (12.0, 12.0, 12.0, 12.0)
 uniform 0 float 2
-todo draw quad
-todo probe all rgba (13.0, 13.0, 13.0, 13.0)
+draw quad
+probe all rgba (13.0, 13.0, 13.0, 13.0)
 uniform 0 float 3
-todo draw quad
-todo probe all rgba (14.0, 14.0, 14.0, 14.0)
+draw quad
+probe all rgba (14.0, 14.0, 14.0, 14.0)
 
 
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 
 float4 main() : SV_TARGET
@@ -34,17 +34,17 @@ float4 main() : SV_TARGET
 
 [test]
 uniform 0 float 0
-todo draw quad
-todo probe all rgba (21.0, 1.0, 24.0, 0.0)
+draw quad
+probe all rgba (21.0, 1.0, 24.0, 0.0)
 uniform 0 float 1
-todo draw quad
-todo probe all rgba (22.0, 0.0, 23.0, 1.0)
+draw quad
+probe all rgba (22.0, 0.0, 23.0, 1.0)
 uniform 0 float 2
-todo draw quad
-todo probe all rgba (23.0, 1.0, 22.0, 0.0)
+draw quad
+probe all rgba (23.0, 1.0, 22.0, 0.0)
 uniform 0 float 3
-todo draw quad
-todo probe all rgba (24.0, 0.0, 21.0, 1.0)
+draw quad
+probe all rgba (24.0, 0.0, 21.0, 1.0)
 
 
 [pixel shader todo]
diff --git a/tests/expr-indexing.shader_test b/tests/expr-indexing.shader_test
index 83a63d67..3dcc5727 100644
--- a/tests/expr-indexing.shader_test
+++ b/tests/expr-indexing.shader_test
@@ -13,7 +13,7 @@ draw quad
 probe all rgba (8.0, 8.0, 8.0, 8.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 a, b;
 float i;
 
@@ -26,8 +26,8 @@ float4 main() : sv_target
 uniform 0 float4 1.0 2.0 3.0 4.0
 uniform 4 float4 5.0 6.0 7.0 8.0
 uniform 8 float 2
-todo draw quad
-todo probe all rgba (10.0, 10.0, 10.0, 10.0)
+draw quad
+probe all rgba (10.0, 10.0, 10.0, 10.0)
 
 
 [pixel shader]
@@ -44,7 +44,7 @@ draw quad
 probe all rgba (3.0, 3.0, 3.0, 3.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 a;
 float i;
 
@@ -56,11 +56,11 @@ float4 main() : sv_target
 [test]
 uniform 0 float4 1.0 2.0 3.0 4.0
 uniform 4 float 0
-todo draw quad
-todo probe all rgba (4.0, 4.0, 4.0, 4.0)
+draw quad
+probe all rgba (4.0, 4.0, 4.0, 4.0)
 uniform 4 float 2
-todo draw quad
-todo probe all rgba (1.0, 1.0, 1.0, 1.0)
+draw quad
+probe all rgba (1.0, 1.0, 1.0, 1.0)
 
 
 [pixel shader]
@@ -82,7 +82,7 @@ draw quad
 probe all rgba (4.0, 4.0, 4.0, 4.0)
 
 
-[pixel shader todo]
+[pixel shader]
 float4 a;
 float i;
 
@@ -99,5 +99,5 @@ float4 main() : sv_target
 [test]
 uniform 0 float4 1.0 2.0 3.0 4.0
 uniform 4 float 1
-todo draw quad
-todo probe all rgba (2.0, 2.0, 2.0, 2.0)
+draw quad
+probe all rgba (2.0, 2.0, 2.0, 2.0)
diff --git a/tests/hlsl-matrix-indexing.shader_test b/tests/hlsl-matrix-indexing.shader_test
index c3d08296..a57d8fb8 100644
--- a/tests/hlsl-matrix-indexing.shader_test
+++ b/tests/hlsl-matrix-indexing.shader_test
@@ -108,7 +108,7 @@ draw quad
 probe all rgba (3.0, 4.0, 50.0, 60.0)
 
 
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 
 float4 main() : sv_target
@@ -120,8 +120,8 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float 2
-todo draw quad
-todo probe all rgba (8, 9, 10, 11)
+draw quad
+probe all rgba (8, 9, 10, 11)
 
 
 [pixel shader todo]
diff --git a/tests/hlsl-vector-indexing-uniform.shader_test b/tests/hlsl-vector-indexing-uniform.shader_test
index 968f570b..e5ffbdd0 100644
--- a/tests/hlsl-vector-indexing-uniform.shader_test
+++ b/tests/hlsl-vector-indexing-uniform.shader_test
@@ -1,6 +1,6 @@
 % Use a uniform to prevent the compiler from optimizing.
 
-[pixel shader todo]
+[pixel shader]
 uniform float i;
 float4 main() : SV_TARGET
 {
@@ -12,5 +12,5 @@ float4 main() : SV_TARGET
 
 [test]
 uniform 0 float 2
-todo draw quad
+draw quad
 probe all rgba (0.5, 0.3, 0.8, 0.2)
That is a good idea! I did just that. Note that HLSL_OP2_DOT is not defined for bools, so bool vectors are a special case I had to handle with HLSL_OP2_LOGIC_AND and HLSL_OP2_LOGIC_OR.
Maybe my mathematician side is prevailing too much for Zeb's tastes, but I would just decide that `DOT` is defined for `bool` too, simply using `AND` instead of `MUL` and `OR` instead of `ADD` (unless I am missing something, you can basically use the same `lower_int_dot()` implementation, just with the appropriate `HLSL_OP2` constants). Notice that this is different from how the `dot()` intrinsic works, but that's not necessarily a problem, because the `bool` dot product is immediately cast to `int` while parsing (inside `expr_common_base_type()`).
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
 }
 
+static bool lower_int_dot(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    struct hlsl_ir_node *arg1, *arg2, *mult, *comps[4] = {0}, *res;
+    struct hlsl_type *type = instr->data_type;
+    struct hlsl_ir_expr *expr;
+    unsigned int i, dimx;
+
+    if (instr->type != HLSL_IR_EXPR)
+        return false;
+    expr = hlsl_ir_expr(instr);
+
+    if (expr->op != HLSL_OP2_DOT)
+        return false;
+    if (type->base_type == HLSL_TYPE_FLOAT)
+        return false;
Maybe you should skip `HALF` too, so that the dot product operation is emitted?
On Wed May 24 11:11:01 2023 +0000, Giovanni Mascellani wrote:
That is a good idea! I did just that. Note that HLSL_OP2_DOT is not
defined for bools, so bool vectors are a special case I had to handle with HLSL_OP2_LOGIC_AND and HLSL_OP2_LOGIC_OR. Maybe my mathematician side is prevailing too much for Zeb's tastes, but I would just decide that `DOT` is defined for `bool` too, simply using `AND` instead of `MUL` and `OR` instead of `ADD` (unless I am missing something, you can basically use the same `lower_int_dot()` implementation, just using the appropriate `HLSL_OP2` constants). Notice that this is different of how the `dot()` intrinsic work, but that's not necessarily a problem because the `bool` dot product is immediately casted to `int` while parsing (inside `expr_common_base_type()`).
That would save some code, but it seemed odd to me to make the space between
`hlsl_transform_ir(ctx, lower_nonconstant_vector_derefs, body, NULL);`
and
`hlsl_transform_ir(ctx, lower_int_dot, body, NULL);`
the only place in the whole codebase where HLSL_OP2_DOT for bools can exist.
I will change it if nobody disagrees though.
On Wed May 24 11:09:03 2023 +0000, Giovanni Mascellani wrote:
Maybe you should skip `HALF` too, so that the dot product operation is emitted?
You are right, I forgot about `HALF`. Also, we may want to keep the lowering for `DOUBLE`s, but that made me realize it is perhaps better to change the name of the pass to `lower_non_float_dot()`.
On Wed May 24 15:08:36 2023 +0000, Francisco Casas wrote:
That would save some code, but it seemed odd to me to make the space between `hlsl_transform_ir(ctx, lower_nonconstant_vector_derefs, body, NULL);` and `hlsl_transform_ir(ctx, lower_int_dot, body, NULL);` the only place in the whole codebase where HLSL_OP2_DOT for bools can exist. I will change it if nobody disagrees though.
We don't have tests for this, but dot() of bools is uniquely supposed to return float.
I don't necessarily object to having HLSL_OP2_DOT mean something other than dot(), although I don't know that it's great. The idea that dot() of bools returns a bool is weird, too. I don't hate either one but I'm not sure it's better than the existing code either.
On Wed May 24 15:17:35 2023 +0000, Francisco Casas wrote:
You are right, I forgot about `HALF`. Also, we may want to keep the lowering for `DOUBLE`s, but that made me realize it is perhaps better to change the name of the pass to `lower_non_float_dot()`.
The code should probably be checking a positive condition and not a negative one, really.
Would it be better to avoid that deref path manipulation by doing this lowering in lower_index_loads()?
If not, I'd rather have a dedicated interface for "get the immediate parent of this deref". I really want to make as little code touch the "path" fields as possible.
Would it be better to avoid that deref path manipulation by doing this lowering in lower_index_loads()?
I remember having considered this, but:
- Intertwining the logic for indexing column-major matrices with the logic for non-constant vector dereferences may get messy.
- I just remembered that we probably want to run this pass after folding constant expressions, to avoid adding instructions that we can't fold, e.g. just because of an unfolded cast in the index. This, unless we make the passes smart enough that they can fold `x + 0 = x` and `x * 0 = 0`.
So, I moved the pass after the constant folding. I am afraid that this also requires going through the lower_casts_to_bool() and lower_int_dot() passes a second time.
If not, I'd rather have a dedicated interface for "get the immediate parent of this deref". I really want to make as little code touch the "path" fields as possible.
I changed `hlsl_new_load_partial_path()` to `hlsl_new_load_path_parent()`. I wonder if it will be possible to keep a low level of path manipulation for the resources support though.
On Wed May 24 17:54:12 2023 +0000, Zebediah Figura wrote:
The code should probably be checking a positive condition and not a negative one, really.
Okay, now checking for BOOL, INT, or UINT.
We don't have tests for this, but dot() of bools is uniquely supposed to return float.
Hmm, I think we are not doing that then. As Giovanni mentioned, we are making them return int.
I don't necessarily object to having HLSL_OP2_DOT mean something other than dot(), although I don't know that it's great. The idea that dot() of bools returns a bool is weird, too. I don't hate either one but I'm not sure it's better than the existing code either.
Since it allows removing a chunk of code, I am generating the DOT for bools now. It is lowered by the subsequent lower_int_dot() pass. I also added a comment:
```c
/* Note: We may be creating a DOT for bool vectors here, which we need to lower to
 * LOGIC_OR + LOGIC_AND. */
```