Current copy-prop of swizzle instructions can result in infinite loops, as with the included test before this commit.
Consider the following sequence of instructions where a load is stored in the same variable it loads:
1 : A 2 : A = @1 3 : @1.x
In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning.
To avoid this, it is check that the replacement instruction is not the same as the swizzle->load instruction.
---
This was discovered when trying to compile shaders from [Bug 19499](https://www.codeweavers.com/support/bugs/browse?cmd=bug_edit;bug_id=19499#c7).
-- v2: vkd3d-shader/hlsl: Avoid replacing with swizzles to loads that come after in copy-prop. vkd3d-shader/hlsl: Move index_instructions() up. vkd3d-shader/hlsl: Avoid replacing with the same swizzle in copy-prop.
From: Francisco Casas fcasas@codeweavers.com
Current copy-prop of swizzle instructions can result in infinite loops, as with the included test before this commit.
Consider the following sequence of instructions where a load is stored in the same variable it loads:
1 : A 2 : A = @1 3 : @1.x
In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning.
To avoid this, it is check that the replacement instruction is not the same as the swizzle->load instruction. --- Makefile.am | 1 + libs/vkd3d-shader/hlsl_codegen.c | 13 ++++++++++--- tests/hlsl/hard-copy-prop.shader_test | 25 +++++++++++++++++++++++++ 3 files changed, 36 insertions(+), 3 deletions(-) create mode 100644 tests/hlsl/hard-copy-prop.shader_test
diff --git a/Makefile.am b/Makefile.am index 2666194a6..5d73affa9 100644 --- a/Makefile.am +++ b/Makefile.am @@ -102,6 +102,7 @@ vkd3d_shader_tests = \ tests/hlsl/gather.shader_test \ tests/hlsl/getdimensions.shader_test \ tests/hlsl/half.shader_test \ + tests/hlsl/hard-copy-prop.shader_test \ tests/hlsl/initializer-flatten.shader_test \ tests/hlsl/initializer-implicit-array.shader_test \ tests/hlsl/initializer-invalid-arg-count.shader_test \ diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 5a70878bc..e6b5e20cd 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1433,10 +1433,11 @@ static void copy_propagation_set_value(struct copy_propagation_var_def *var_def, }
static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, - const struct copy_propagation_state *state, const struct hlsl_deref *deref, + const struct copy_propagation_state *state, const struct hlsl_ir_load *load, unsigned int swizzle, struct hlsl_ir_node *instr) { const unsigned int instr_component_count = hlsl_type_component_count(instr->data_type); + const struct hlsl_deref *deref = &load->src; const struct hlsl_ir_var *var = deref->var; struct hlsl_ir_node *new_instr = NULL; unsigned int start, count, i; @@ -1465,6 +1466,12 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
+ /* When 'load' is not the same as 'instr', there is the possibility of 'load' itself to be + * 'new_instr'. In this case, the replacement is not necessary and we have to return false + * to avoid doing copy-prop forever. */ + if (new_instr == &load->node) + return false; + TRACE("Load from %s[%u-%u]%s propagated as instruction %p%s.\n", var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count), new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count)); @@ -1541,7 +1548,7 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) return true;
- if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) + if (copy_propagation_replace_with_single_instr(ctx, state, load, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) return true;
return false; @@ -1559,7 +1566,7 @@ static bool copy_propagation_transform_swizzle(struct hlsl_ctx *ctx, if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, swizzle->swizzle, &swizzle->node)) return true;
- if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, swizzle->swizzle, &swizzle->node)) + if (copy_propagation_replace_with_single_instr(ctx, state, load, swizzle->swizzle, &swizzle->node)) return true;
return false; diff --git a/tests/hlsl/hard-copy-prop.shader_test b/tests/hlsl/hard-copy-prop.shader_test new file mode 100644 index 000000000..fd6d588d5 --- /dev/null +++ b/tests/hlsl/hard-copy-prop.shader_test @@ -0,0 +1,25 @@ +[pixel shader] +float cond; + +float4 main() : sv_target +{ + float2 r = {1, 2}; + float2 tmp; + + // invalidate r + if (cond) + r = float2(10, 20); + + tmp = r; + r = tmp; + return r.y; +} + +[test] +uniform 0 float 0.0 +draw quad +probe all rgba (2.0, 2.0, 2.0, 2.0) +uniform 0 float 1.0 +draw quad +probe all rgba (20.0, 20.0, 20.0, 20.0) +
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl_codegen.c | 74 ++++++++++++++++---------------- 1 file changed, 37 insertions(+), 37 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index e6b5e20cd..6ff6ba755 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1203,6 +1203,43 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, s return false; }
+/* Allocate a unique, ordered index to each instruction, which will be used for + * computing liveness ranges. + * Index 0 means unused; index 1 means function entry, so start at 2. */ +static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) +{ + struct hlsl_ir_node *instr; + + LIST_FOR_EACH_ENTRY(instr, &block->instrs, struct hlsl_ir_node, entry) + { + instr->index = index++; + + if (instr->type == HLSL_IR_IF) + { + struct hlsl_ir_if *iff = hlsl_ir_if(instr); + index = index_instructions(&iff->then_block, index); + index = index_instructions(&iff->else_block, index); + } + else if (instr->type == HLSL_IR_LOOP) + { + index = index_instructions(&hlsl_ir_loop(instr)->body, index); + hlsl_ir_loop(instr)->next_index = index; + } + else if (instr->type == HLSL_IR_SWITCH) + { + struct hlsl_ir_switch *s = hlsl_ir_switch(instr); + struct hlsl_ir_switch_case *c; + + LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry) + { + index = index_instructions(&c->body, index); + } + } + } + + return index; +} + /* * Copy propagation. The basic idea is to recognize instruction sequences of the * form: @@ -3249,42 +3286,6 @@ static bool dce(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context) return false; }
-/* Allocate a unique, ordered index to each instruction, which will be used for - * computing liveness ranges. */ -static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) -{ - struct hlsl_ir_node *instr; - - LIST_FOR_EACH_ENTRY(instr, &block->instrs, struct hlsl_ir_node, entry) - { - instr->index = index++; - - if (instr->type == HLSL_IR_IF) - { - struct hlsl_ir_if *iff = hlsl_ir_if(instr); - index = index_instructions(&iff->then_block, index); - index = index_instructions(&iff->else_block, index); - } - else if (instr->type == HLSL_IR_LOOP) - { - index = index_instructions(&hlsl_ir_loop(instr)->body, index); - hlsl_ir_loop(instr)->next_index = index; - } - else if (instr->type == HLSL_IR_SWITCH) - { - struct hlsl_ir_switch *s = hlsl_ir_switch(instr); - struct hlsl_ir_switch_case *c; - - LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry) - { - index = index_instructions(&c->body, index); - } - } - } - - return index; -} - static void dump_function(struct rb_entry *entry, void *context) { struct hlsl_ir_function *func = RB_ENTRY_VALUE(entry, struct hlsl_ir_function, entry); @@ -3527,7 +3528,6 @@ static void compute_liveness(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl struct hlsl_scope *scope; struct hlsl_ir_var *var;
- /* Index 0 means unused; index 1 means function entry, so start at 2. */ index_instructions(&entry_func->body, 2);
LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry)
From: Francisco Casas fcasas@codeweavers.com
There are infinite loops that can still happen in copy-prop for swizzles, like in the new added test. The previous solution is not strong enough to solve them.
Consider the following sequence of instructions:
1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @1.x
Current copy-prop would replace 5 so it points to @3 now:
1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @3.x
But in the next iteration it would make it point back to @1, keeping it spinning infinitively.
The solution is to index the instructions and don't replace the swizzle if the new load happens after the current load. --- libs/vkd3d-shader/hlsl_codegen.c | 20 +++++++++++++------- tests/hlsl/hard-copy-prop.shader_test | 24 ++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 6ff6ba755..51f333516 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1204,7 +1204,7 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, s }
/* Allocate a unique, ordered index to each instruction, which will be used for - * computing liveness ranges. + * copy propagation and computing liveness ranges. * Index 0 means unused; index 1 means function entry, so start at 2. */ static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) { @@ -1503,15 +1503,15 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
- /* When 'load' is not the same as 'instr', there is the possibility of 'load' itself to be - * 'new_instr'. In this case, the replacement is not necessary and we have to return false - * to avoid doing copy-prop forever. */ - if (new_instr == &load->node) + /* When 'load' is not the same as 'instr', there is the possibility of 'load' to be before in + * the program than 'new_instr', or even be equal to 'new_instr'. In this case, the replacement + * is not helpful, so we have to return false to avoid doing copy-prop forever. */ + if (load->node.index <= new_instr->index) return false;
- TRACE("Load from %s[%u-%u]%s propagated as instruction %p%s.\n", + TRACE("Load from %s[%u-%u]%s for instruction %p propagated as instruction %p%s.\n", var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count), - new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count)); + instr, new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count));
if (instr->data_type->class != HLSL_CLASS_OBJECT) { @@ -1891,6 +1891,8 @@ bool hlsl_copy_propagation_execute(struct hlsl_ctx *ctx, struct hlsl_block *bloc struct copy_propagation_state state; bool progress;
+ index_instructions(block, 2); + copy_propagation_state_init(ctx, &state, NULL);
progress = copy_propagation_transform_block(ctx, block, &state); @@ -4884,6 +4886,10 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry hlsl_transform_ir(ctx, fold_redundant_casts, body, NULL); do { + + if (TRACE_ON()) + rb_for_each_entry(&ctx->functions, dump_function, ctx); + progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL); progress |= hlsl_transform_ir(ctx, hlsl_fold_constant_swizzles, body, NULL); progress |= hlsl_copy_propagation_execute(ctx, body); diff --git a/tests/hlsl/hard-copy-prop.shader_test b/tests/hlsl/hard-copy-prop.shader_test index fd6d588d5..5b8604b82 100644 --- a/tests/hlsl/hard-copy-prop.shader_test +++ b/tests/hlsl/hard-copy-prop.shader_test @@ -23,3 +23,27 @@ uniform 0 float 1.0 draw quad probe all rgba (20.0, 20.0, 20.0, 20.0)
+ +[pixel shader] +float cond; + +float4 main() : sv_target +{ + float2 r = {3, 4}; + + // invalidate r + if (cond) + r = float2(30, 40); + + r = r; + r = float2(1, r.y); + + return float4(r, 0, 0); +} +[test] +uniform 0 float 0.0 +draw quad +probe all rgba (1.0, 4.0, 0.0, 0.0) +uniform 0 float 1.0 +draw quad +probe all rgba (1.0, 40.0, 0.0, 0.0)
As previously mentioned it is possible to build a more complicated test that still can make copy-prop spinning after 1/1.
Consider the following sequence of instructions:
1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @1.x
Current copy-prop would replace 5 so it points to @3 now:
1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @3.x
But in the next iteration it would make it point back to @1, keeping it spinning infinitively.
So I improved the solution appending 2/3 and 3/3, and also added a stronger test in 3/3.
While 1/3 could be squashed into 3/3, I think it is good to keep it for documentation purposes.
is to index the instructions and don't replace the swizzle if the new load happens after the current load.
1/1 wasn't enough to solve all copy-propagation bugs I improved the solution in 2/3 and 3/3.
Now, the more obscure bug happens
1 : A 2 : A = @1 3 : @1.x
In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning.
Without thinking it through fully, that sounds like copy-prop is even more broken. Fundamentally we shouldn't be propagating a store that happens *after* the load we're reaching through.
After all, given
``` 1: A 2: idk lol 3: A = @2 4: @1.x ```
the store at @3 should prevent us from replacing @4 with @2. Either that's not happening now, in which case we have bigger problems, or it is happening now, in which case why is that same logic not preventing the infinite loop?
Without thinking it through fully, that sounds like copy-prop is even more broken. Fundamentally we shouldn't be propagating a store that happens *after* the load we're reaching through.
Yep, at some point I realized that and the current version of the MR handles the problem. Check my other comment in the [MR](https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52713) or the commit message on 3/3.
On Mon Nov 20 13:05:32 2023 +0000, Francisco Casas wrote:
Without thinking it through fully, that sounds like copy-prop is even
more broken. Fundamentally we shouldn't be propagating a store that happens *after* the load we're reaching through. Yep, at some point I realized that and the current version of the MR handles the problem. Check my other comment in the [MR](https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52713) or the commit message on 3/3.
Well, it still begs the question of why 1/3 is there then, especially with the commit message that it has.
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl_codegen.c:
lower_ir(ctx, lower_narrowing_casts, body); lower_ir(ctx, lower_casts_to_bool, body); lower_ir(ctx, lower_int_dot, body); lower_ir(ctx, lower_int_division, body); lower_ir(ctx, lower_int_modulus, body); lower_ir(ctx, lower_int_abs, body); lower_ir(ctx, lower_float_modulus, body); hlsl_transform_ir(ctx, fold_redundant_casts, body, NULL); do {
if (TRACE_ON())
rb_for_each_entry(&ctx->functions, dump_function, ctx);
progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL);
Is this left over from debugging?
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl_codegen.c:
ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
- /* When 'load' is not the same as 'instr', there is the possibility of 'load' itself to be
* 'new_instr'. In this case, the replacement is not necessary and we have to return false
* to avoid doing copy-prop forever. */
- if (new_instr == &load->node)
- /* When 'load' is not the same as 'instr', there is the possibility of 'load' to be before in
* the program than 'new_instr', or even be equal to 'new_instr'. In this case, the replacement
* is not helpful, so we have to return false to avoid doing copy-prop forever. */
- if (load->node.index <= new_instr->index)
I fear this isn't going to be enough. E.g. does this handle loops correctly?
Even if it is enough I feel like the logic is fragile enough that we should find some other way.
Should we throw out this pass altogether? We have tests that depend on it, but I think those tests are probably very artificial.
Alternatively: When we encounter a load, we record, in the copy_propagation_state, an array of copy_propagation_value's for that load. Then in copy_propagation_transform_swizzle(), instead of calling copy_propagation_get_value(), we use that value array. That way we are looking up the "live" values at the time of the load, rather than later after things may have been invalidated.