[PATCH v2 0/3] MR482: vkd3d-shader/hlsl: Fix infinite loops when applying copy-prop to swizzles.
Current copy-prop of swizzle instructions can result in infinite loops, as with the included test before this commit. Consider the following sequence of instructions where a load is stored in the same variable it loads: 1 : A 2 : A = @1 3 : @1.x In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning. To avoid this, it is check that the replacement instruction is not the same as the swizzle->load instruction. --- This was discovered when trying to compile shaders from [Bug 19499](https://www.codeweavers.com/support/bugs/browse?cmd=bug_edit;bug_id=19499#c7). -- v2: vkd3d-shader/hlsl: Avoid replacing with swizzles to loads that come after in copy-prop. vkd3d-shader/hlsl: Move index_instructions() up. vkd3d-shader/hlsl: Avoid replacing with the same swizzle in copy-prop. https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482
From: Francisco Casas <fcasas(a)codeweavers.com> Current copy-prop of swizzle instructions can result in infinite loops, as with the included test before this commit. Consider the following sequence of instructions where a load is stored in the same variable it loads: 1 : A 2 : A = @1 3 : @1.x In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning. To avoid this, it is check that the replacement instruction is not the same as the swizzle->load instruction. --- Makefile.am | 1 + libs/vkd3d-shader/hlsl_codegen.c | 13 ++++++++++--- tests/hlsl/hard-copy-prop.shader_test | 25 +++++++++++++++++++++++++ 3 files changed, 36 insertions(+), 3 deletions(-) create mode 100644 tests/hlsl/hard-copy-prop.shader_test diff --git a/Makefile.am b/Makefile.am index 2666194a6..5d73affa9 100644 --- a/Makefile.am +++ b/Makefile.am @@ -102,6 +102,7 @@ vkd3d_shader_tests = \ tests/hlsl/gather.shader_test \ tests/hlsl/getdimensions.shader_test \ tests/hlsl/half.shader_test \ + tests/hlsl/hard-copy-prop.shader_test \ tests/hlsl/initializer-flatten.shader_test \ tests/hlsl/initializer-implicit-array.shader_test \ tests/hlsl/initializer-invalid-arg-count.shader_test \ diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 5a70878bc..e6b5e20cd 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1433,10 +1433,11 @@ static void copy_propagation_set_value(struct copy_propagation_var_def *var_def, } static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, - const struct copy_propagation_state *state, const struct hlsl_deref *deref, + const struct copy_propagation_state *state, const struct hlsl_ir_load *load, unsigned int swizzle, struct hlsl_ir_node *instr) { const unsigned int instr_component_count = hlsl_type_component_count(instr->data_type); + const struct hlsl_deref *deref = &load->src; const struct hlsl_ir_var *var = deref->var; struct hlsl_ir_node *new_instr = NULL; unsigned int start, count, i; @@ -1465,6 +1466,12 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); } + /* When 'load' is not the same as 'instr', there is the possibility of 'load' itself to be + * 'new_instr'. In this case, the replacement is not necessary and we have to return false + * to avoid doing copy-prop forever. */ + if (new_instr == &load->node) + return false; + TRACE("Load from %s[%u-%u]%s propagated as instruction %p%s.\n", var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count), new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count)); @@ -1541,7 +1548,7 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) return true; - if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) + if (copy_propagation_replace_with_single_instr(ctx, state, load, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) return true; return false; @@ -1559,7 +1566,7 @@ static bool copy_propagation_transform_swizzle(struct hlsl_ctx *ctx, if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, swizzle->swizzle, &swizzle->node)) return true; - if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, swizzle->swizzle, &swizzle->node)) + if (copy_propagation_replace_with_single_instr(ctx, state, load, swizzle->swizzle, &swizzle->node)) return true; return false; diff --git a/tests/hlsl/hard-copy-prop.shader_test b/tests/hlsl/hard-copy-prop.shader_test new file mode 100644 index 000000000..fd6d588d5 --- /dev/null +++ b/tests/hlsl/hard-copy-prop.shader_test @@ -0,0 +1,25 @@ +[pixel shader] +float cond; + +float4 main() : sv_target +{ + float2 r = {1, 2}; + float2 tmp; + + // invalidate r + if (cond) + r = float2(10, 20); + + tmp = r; + r = tmp; + return r.y; +} + +[test] +uniform 0 float 0.0 +draw quad +probe all rgba (2.0, 2.0, 2.0, 2.0) +uniform 0 float 1.0 +draw quad +probe all rgba (20.0, 20.0, 20.0, 20.0) + -- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482
From: Francisco Casas <fcasas(a)codeweavers.com> --- libs/vkd3d-shader/hlsl_codegen.c | 74 ++++++++++++++++---------------- 1 file changed, 37 insertions(+), 37 deletions(-) diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index e6b5e20cd..6ff6ba755 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1203,6 +1203,43 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, s return false; } +/* Allocate a unique, ordered index to each instruction, which will be used for + * computing liveness ranges. + * Index 0 means unused; index 1 means function entry, so start at 2. */ +static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) +{ + struct hlsl_ir_node *instr; + + LIST_FOR_EACH_ENTRY(instr, &block->instrs, struct hlsl_ir_node, entry) + { + instr->index = index++; + + if (instr->type == HLSL_IR_IF) + { + struct hlsl_ir_if *iff = hlsl_ir_if(instr); + index = index_instructions(&iff->then_block, index); + index = index_instructions(&iff->else_block, index); + } + else if (instr->type == HLSL_IR_LOOP) + { + index = index_instructions(&hlsl_ir_loop(instr)->body, index); + hlsl_ir_loop(instr)->next_index = index; + } + else if (instr->type == HLSL_IR_SWITCH) + { + struct hlsl_ir_switch *s = hlsl_ir_switch(instr); + struct hlsl_ir_switch_case *c; + + LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry) + { + index = index_instructions(&c->body, index); + } + } + } + + return index; +} + /* * Copy propagation. The basic idea is to recognize instruction sequences of the * form: @@ -3249,42 +3286,6 @@ static bool dce(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context) return false; } -/* Allocate a unique, ordered index to each instruction, which will be used for - * computing liveness ranges. */ -static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) -{ - struct hlsl_ir_node *instr; - - LIST_FOR_EACH_ENTRY(instr, &block->instrs, struct hlsl_ir_node, entry) - { - instr->index = index++; - - if (instr->type == HLSL_IR_IF) - { - struct hlsl_ir_if *iff = hlsl_ir_if(instr); - index = index_instructions(&iff->then_block, index); - index = index_instructions(&iff->else_block, index); - } - else if (instr->type == HLSL_IR_LOOP) - { - index = index_instructions(&hlsl_ir_loop(instr)->body, index); - hlsl_ir_loop(instr)->next_index = index; - } - else if (instr->type == HLSL_IR_SWITCH) - { - struct hlsl_ir_switch *s = hlsl_ir_switch(instr); - struct hlsl_ir_switch_case *c; - - LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry) - { - index = index_instructions(&c->body, index); - } - } - } - - return index; -} - static void dump_function(struct rb_entry *entry, void *context) { struct hlsl_ir_function *func = RB_ENTRY_VALUE(entry, struct hlsl_ir_function, entry); @@ -3527,7 +3528,6 @@ static void compute_liveness(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl struct hlsl_scope *scope; struct hlsl_ir_var *var; - /* Index 0 means unused; index 1 means function entry, so start at 2. */ index_instructions(&entry_func->body, 2); LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry) -- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482
From: Francisco Casas <fcasas(a)codeweavers.com> There are infinite loops that can still happen in copy-prop for swizzles, like in the new added test. The previous solution is not strong enough to solve them. Consider the following sequence of instructions: 1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @1.x Current copy-prop would replace 5 so it points to @3 now: 1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @3.x But in the next iteration it would make it point back to @1, keeping it spinning infinitively. The solution is to index the instructions and don't replace the swizzle if the new load happens after the current load. --- libs/vkd3d-shader/hlsl_codegen.c | 20 +++++++++++++------- tests/hlsl/hard-copy-prop.shader_test | 24 ++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 7 deletions(-) diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 6ff6ba755..51f333516 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1204,7 +1204,7 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, s } /* Allocate a unique, ordered index to each instruction, which will be used for - * computing liveness ranges. + * copy propagation and computing liveness ranges. * Index 0 means unused; index 1 means function entry, so start at 2. */ static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) { @@ -1503,15 +1503,15 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); } - /* When 'load' is not the same as 'instr', there is the possibility of 'load' itself to be - * 'new_instr'. In this case, the replacement is not necessary and we have to return false - * to avoid doing copy-prop forever. */ - if (new_instr == &load->node) + /* When 'load' is not the same as 'instr', there is the possibility of 'load' to be before in + * the program than 'new_instr', or even be equal to 'new_instr'. In this case, the replacement + * is not helpful, so we have to return false to avoid doing copy-prop forever. */ + if (load->node.index <= new_instr->index) return false; - TRACE("Load from %s[%u-%u]%s propagated as instruction %p%s.\n", + TRACE("Load from %s[%u-%u]%s for instruction %p propagated as instruction %p%s.\n", var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count), - new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count)); + instr, new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count)); if (instr->data_type->class != HLSL_CLASS_OBJECT) { @@ -1891,6 +1891,8 @@ bool hlsl_copy_propagation_execute(struct hlsl_ctx *ctx, struct hlsl_block *bloc struct copy_propagation_state state; bool progress; + index_instructions(block, 2); + copy_propagation_state_init(ctx, &state, NULL); progress = copy_propagation_transform_block(ctx, block, &state); @@ -4884,6 +4886,10 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry hlsl_transform_ir(ctx, fold_redundant_casts, body, NULL); do { + + if (TRACE_ON()) + rb_for_each_entry(&ctx->functions, dump_function, ctx); + progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL); progress |= hlsl_transform_ir(ctx, hlsl_fold_constant_swizzles, body, NULL); progress |= hlsl_copy_propagation_execute(ctx, body); diff --git a/tests/hlsl/hard-copy-prop.shader_test b/tests/hlsl/hard-copy-prop.shader_test index fd6d588d5..5b8604b82 100644 --- a/tests/hlsl/hard-copy-prop.shader_test +++ b/tests/hlsl/hard-copy-prop.shader_test @@ -23,3 +23,27 @@ uniform 0 float 1.0 draw quad probe all rgba (20.0, 20.0, 20.0, 20.0) + +[pixel shader] +float cond; + +float4 main() : sv_target +{ + float2 r = {3, 4}; + + // invalidate r + if (cond) + r = float2(30, 40); + + r = r; + r = float2(1, r.y); + + return float4(r, 0, 0); +} +[test] +uniform 0 float 0.0 +draw quad +probe all rgba (1.0, 4.0, 0.0, 0.0) +uniform 0 float 1.0 +draw quad +probe all rgba (1.0, 40.0, 0.0, 0.0) -- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482
As previously mentioned it is possible to build a more complicated test that still can make copy-prop spinning after 1/1. Consider the following sequence of instructions: 1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @1.x Current copy-prop would replace 5 so it points to @3 now: 1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @3.x But in the next iteration it would make it point back to @1, keeping it spinning infinitively. So I improved the solution appending 2/3 and 3/3, and also added a stronger test in 3/3. While 1/3 could be squashed into 3/3, I think it is good to keep it for documentation purposes. is to index the instructions and don't replace the swizzle if the new load happens after the current load. 1/1 wasn't enough to solve all copy-propagation bugs I improved the solution in 2/3 and 3/3. Now, the more obscure bug happens -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52713
``` 1 : A 2 : A = @1 3 : @1.x ```
In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning.
Without thinking it through fully, that sounds like copy-prop is even more broken. Fundamentally we shouldn't be propagating a store that happens *after* the load we're reaching through. After all, given ``` 1: A 2: idk lol 3: A = @2 4: @1.x ``` the store at @3 should prevent us from replacing @4 with @2. Either that's not happening now, in which case we have bigger problems, or it is happening now, in which case why is that same logic not preventing the infinite loop? -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52784
Without thinking it through fully, that sounds like copy-prop is even more broken. Fundamentally we shouldn't be propagating a store that happens *after* the load we're reaching through.
Yep, at some point I realized that and the current version of the MR handles the problem. Check my other comment in the [MR](https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52713) or the commit message on 3/3. -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52871
On Mon Nov 20 13:05:32 2023 +0000, Francisco Casas wrote:
Without thinking it through fully, that sounds like copy-prop is even more broken. Fundamentally we shouldn't be propagating a store that happens *after* the load we're reaching through. Yep, at some point I realized that and the current version of the MR handles the problem. Check my other comment in the [MR](https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52713) or the commit message on 3/3. Well, it still begs the question of why 1/3 is there then, especially with the commit message that it has.
-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52912
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl_codegen.c:
lower_ir(ctx, lower_narrowing_casts, body); lower_ir(ctx, lower_casts_to_bool, body); lower_ir(ctx, lower_int_dot, body); lower_ir(ctx, lower_int_division, body); lower_ir(ctx, lower_int_modulus, body); lower_ir(ctx, lower_int_abs, body); lower_ir(ctx, lower_float_modulus, body); hlsl_transform_ir(ctx, fold_redundant_casts, body, NULL); do { + + if (TRACE_ON()) + rb_for_each_entry(&ctx->functions, dump_function, ctx); + progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL);
Is this left over from debugging? -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52913
Zebediah Figura (@zfigura) commented about libs/vkd3d-shader/hlsl_codegen.c:
ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
- /* When 'load' is not the same as 'instr', there is the possibility of 'load' itself to be - * 'new_instr'. In this case, the replacement is not necessary and we have to return false - * to avoid doing copy-prop forever. */ - if (new_instr == &load->node) + /* When 'load' is not the same as 'instr', there is the possibility of 'load' to be before in + * the program than 'new_instr', or even be equal to 'new_instr'. In this case, the replacement + * is not helpful, so we have to return false to avoid doing copy-prop forever. */ + if (load->node.index <= new_instr->index)
I fear this isn't going to be enough. E.g. does this handle loops correctly? Even if it is enough I feel like the logic is fragile enough that we should find some other way. Should we throw out this pass altogether? We have tests that depend on it, but I think those tests are probably very artificial. Alternatively: When we encounter a load, we record, in the copy_propagation_state, an array of copy_propagation_value's for that load. Then in copy_propagation_transform_swizzle(), instead of calling copy_propagation_get_value(), we use that value array. That way we are looking up the "live" values at the time of the load, rather than later after things may have been invalidated. -- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/482#note_52916
participants (3)
-
Francisco Casas -
Francisco Casas (@fcasas) -
Zebediah Figura (@zfigura)