Current copy-prop of swizzle instructions can result in infinite loops, as with the included test before this commit.
Consider the following sequence of instructions where a load is stored in the same variable it loads:
1 : A 2 : A = @1 3 : @1.x
In this case @3 would call copy_propagation_get_value() on A.x and would get @1, without detecting that that is indeed the same node it is already using as swizzle->value. So it would return true, keeping copy-prop spinning.
To avoid this, it is check that the replacement instruction is not the same as the swizzle->load instruction.
---
This was discovered when trying to compile shaders from [Bug 19499](https://www.codeweavers.com/support/bugs/browse?cmd=bug_edit;bug_id=19499#c7).
-- v3: vkd3d-shader/hlsl: Avoid replacing with the same swizzle in copy-prop. vkd3d-shader/hlsl: Move index_instructions() up.
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl_codegen.c | 74 ++++++++++++++++---------------- 1 file changed, 37 insertions(+), 37 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 5a70878bc..3d39fb921 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1203,6 +1203,43 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, s return false; }
+/* Allocate a unique, ordered index to each instruction, which will be used for + * computing liveness ranges. + * Index 0 means unused; index 1 means function entry, so start at 2. */ +static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) +{ + struct hlsl_ir_node *instr; + + LIST_FOR_EACH_ENTRY(instr, &block->instrs, struct hlsl_ir_node, entry) + { + instr->index = index++; + + if (instr->type == HLSL_IR_IF) + { + struct hlsl_ir_if *iff = hlsl_ir_if(instr); + index = index_instructions(&iff->then_block, index); + index = index_instructions(&iff->else_block, index); + } + else if (instr->type == HLSL_IR_LOOP) + { + index = index_instructions(&hlsl_ir_loop(instr)->body, index); + hlsl_ir_loop(instr)->next_index = index; + } + else if (instr->type == HLSL_IR_SWITCH) + { + struct hlsl_ir_switch *s = hlsl_ir_switch(instr); + struct hlsl_ir_switch_case *c; + + LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry) + { + index = index_instructions(&c->body, index); + } + } + } + + return index; +} + /* * Copy propagation. The basic idea is to recognize instruction sequences of the * form: @@ -3242,42 +3279,6 @@ static bool dce(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context) return false; }
-/* Allocate a unique, ordered index to each instruction, which will be used for - * computing liveness ranges. */ -static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) -{ - struct hlsl_ir_node *instr; - - LIST_FOR_EACH_ENTRY(instr, &block->instrs, struct hlsl_ir_node, entry) - { - instr->index = index++; - - if (instr->type == HLSL_IR_IF) - { - struct hlsl_ir_if *iff = hlsl_ir_if(instr); - index = index_instructions(&iff->then_block, index); - index = index_instructions(&iff->else_block, index); - } - else if (instr->type == HLSL_IR_LOOP) - { - index = index_instructions(&hlsl_ir_loop(instr)->body, index); - hlsl_ir_loop(instr)->next_index = index; - } - else if (instr->type == HLSL_IR_SWITCH) - { - struct hlsl_ir_switch *s = hlsl_ir_switch(instr); - struct hlsl_ir_switch_case *c; - - LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry) - { - index = index_instructions(&c->body, index); - } - } - } - - return index; -} - static void dump_function(struct rb_entry *entry, void *context) { struct hlsl_ir_function *func = RB_ENTRY_VALUE(entry, struct hlsl_ir_function, entry); @@ -3520,7 +3521,6 @@ static void compute_liveness(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl struct hlsl_scope *scope; struct hlsl_ir_var *var;
- /* Index 0 means unused; index 1 means function entry, so start at 2. */ index_instructions(&entry_func->body, 2);
LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry)
From: Francisco Casas fcasas@codeweavers.com
Current copy-prop of swizzle instructions can result in infinite loops, as with the included tests this commit.
Consider the following sequence of instructions:
1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @1.x
Current copy-prop would replace 5 so it points to @3 now:
1 : A 2 : B = @1 3 : B 4 : A = @3 5 : @3.x
But in the next iteration it would make it point back to @1, keeping it spinning infinitively.
The solution is to index the instructions and don't replace the swizzle if the new load happens after the current load. --- Makefile.am | 1 + libs/vkd3d-shader/hlsl_codegen.c | 21 ++++++++---- tests/hlsl/hard-copy-prop.shader_test | 49 +++++++++++++++++++++++++++ 3 files changed, 65 insertions(+), 6 deletions(-) create mode 100644 tests/hlsl/hard-copy-prop.shader_test
diff --git a/Makefile.am b/Makefile.am index 2666194a6..5d73affa9 100644 --- a/Makefile.am +++ b/Makefile.am @@ -102,6 +102,7 @@ vkd3d_shader_tests = \ tests/hlsl/gather.shader_test \ tests/hlsl/getdimensions.shader_test \ tests/hlsl/half.shader_test \ + tests/hlsl/hard-copy-prop.shader_test \ tests/hlsl/initializer-flatten.shader_test \ tests/hlsl/initializer-implicit-array.shader_test \ tests/hlsl/initializer-invalid-arg-count.shader_test \ diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 3d39fb921..c95bc8d1c 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1204,7 +1204,7 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, s }
/* Allocate a unique, ordered index to each instruction, which will be used for - * computing liveness ranges. + * copy propagation and computing liveness ranges. * Index 0 means unused; index 1 means function entry, so start at 2. */ static unsigned int index_instructions(struct hlsl_block *block, unsigned int index) { @@ -1470,10 +1470,11 @@ static void copy_propagation_set_value(struct copy_propagation_var_def *var_def, }
static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, - const struct copy_propagation_state *state, const struct hlsl_deref *deref, + const struct copy_propagation_state *state, const struct hlsl_ir_load *load, unsigned int swizzle, struct hlsl_ir_node *instr) { const unsigned int instr_component_count = hlsl_type_component_count(instr->data_type); + const struct hlsl_deref *deref = &load->src; const struct hlsl_ir_var *var = deref->var; struct hlsl_ir_node *new_instr = NULL; unsigned int start, count, i; @@ -1502,9 +1503,15 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
- TRACE("Load from %s[%u-%u]%s propagated as instruction %p%s.\n", + /* When 'load' is not the same as 'instr', there is the possibility of 'load' to be before in + * the program than 'new_instr', or even be equal to 'new_instr'. In this case, the replacement + * is not helpful, so we have to return false to avoid doing copy-prop forever. */ + if (load->node.index <= new_instr->index) + return false; + + TRACE("Load from %s[%u-%u]%s for instruction %p propagated as instruction %p%s.\n", var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count), - new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count)); + instr, new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count));
if (instr->data_type->class != HLSL_CLASS_OBJECT) { @@ -1578,7 +1585,7 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) return true;
- if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) + if (copy_propagation_replace_with_single_instr(ctx, state, load, HLSL_SWIZZLE(X, Y, Z, W), &load->node)) return true;
return false; @@ -1596,7 +1603,7 @@ static bool copy_propagation_transform_swizzle(struct hlsl_ctx *ctx, if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, swizzle->swizzle, &swizzle->node)) return true;
- if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, swizzle->swizzle, &swizzle->node)) + if (copy_propagation_replace_with_single_instr(ctx, state, load, swizzle->swizzle, &swizzle->node)) return true;
return false; @@ -1884,6 +1891,8 @@ bool hlsl_copy_propagation_execute(struct hlsl_ctx *ctx, struct hlsl_block *bloc struct copy_propagation_state state; bool progress;
+ index_instructions(block, 2); + copy_propagation_state_init(ctx, &state, NULL);
progress = copy_propagation_transform_block(ctx, block, &state); diff --git a/tests/hlsl/hard-copy-prop.shader_test b/tests/hlsl/hard-copy-prop.shader_test new file mode 100644 index 000000000..5b8604b82 --- /dev/null +++ b/tests/hlsl/hard-copy-prop.shader_test @@ -0,0 +1,49 @@ +[pixel shader] +float cond; + +float4 main() : sv_target +{ + float2 r = {1, 2}; + float2 tmp; + + // invalidate r + if (cond) + r = float2(10, 20); + + tmp = r; + r = tmp; + return r.y; +} + +[test] +uniform 0 float 0.0 +draw quad +probe all rgba (2.0, 2.0, 2.0, 2.0) +uniform 0 float 1.0 +draw quad +probe all rgba (20.0, 20.0, 20.0, 20.0) + + +[pixel shader] +float cond; + +float4 main() : sv_target +{ + float2 r = {3, 4}; + + // invalidate r + if (cond) + r = float2(30, 40); + + r = r; + r = float2(1, r.y); + + return float4(r, 0, 0); +} +[test] +uniform 0 float 0.0 +draw quad +probe all rgba (1.0, 4.0, 0.0, 0.0) +uniform 0 float 1.0 +draw quad +probe all rgba (1.0, 40.0, 0.0, 0.0)
On Mon Nov 20 17:18:40 2023 +0000, Zebediah Figura wrote:
Is this left over from debugging?
Yes. I will remove that.
On Mon Nov 20 17:16:52 2023 +0000, Zebediah Figura wrote:
Well, it still begs the question of why 1/3 is there then, especially with the commit message that it has.
I kinda wanted to keep it there to document a simpler problem that may happen. But okay, I squashed it.
I fear this isn't going to be enough. E.g. does this handle loops correctly?
I don't see a reason why it shouldn't cover loops, since copy_propagation_invalidate_from_block() is called right at the beginning of copy_propagation_process_loop(), so any values that are written after the swizzle, in the loop, should be invalidated.
Even if it is enough I feel like the logic is fragile enough that we should find some other way.
Should we throw out this pass altogether? We have tests that depend on it, but I think those tests are probably very artificial.
I don't think we want to get rid of copy_propagation_transform_swizzle() as a whole. We can see it as composed of two parts:
a) copy_propagation_replace_with_constant_vector() b) copy_propagation_replace_with_single_instr()
We want (a) at least since there are many cases where we index arrays with swizzles like `arr[a.x]` and we want to avoid relative addressing (for SM1 for instance). Also for instructions that expect constant values like the gather offsets.
However, we could argue to get rid of the (b) part. No test failures appear after we remove it.
I still fear that (b) may be required to make (a) work in some cases such as: ``` @1: Q @2: A.x = 1.0 @3: A @4: B = @3 @5: B.y = @1 @6: B @7: @6.x // <-- we want to turn this into a 1.0 constant ```
Alternatively: When we encounter a load, we record, in the copy_propagation_state, an array of copy_propagation_value's for that load. Then in copy_propagation_transform_swizzle(), instead of calling copy_propagation_get_value(), we use that value array. That way we are looking up the "live" values at the time of the load, rather than later after things may have been invalidated.
I agree that this approach of storing the "log" (if you may) for each value will probably be required to solve cases such as: ``` @1: Q @2: B.x = 1.0 @3: B @4: B.x = @1 @5: @3.x // <-- we want to turn this into a 1.0 constant ```
the only problem I have is that an implementation may be complicated enough for us not to want it during the code freeze, and a fix to the infinite loops is more priority.
So, summarizing we have 4 options:
1) Going for this MR as it is. 2) Removing (b) alone, with a slight chance that some unknown apps may break. 3) Removing the whole copy_propagation_transform_swizzle(), breaking some tests and more chance of breaking apps. 4) Implementing the "log" proposal.
I would prefer (1) during the freeze and (4) after the freeze, in fact I better start writing (4) now that I have the copy-prop cartridge loaded in my head.
On Mon Nov 20 23:10:39 2023 +0000, Francisco Casas wrote:
I kinda wanted to keep it there to document a simpler problem that may happen. But okay, I squashed it.
Well, it shouldn't happen if we fix the other problem. I.e. it's patching over the problem in the wrong place.
We want (a) at least since there are many cases where we index arrays with swizzles like `arr[a.x]` and we want to avoid relative addressing (for SM1 for instance). Also for instructions that expect constant values like the gather offsets.
But copy_propagation_transform_swizzle() only does something useful when a variable has components from different sources *and* those components aren't all constant. Unless I'm mistaken that is.
Even the example of "arr[a.x]" you give shouldn't actually need copy_propagation_transform_swizzle(). Consider the following:
``` const uint2 indices = {1, 2};
return array[indices.x]; ```
which in IR should be
``` 2: 1 3: 2 4: indices.x = @2 5: indices.y = @3 6: indices 7: @6.x ```
I'd argue this shader is still a little pathological (why would someone define that as a vector?) but even it is covered by current passes. copy_propagation_transform_load() -> copy_propagation_replace_with_constant_vector() should vectorize @6 into a constant array, and then hlsl_fold_constant_swizzles() deals with @7.
b7d34e83077 originated from discussion in 51. I'm not really sure why it was written, but I think it came down to something like "we should vectorize instead of extending copy-prop" followed by "well, that won't handle this case", but the case in question wasn't really realistic.
the only problem I have is that an implementation may be complicated enough for us not to want it during the code freeze, and a fix to the infinite loops is more priority.
So, summarizing we have 4 options:
- Going for this MR as it is.
- Removing (b) alone, with a slight chance that some unknown apps may break.
- Removing the whole copy_propagation_transform_swizzle(), breaking some tests and more chance of breaking apps.
- Implementing the "log" proposal.
I would prefer (1) during the freeze and (4) after the freeze, in fact I better start writing (4) now that I have the copy-prop cartridge loaded in my head.
I'm inclined to disagree honestly. This fix makes me very nervous. I would be a lot less nervous about accepting a proper fix. I also suspect that
Alternatively: When we encounter a load, we record, in the copy_propagation_state, an array of copy_propagation_value's for that load. Then in copy_propagation_transform_swizzle(), instead of calling copy_propagation_get_value(), we use that value array. That way we are looking up the "live" values at the time of the load, rather than later after things may have been invalidated.
Yeah, it feels to me as this is the core problem: the copy propagation algorithm was originally written having only loads in mind, so each time we process a load we have the updated state of variables at that time. When we process a swizzle on a load, the state is that of the swizzle, which might have changed since the load. Therefore, mistakes.
At the same time, keeping all all the past `copy_propagation_value` objects seems a bit inefficient. I wonder if we should do the other way around: each time we see a load we process it with our current knowledge; then, immediately, we look at all instructions using that load and, if there is any swizzle, we record a notice of where the value for that swizzle should truly come from; then we resume processing after the load. Once we arrive to the swizzle we look at all the recorded values and we have an opportunity to change it (either with a constant or with a swizzled load, depending of what we see).
I haven't tried writing this, but superficially it looks like it should work. And I would prefer a proper fix too, rather than something that it is not clear (to me at least) to what extent it should work.
But copy_propagation_transform_swizzle() only does something useful when a variable has components from different sources *and* those components aren't all constant. Unless I'm mistaken that is.
The (a) part of copy_propagation_transform_swizzle() also takes care of replacing with a constant when only the used components of the swizzle's load are constants.
A useful case for (a) is: ``` 1: Q 2: indices = @1 3: 1 4: indices.x = @3 5: indices // <-- is not completely constant (indices.y is Q.y) 6: @5.x // <-- we want to replace it with 1.0 ``` or the tests that start failing once you remove it.
On the other hand I am not sure if (b) is important for any application or test. Removing (b) avoids the infinite loops and no test fails after you remove it, so if this MR (reasonably) makes you nervous, I wouldn't mind on just getting rid of (b) as the quick fix, I have the commit ready https://gitlab.winehq.org/fcasas/vkd3d/-/commit/b7436a1b0aeacab99ca244eb986a... .
But I will try to write a proper fix using your and gio's comments.
But copy_propagation_transform_swizzle() only does something useful when a variable has components from different sources *and* those components aren't all constant. Unless I'm mistaken that is.
The (a) part of copy_propagation_transform_swizzle() also takes care of replacing with a constant when only the used components of the swizzle's load are constants.
That's exactly what I mean. For copy_propagation_transform_swizzle() to do something useful, a variable has to have some constant components and some non-constant components, *and* only the constant components are used. Why would that happen in a real program? Is your IR example actually realistic?
At the same time, keeping all all the past `copy_propagation_value` objects seems a bit inefficient. I wonder if we should do the other way around: each time we see a load we process it with our current knowledge; then, immediately, we look at all instructions using that load and, if there is any swizzle, we record a notice of where the value for that swizzle should truly come from; then we resume processing after the load. Once we arrive to the swizzle we look at all the recorded values and we have an opportunity to change it (either with a constant or with a swizzled load, depending of what we see).
That could also potentially work. We'd have to add more information to hlsl_src, but hlsl_src was built with that kind of eventuality in mind.
On Tue Nov 21 17:55:53 2023 +0000, Zebediah Figura wrote:
At the same time, keeping all all the past `copy_propagation_value`
objects seems a bit inefficient. I wonder if we should do the other way around: each time we see a load we process it with our current knowledge; then, immediately, we look at all instructions using that load and, if there is any swizzle, we record a notice of where the value for that swizzle should truly come from; then we resume processing after the load. Once we arrive to the swizzle we look at all the recorded values and we have an opportunity to change it (either with a constant or with a swizzled load, depending of what we see). That could also potentially work. We'd have to add more information to hlsl_src, but hlsl_src was built with that kind of eventuality in mind.
Though at the same time I dunno that "efficiency" really matters. I don't think it makes sense to prematurely assume that this will be slow, and it probably won't use enough memory to matter.
This merge request was closed by Francisco Casas.
I implemented the idea of keeping the past `copy_propagation_value`s and I think it looks quite nice.
Also I don't think it is inefficient for the following reasons: 1- We can lookup the list of recorded values from newer to older, so in most cases (except swizzles) the lookup will be O(1). 2- The number of entries we will have to store should be less than 4 times the number of program instructions. 3- We can still discard each block's `copy_propagation_state` once we finish processing the block itself, and thus delete all the values recorded in it.
So I am closing this MR in favor of !487.