copy_propagation_transform_object_load() currently retrieves true, which indicates that there was progress, even if the input dereference remains the same.
This can happen repeatedly if an uninitialized object is copied onto itself, as in the tests added in this patch. This results on the compilation getting stuck on an endless loop.
This patch checks if the deref didn't change, to retrieve false in that case.
-- v2: vkd3d-shader/hlsl: Avoid infinite loop in copy propagation.
From: Francisco Casas fcasas@codeweavers.com
copy_propagation_transform_object_load() currently retrieves true, which indicates that there was progress, even if the input dereference remains the same.
This can happen repeatedly if an uninitialized object is copied onto itself, as in the tests added in this patch. This results on the compilation getting stuck on an endless loop.
This patch checks if the deref didn't change, to retrieve false in that case. --- libs/vkd3d-shader/hlsl.c | 27 +++++++++++++++++++++ libs/vkd3d-shader/hlsl.h | 2 ++ libs/vkd3d-shader/hlsl_codegen.c | 3 +++ tests/object-references.shader_test | 37 +++++++++++++++++++++++++++++ 4 files changed, 69 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index 8aa289ac..627381bb 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -846,6 +846,33 @@ void hlsl_cleanup_deref(struct hlsl_deref *deref) hlsl_src_remove(&deref->offset); }
+bool hlsl_derefs_are_equal(struct hlsl_ctx *ctx, const struct hlsl_deref *deref1, + const struct hlsl_deref *deref2) +{ + unsigned int i; + + assert(!deref1->offset.node); + assert(!deref2->offset.node); + + if (deref1->var != deref2->var) + return false; + if (deref1->path_len != deref2->path_len) + return false; + + for (i = 0; i < deref1->path_len; ++i) + { + const struct hlsl_ir_node *node1 = deref1->path[i].node, *node2 = deref2->path[i].node; + + if (node1 == node2) + continue; + if (node1->type != HLSL_IR_CONSTANT || node2->type != HLSL_IR_CONSTANT) + return false; + if (hlsl_ir_constant(node1)->value[0].u != hlsl_ir_constant(node2)->value[0].u) + return false; + } + return true; +} + /* Initializes a simple variable derefence, so that it can be passed to load/store functions. */ void hlsl_init_simple_deref_from_var(struct hlsl_deref *deref, struct hlsl_ir_var *var) { diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index b6a593ca..3d3825c2 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -752,6 +752,8 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
bool hlsl_copy_deref(struct hlsl_ctx *ctx, struct hlsl_deref *deref, const struct hlsl_deref *other); void hlsl_cleanup_deref(struct hlsl_deref *deref); +bool hlsl_derefs_are_equal(struct hlsl_ctx *ctx, const struct hlsl_deref *deref1, + const struct hlsl_deref *deref2);
void hlsl_replace_node(struct hlsl_ir_node *old, struct hlsl_ir_node *new);
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 6e4168fc..bd5a2c5a 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -773,6 +773,9 @@ static bool copy_propagation_transform_object_load(struct hlsl_ctx *ctx, /* Only HLSL_IR_LOAD can produce an object. */ load = hlsl_ir_load(instr);
+ if (hlsl_derefs_are_equal(ctx, deref, &load->src)) + return false; + hlsl_cleanup_deref(deref); hlsl_copy_deref(ctx, deref, &load->src);
diff --git a/tests/object-references.shader_test b/tests/object-references.shader_test index 12f745e6..78fabd79 100644 --- a/tests/object-references.shader_test +++ b/tests/object-references.shader_test @@ -1,3 +1,15 @@ +[pixel shader fail todo] +sampler sam; + +float4 main() : sv_target +{ + Texture2D tex; + + tex = tex; // Uninitialized assignment to self. + return tex.Sample(sam, float2(0, 0)); +} + + [require] shader model >= 4.0
@@ -155,3 +167,28 @@ float4 main() : sv_target uniform 0 float 10.0 todo draw quad todo probe (0, 0) rgba (11.0, 12.0, 13.0, 11.0) + + +[require] +shader model >= 5.0 + + +[texture 0] +size (1, 1) +1.0 2.0 3.0 4.0 + +[pixel shader todo] +struct apple { + Texture2D tex; + float4 fo : COLOR; +}; + +float4 main(struct apple input) : sv_target +{ + input.tex = input.tex; // Assignment to self. + return input.tex.Load(int3(0, 0, 0)); +} + +[test] +todo draw quad +todo probe (0, 0) rgba (1.0, 2.0, 3.0, 4.0)
I have several immediate thoughts on this:
(1) Q: Why does this only affect object loads? A: Because only object loads grab the deref directly instead of just referring to the LOAD instruction. Q: But don't regular loads have the same problem? A: No, because a sequence like
2: var 3: var = @2 4: var
gets turned into
... 4: @2
(modulo replicate swizzle, which I think we have a pass to remove), and we can't ever turn that *back* into "4: var" because var was written in the previous instruction.
In theory the same restriction *should* be what's preventing object loads from being affected, but that's not really feasible, is it...
Still, we should have tests for that case. We should also have tests for assignment-to-self where swizzles are involved, probably including both swizzles that have "loops" (e.g. "var.xy = var.xz") and those that don't (e.g. "var.xy = var.yx").
I also feel like there's some way that we should be able to remember why this only matters for object loads. Maybe we can write a code comment to the effect of the above. But it's also possible that we should be handling this not in copy_propagation_transform_object_load() but rather in copy_propagation_record_store(), even though that requires peeking at the RHS. Not sure about that one.
(2) We're going to need to get rid of these assignments anyway, though. Consider the last test, which still fails—I assume because we're left with object load/stores in the IR and we can't translate those to sm4.
So I'd advocate for doing this in a pass before copy-prop. We don't actually need copy-prop to achieve it, we just want to look for stores whose rhs is a load from the same variable.
We could leave an "assert(!hlsl_derefs_are_equal())" in copy-prop though.
(3) Why does that test require sm5? For that matter, can't we write it so that it uses a sample rather than a load, and that way it can work on sm1 as well.
I don't think that checking for variable identity is the right thing to do, it feels more a workaround for the symptom. The underlying issue seems to me that object variables actually come in two types: the real uniform object, that is split in a dedicated pass, and all the other object variables that behave rather as pointers to the real objects. The HLSL language is rather terrible when it conflates these two concepts, which is the reason why we need the splitting pass. Here we just have to acknowledge the same distinction: it makes sense to propagate a load from an object variable when that variable is a real object, i.e., a uniform variable.
In other words, I'd rather fix this bug using: ```c if (!load->src.var->is_uniform) { TRACE("Ignoring load from non-uniform object variable %s\n", load->src.var->name); return false; } ``` in `copy_propagation_transform_object_load()`, instead of checking for variable identity.
(1) Q: Why does this only affect object loads? A: Because only object loads grab the deref directly instead of just referring to the LOAD instruction. Q: But don't regular loads have the same problem? A: No, because a sequence like
2: var 3: var = @2 4: var
gets turned into
... 4: @2 (modulo replicate swizzle, which I think we have a pass to remove), and we can't ever turn that *back* into "4: var" because var was written in the previous instruction.
Yes, regular loads only return "true" if the instruction itself is removed and srcs to it are replaced, and since the number of instructions is finite, this can only happen a finite number of times. The swizzle created by copy_propagation_transform_load() is indeed removed by the remove_trivial_swizzles() pass, if it is trivial.
In theory the same restriction *should* be what's preventing object loads from being affected, but that's not really feasible, is it...
Object loads don't replace the instruction itself and instead just seek for replacements for their inner derefs, and sometimes these replacement derefs can be the same deref (albeit, naturally coming from a different instruction).
Still, we should have tests for that case. We should also have tests for assignment-to-self where swizzles are involved, probably including both swizzles that have "loops" (e.g. "var.xy = var.xz") and those that don't (e.g. "var.xy = var.yx").
Okay, I will add some on another branch where I am working with swizzles.
I also feel like there's some way that we should be able to remember why this only matters for object loads. Maybe we can write a code comment to the effect of the above. But it's also possible that we should be handling this not in copy_propagation_transform_object_load() but rather in copy_propagation_record_store(), even though that requires peeking at the RHS. Not sure about that one.
Okay, I will add a comment. Hmm, I can't think of a simple way of solving this in copy_propagation_record_store(), but maybe I haven't thought it enough.
(2) We're going to need to get rid of these assignments anyway, though. Consider the last test, which still fails—I assume because we're left with object load/stores in the IR and we can't translate those to sm4.
The endless loop seems to happen when there are uninitialized objects such as in the first test (or in the second test for now, because we are lacking more support for object components) because copy-prop indeed can't get rid of these assignments.
If a shader ends up with references to uninitialized objects it should not compile, but we can only check for those after copy propagation (or we will get false positives), in the `validate_static_object_references` pass.
The last test fails for now because it also requires more support for object components; when this patch is added on top of my master6 branch the test passes.
(3) Why does that test require sm5? For that matter, can't we write it so that it uses a sample rather than a load, and that way it can work on sm1 as well.
Only sm >= 5 supports objects as struct members, so I decided to use Load to save it from defining a sampler.
On Thu Jan 5 17:54:45 2023 +0000, Francisco Casas wrote:
(1) Q: Why does this only affect object loads? A: Because only object
loads grab the deref directly instead of just referring to the LOAD instruction. Q: But don't regular loads have the same problem? A: No, because a sequence like
2: var 3: var = @2 4: var
gets turned into
... 4: @2 (modulo replicate swizzle, which I think we have a pass to remove),
and we can't ever turn that *back* into "4: var" because var was written in the previous instruction.
Yes, regular loads only return "true" if the instruction itself is removed and srcs to it are replaced, and since the number of instructions is finite, this can only happen a finite number of times. The swizzle created by copy_propagation_transform_load() is indeed removed by the remove_trivial_swizzles() pass, if it is trivial.
In theory the same restriction *should* be what's preventing object
loads from being affected, but that's not really feasible, is it... Object loads don't replace the instruction itself and instead just seek for replacements for their inner derefs, and sometimes these replacement derefs can be the same deref (albeit, naturally coming from a different instruction).
Still, we should have tests for that case. We should also have tests
for assignment-to-self where swizzles are involved, probably including both swizzles that have "loops" (e.g. "var.xy = var.xz") and those that don't (e.g. "var.xy = var.yx"). Okay, I will add some on another branch where I am working with swizzles.
I also feel like there's some way that we should be able to remember
why this only matters for object loads. Maybe we can write a code comment to the effect of the above. But it's also possible that we should be handling this not in copy_propagation_transform_object_load() but rather in copy_propagation_record_store(), even though that requires peeking at the RHS. Not sure about that one. Okay, I will add a comment. Hmm, I can't think of a simple way of solving this in copy_propagation_record_store(), but maybe I haven't thought it enough.
(2) We're going to need to get rid of these assignments anyway,
though. Consider the last test, which still fails—I assume because we're left with object load/stores in the IR and we can't translate those to sm4. The endless loop seems to happen when there are uninitialized objects such as in the first test (or in the second test for now, because we are lacking more support for object components) because copy-prop indeed can't get rid of these assignments. If a shader ends up with references to uninitialized objects it should not compile, but we can only check for those after copy propagation (or we will get false positives), in the `validate_static_object_references` pass. The last test fails for now because it also requires more support for object components; when this patch is added on top of my master6 branch the test passes.
(3) Why does that test require sm5? For that matter, can't we write it
so that it uses a sample rather than a load, and that way it can work on sm1 as well. Only sm >= 5 supports objects as struct members, so I decided to use Load to save it from defining a sampler.
Oh sorry, I skipped this part:
So I'd advocate for doing this in a pass before copy-prop. We don't actually need copy-prop to achieve it, we just want to look for stores whose rhs is a load from the same variable.
Yes, but I am not sure yet if self-assignments are the only way in which the derefs can be equal. The solution proposed by Giovanni could also do the trick if non-static object references are also a requirement for this problem.
We could leave an "assert(!hlsl_derefs_are_equal())" in copy-prop though.
Yes, it would make sense to do this if we use that approach.
On Thu Jan 5 18:57:18 2023 +0000, Giovanni Mascellani wrote:
I don't think that checking for variable identity is the right thing to do, it feels more a workaround for the symptom. The underlying issue seems to me that object variables actually come in two types: the real uniform object, that is split in a dedicated pass, and all the other object variables that behave rather as pointers to the real objects. The HLSL language is rather terrible when it conflates these two concepts, which is the reason why we need the splitting pass. Here we just have to acknowledge the same distinction: it makes sense to propagate a load from an object variable when that variable is a real object, i.e., a uniform variable. In other words, I'd rather fix this bug using:
if (!load->src.var->is_uniform) { TRACE("Ignoring load from non-uniform object variable %s\n", load->src.var->name); return false; }
in `copy_propagation_transform_object_load()`, instead of checking for variable identity.
After giving it some thought, I think this is an elegant solution!
On Thu Jan 5 18:57:18 2023 +0000, Francisco Casas wrote:
After giving it some thought, I think this is an elegant solution!
This made me realize we probably want to check for the variables being uniform in `validate_static_object_references()`, so that the first test passes and we don't reach: `Aborting due to not yet implemented feature: Load from non-uniform resource variable.` or `Aborting due to not yet implemented feature: Object copy.` and the first tests fails correctly with an `hlsl_error()` instead of an `hlsl_fixme()`.