(1) Q: Why does this only affect object loads? A: Because only object loads grab the deref directly instead of just referring to the LOAD instruction. Q: But don't regular loads have the same problem? A: No, because a sequence like
2: var 3: var = @2 4: var
gets turned into
... 4: @2 (modulo replicate swizzle, which I think we have a pass to remove), and we can't ever turn that *back* into "4: var" because var was written in the previous instruction.
Yes, regular loads only return "true" if the instruction itself is removed and srcs to it are replaced, and since the number of instructions is finite, this can only happen a finite number of times. The swizzle created by copy_propagation_transform_load() is indeed removed by the remove_trivial_swizzles() pass, if it is trivial.
In theory the same restriction *should* be what's preventing object loads from being affected, but that's not really feasible, is it...
Object loads don't replace the instruction itself and instead just seek for replacements for their inner derefs, and sometimes these replacement derefs can be the same deref (albeit, naturally coming from a different instruction).
Still, we should have tests for that case. We should also have tests for assignment-to-self where swizzles are involved, probably including both swizzles that have "loops" (e.g. "var.xy = var.xz") and those that don't (e.g. "var.xy = var.yx").
Okay, I will add some on another branch where I am working with swizzles.
I also feel like there's some way that we should be able to remember why this only matters for object loads. Maybe we can write a code comment to the effect of the above. But it's also possible that we should be handling this not in copy_propagation_transform_object_load() but rather in copy_propagation_record_store(), even though that requires peeking at the RHS. Not sure about that one.
Okay, I will add a comment. Hmm, I can't think of a simple way of solving this in copy_propagation_record_store(), but maybe I haven't thought it enough.
(2) We're going to need to get rid of these assignments anyway, though. Consider the last test, which still fails—I assume because we're left with object load/stores in the IR and we can't translate those to sm4.
The endless loop seems to happen when there are uninitialized objects such as in the first test (or in the second test for now, because we are lacking more support for object components) because copy-prop indeed can't get rid of these assignments.
If a shader ends up with references to uninitialized objects it should not compile, but we can only check for those after copy propagation (or we will get false positives), in the `validate_static_object_references` pass.
The last test fails for now because it also requires more support for object components; when this patch is added on top of my master6 branch the test passes.
(3) Why does that test require sm5? For that matter, can't we write it so that it uses a sample rather than a load, and that way it can work on sm1 as well.
Only sm >= 5 supports objects as struct members, so I decided to use Load to save it from defining a sampler.