On Thu Jan 5 17:54:45 2023 +0000, Francisco Casas wrote:
(1) Q: Why does this only affect object loads? A: Because only object
loads grab the deref directly instead of just referring to the LOAD instruction. Q: But don't regular loads have the same problem? A: No, because a sequence like
2: var 3: var = @2 4: var
gets turned into
... 4: @2 (modulo replicate swizzle, which I think we have a pass to remove),
and we can't ever turn that *back* into "4: var" because var was written in the previous instruction.
Yes, regular loads only return "true" if the instruction itself is removed and srcs to it are replaced, and since the number of instructions is finite, this can only happen a finite number of times. The swizzle created by copy_propagation_transform_load() is indeed removed by the remove_trivial_swizzles() pass, if it is trivial.
In theory the same restriction *should* be what's preventing object
loads from being affected, but that's not really feasible, is it... Object loads don't replace the instruction itself and instead just seek for replacements for their inner derefs, and sometimes these replacement derefs can be the same deref (albeit, naturally coming from a different instruction).
Still, we should have tests for that case. We should also have tests
for assignment-to-self where swizzles are involved, probably including both swizzles that have "loops" (e.g. "var.xy = var.xz") and those that don't (e.g. "var.xy = var.yx"). Okay, I will add some on another branch where I am working with swizzles.
I also feel like there's some way that we should be able to remember
why this only matters for object loads. Maybe we can write a code comment to the effect of the above. But it's also possible that we should be handling this not in copy_propagation_transform_object_load() but rather in copy_propagation_record_store(), even though that requires peeking at the RHS. Not sure about that one. Okay, I will add a comment. Hmm, I can't think of a simple way of solving this in copy_propagation_record_store(), but maybe I haven't thought it enough.
(2) We're going to need to get rid of these assignments anyway,
though. Consider the last test, which still fails—I assume because we're left with object load/stores in the IR and we can't translate those to sm4. The endless loop seems to happen when there are uninitialized objects such as in the first test (or in the second test for now, because we are lacking more support for object components) because copy-prop indeed can't get rid of these assignments. If a shader ends up with references to uninitialized objects it should not compile, but we can only check for those after copy propagation (or we will get false positives), in the `validate_static_object_references` pass. The last test fails for now because it also requires more support for object components; when this patch is added on top of my master6 branch the test passes.
(3) Why does that test require sm5? For that matter, can't we write it
so that it uses a sample rather than a load, and that way it can work on sm1 as well. Only sm >= 5 supports objects as struct members, so I decided to use Load to save it from defining a sampler.
Oh sorry, I skipped this part:
So I'd advocate for doing this in a pass before copy-prop. We don't actually need copy-prop to achieve it, we just want to look for stores whose rhs is a load from the same variable.
Yes, but I am not sure yet if self-assignments are the only way in which the derefs can be equal. The solution proposed by Giovanni could also do the trick if non-static object references are also a requirement for this problem.
We could leave an "assert(!hlsl_derefs_are_equal())" in copy-prop though.
Yes, it would make sense to do this if we use that approach.