I fear this isn't going to be enough. E.g. does this handle loops correctly?
I don't see a reason why it shouldn't cover loops, since copy_propagation_invalidate_from_block() is called right at the beginning of copy_propagation_process_loop(), so any values that are written after the swizzle, in the loop, should be invalidated.
Even if it is enough I feel like the logic is fragile enough that we should find some other way.
Should we throw out this pass altogether? We have tests that depend on it, but I think those tests are probably very artificial.
I don't think we want to get rid of copy_propagation_transform_swizzle() as a whole. We can see it as composed of two parts:
a) copy_propagation_replace_with_constant_vector() b) copy_propagation_replace_with_single_instr()
We want (a) at least since there are many cases where we index arrays with swizzles like `arr[a.x]` and we want to avoid relative addressing (for SM1 for instance). Also for instructions that expect constant values like the gather offsets.
However, we could argue to get rid of the (b) part. No test failures appear after we remove it.
I still fear that (b) may be required to make (a) work in some cases such as: ``` @1: Q @2: A.x = 1.0 @3: A @4: B = @3 @5: B.y = @1 @6: B @7: @6.x // <-- we want to turn this into a 1.0 constant ```
Alternatively: When we encounter a load, we record, in the copy_propagation_state, an array of copy_propagation_value's for that load. Then in copy_propagation_transform_swizzle(), instead of calling copy_propagation_get_value(), we use that value array. That way we are looking up the "live" values at the time of the load, rather than later after things may have been invalidated.
I agree that this approach of storing the "log" (if you may) for each value will probably be required to solve cases such as: ``` @1: Q @2: B.x = 1.0 @3: B @4: B.x = @1 @5: @3.x // <-- we want to turn this into a 1.0 constant ```
the only problem I have is that an implementation may be complicated enough for us not to want it during the code freeze, and a fix to the infinite loops is more priority.
So, summarizing we have 4 options:
1) Going for this MR as it is. 2) Removing (b) alone, with a slight chance that some unknown apps may break. 3) Removing the whole copy_propagation_transform_swizzle(), breaking some tests and more chance of breaking apps. 4) Implementing the "log" proposal.
I would prefer (1) during the freeze and (4) after the freeze, in fact I better start writing (4) now that I have the copy-prop cartridge loaded in my head.