The idea is: if you have something like that `float4(f, 1, 2, 3)` sequence, you can't copy-prop loads from `<constructor-2>` with the current pass, because its components don't all have the same instruction as their source (`f` comes from one instruction, the constants from at least one other). So instead, any time a load from `<constructor-2>` is used as the rhs of a store, you split that store into multiple stores. You could emit one store per unique source, but it's simpler to emit one per component (and let vectorization clean that up later). Each resulting store then reads from a single source, so you're guaranteed to be able to copy-prop it. You increase the number of stores but (hopefully) reduce the number of intermediate variables.
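To make the splitting idea concrete, here is a minimal sketch on a hypothetical toy model, not the compiler's real IR: each variable component just records which (instruction, component) last wrote it, and instruction names like `@1`/`@2` are made up. `copy_prop` refuses a multi-component load whose components come from different instructions; `split_and_propagate` splits the store into one store per component (each trivially copy-propagatable); `merge_by_source` shows the "one per unique source" variant, roughly what a later vectorization step could recover.

```python
# Hypothetical toy model, not the compiler's real IR: for each variable we
# record, per component, which (instruction, component) last wrote it.

def copy_prop(state, var, comps):
    """Return (instr, swizzle) if all requested components of `var` come from
    one instruction, else None (the load can't become a single swizzle)."""
    refs = [state[var][c] for c in comps]
    if len({instr for instr, _ in refs}) != 1:
        return None
    return refs[0][0], "".join(comp for _, comp in refs)


def split_and_propagate(state, dst, src_var, mask):
    """Split `dst.mask = load(src_var).mask` into one store per component,
    then copy-prop each; a single component always has a single source."""
    stores = []
    for c in mask:
        instr, swz = copy_prop(state, src_var, c)  # never None for one component
        stores.append((dst, c, instr, swz))
        state.setdefault(dst, {})[c] = (instr, swz)
    return stores


def merge_by_source(stores):
    """Recombine split stores sharing a source instruction ("one per unique
    source"), roughly what a later vectorization step could do."""
    merged = {}
    for dst, c, instr, swz in stores:
        mask, swizzle = merged.get((dst, instr), ("", ""))
        merged[(dst, instr)] = (mask + c, swizzle + swz)
    return [(dst, mask, instr, swz) for (dst, instr), (mask, swz) in merged.items()]


# float4(f, 1, 2, 3): .x comes from @1 (load of f), .yzw from @2 (constants).
state = {"<constructor-2>": {"x": ("@1", "x"), "y": ("@2", "x"),
                             "z": ("@2", "y"), "w": ("@2", "z")}}

# The whole-vector load can't be copy-propped: it has two sources.
assert copy_prop(state, "<constructor-2>", "xyzw") is None

split = split_and_propagate(state, "<output-sv_target0>", "<constructor-2>", "xyzw")
merged = merge_by_source(split)
# merged == [("<output-sv_target0>", "x", "@1", "x"),
#            ("<output-sv_target0>", "yzw", "@2", "xyz")]
```

Splitting per component keeps the transformation trivially correct at the cost of more stores; regrouping by source is an optimization that can be deferred to a later pass.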
I read your proposal (for the second step) as a sort of *variable deduplication* (or maybe *dealiasing*?). In your test program, `<constructor-2>` and `<output-sv_target0>` are essentially the same variable: as soon as both are initialized they hold the same value, and they keep it for their whole lifetime. So you can simply replace one with the other. In theory you could do it in either direction, but here `<output-sv_target0>` has to survive because it has an externally visible semantic.
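The deduplication can be sketched as follows, again on a hypothetical straight-line toy IR rather than the compiler's own, with made-up helper names (`writes`, `reads`, `dedup`). The safety conditions are conservative assumptions of this sketch, not a specification of what a real pass must check: `keep` is written only by a single whole-vector `copy keep <- drop`, `keep` is not read before that copy, and `drop` is not written after it. Since either direction works in theory, the caller picks `keep`, here the variable with the externally visible semantic.

```python
# Hypothetical toy IR, not the compiler's real one. Instructions are tuples:
#   ("store", var, value)  write (some components of) var from a value like "@1"
#   ("copy", dst, src)     dst = load(src), whole-vector copy
#   ("load", var)          a use of var

def writes(ins):
    return ins[1] if ins[0] in ("store", "copy") else None


def reads(ins):
    if ins[0] == "load":
        return ins[1]
    return ins[2] if ins[0] == "copy" else None


def dedup(program, keep, drop):
    """Replace variable `drop` with `keep` if they provably hold the same value.

    Assumed conservative conditions (straight-line code only): `keep` is
    written only by one `copy keep <- drop`, `keep` is not read before that
    copy, and `drop` is not written after it. Returns the rewritten program,
    or None when the checks fail.
    """
    copies = [i for i, ins in enumerate(program) if ins == ("copy", keep, drop)]
    keep_writes = [i for i, ins in enumerate(program) if writes(ins) == keep]
    if len(copies) != 1 or keep_writes != copies:
        return None
    at = copies[0]
    for i, ins in enumerate(program):
        if i < at and reads(ins) == keep:
            return None  # keep would be read before it holds drop's value
        if i > at and writes(ins) == drop:
            return None  # drop changes later, so the two would diverge

    def rename(operand):
        return keep if operand == drop else operand

    # Rename every use of drop to keep and delete the now-redundant copy.
    return [(ins[0],) + tuple(rename(op) for op in ins[1:])
            for i, ins in enumerate(program) if i != at]


program = [
    ("store", "<constructor-2>", "@1"),                  # .x = f
    ("store", "<constructor-2>", "@2"),                  # .yzw = constants
    ("copy", "<output-sv_target0>", "<constructor-2>"),
    ("load", "<constructor-2>"),
]
merged = dedup(program, keep="<output-sv_target0>", drop="<constructor-2>")
# merged == [("store", "<output-sv_target0>", "@1"),
#            ("store", "<output-sv_target0>", "@2"),
#            ("load", "<output-sv_target0>")]
```

A real pass would also have to handle partial writes to `keep`, control flow, and other ways the two variables could diverge; the sketch simply refuses anything outside its narrow conditions.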