### Regarding (c.1) vs (c.2)
We have the following dilemma:
We can vectorize either at the vector level or at the register level. I think the difference only becomes apparent when more than one vector or scalar shares the same register.
On one hand, vectorizing register-wise is more general and is a better optimization. On the other hand, it is more complicated: either we consider register offsets at the IR level when running these passes (identifying, within a struct, whether two components share the same register), or we defer the pass until after register allocation.
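
To make the difference concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `Store`, `group_by_vector`, `group_by_register`, and the register layout are hypothetical and do not come from the actual IR. It models four scalar stores to two 2-component variables that an assumed allocation packs into one register, and counts how many merged stores each strategy could produce.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    """A scalar store to one component (hypothetical model, not the real IR)."""
    var: str       # source-level vector/scalar name
    comp: int      # component index within the variable
    reg: int       # register the variable is assumed to be allocated to
    reg_comp: int  # component offset within that register


def group_by_vector(stores):
    """Vector-wise: stores can only merge if they target the same variable."""
    groups = defaultdict(list)
    for s in stores:
        groups[s.var].append(s)
    return dict(groups)


def group_by_register(stores):
    """Register-wise: stores can merge if they land in the same register."""
    groups = defaultdict(list)
    for s in stores:
        groups[s.reg].append(s)
    return dict(groups)


# Assumed layout: `a` occupies r0.xy and `b` occupies r0.zw.
stores = [
    Store("a", 0, 0, 0), Store("a", 1, 0, 1),
    Store("b", 0, 0, 2), Store("b", 1, 0, 3),
]

by_vector = group_by_vector(stores)
by_register = group_by_register(stores)
print(len(by_vector), len(by_register))  # prints "2 1"
```

When every variable has its own register, both groupings coincide, which is why the distinction only shows up once registers are shared. Note that the register-wise grouping needs `reg` and `reg_comp`, i.e. information that is only available if offsets are tracked in the IR or the pass runs after allocation.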
So far, I don't think we need (c) for any reason other than optimizing the output code and making the IR more compact, but it is worth thinking about.
Actually, (b) is not exempt from this dilemma either, since the rhs could also be vectorized vector-wise or register-wise (maybe we should call these (b.1) and (b.2)).