On Wed Nov 23 15:58:28 2022 +0000, Francisco Casas wrote:
Ok, so, in summary, I identify the following relevant passes:
- **a)** Constant handling in copy-prop (basically, what this first
patch and Giovanni's original implementation do).
- **b)** vectorization of stores with rhs constants.
- **c.1)** vectorization of stores with rhs loads from the same vector
within a variable.
- **c.2)** alternatively, vectorization of stores with rhs loads from
the same register.
- **d.1)** split of stores with rhs loads in copy-prop, grouping the
components according to the instruction node of their copy_propagation_value.
- **d.2)** alternatively, split of stores with rhs loads component-wise
(it seems that this doesn't need to be during copy-prop).

I decided to separate (c) into (c.1) and (c.2) because the latter would be harder to implement with the current architecture but would have more reach. Also, it may be more convenient to do it after register allocation.

I think Zeb's "generic vectorization of stores" maps to (b), although before her last message I thought she meant a pass that does both (b) and (c) because it had "generic" in the name. Probably Gio thought the same. Also, I think both (d.1) and (d.2) map to Zeb's "propagation of load+store sequences into multiple store instructions" proposals.

So, the idea is to achieve all the functionality of (a), plus additional vectorization, through the implementation of just (b) and (d).

I will start writing my thoughts on these alternatives.
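
To illustrate the grouping in (d.1), here is a rough sketch in C. It assumes that, for each stored component, copy-prop can tell us which instruction node the value comes from. The `component_source` and `store_group` structs and the `group_components_by_source()` function are made up for this example and are not the actual vkd3d structures. (d.2) would be the degenerate case where every component gets its own group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what copy-prop knows about one stored component:
 * the instruction node that produced it. Not vkd3d's actual data structure. */
struct component_source
{
    int node_id;
};

/* One resulting store: all components in writemask share the same source. */
struct store_group
{
    int node_id;
    unsigned writemask; /* bit i set => component i belongs to this group */
};

/* Partition up to 4 components into groups that share a source node.
 * Returns the number of groups written to groups[]. */
size_t group_components_by_source(const struct component_source *src,
        size_t count, struct store_group groups[4])
{
    size_t group_count = 0, i, j;

    assert(count <= 4);
    for (i = 0; i < count; ++i)
    {
        /* Look for an existing group with the same source node. */
        for (j = 0; j < group_count; ++j)
        {
            if (groups[j].node_id == src[i].node_id)
                break;
        }
        if (j == group_count)
        {
            /* None found, open a new group. */
            groups[j].node_id = src[i].node_id;
            groups[j].writemask = 0;
            ++group_count;
        }
        groups[j].writemask |= 1u << i;
    }
    return group_count;
}
```

With this, a 4-component store whose x, y and w components come from one node and whose z component comes from another would be split into two stores, one with writemask `.xyw` and one with `.z`.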
### Regarding (b) and (c), i.e. vectorization:
To achieve vectorization, in particular (c) for, say:
```
a.x = b.x;
// other instructions
a.y = b.y;
```
If we merge the two operations down, we have to make sure that:
- `a.x` is not read or written by the other instructions in between.
- `b.x` is not written in between.

If we merge the two operations up, we have to make sure that:
- `a.y` is not read or written by the other instructions in between.
- `b.y` is not written in between, and its `hlsl_ir_load` is before the `a.x =` instruction, or can be moved there.

We also have to consider the possibility of non-constant paths accessing these values.
(b) would be easier to achieve than (c) since, if we replace `b.x` and `b.y` with constants in this example, we know that these values will never be written to. Also, (c) is more complex since we have to keep an eye on the location of the `b.x` and `b.y` `hlsl_ir_load`s in the IR.
Since this pass wouldn't be helping copy-prop, it probably makes sense to run it after the `do ... while(progress)` that includes copy-prop and friends.
To implement this, we can write a function that checks whether it is possible for a value to be read or written within a list of instructions, given a starting point and an end point, and use it to check whether these conditions are met for each pair of compatible instructions (both are stores from `b` to `a` in different components).
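
A minimal sketch of such a check, in C, for the merge-down case. It works on a toy flat instruction array that I made up for this example (`struct instr`, `range_may_access()`, `can_merge_down()` and `NO_VAR` are all hypothetical names, not vkd3d code). It folds the rhs load into the store, ignores control flow, and assumes every access uses a constant path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_VAR (-1) /* marks "no variable", e.g. a constant rhs */

enum instr_kind { INSTR_LOAD, INSTR_STORE };
enum access { ACCESS_READ = 1, ACCESS_WRITE = 2 };

/* Toy instruction: a store writes dst_var.dst_mask from src_var.src_mask
 * (src_var == NO_VAR for a constant); a load only reads src_var.src_mask. */
struct instr
{
    enum instr_kind kind;
    int dst_var;
    unsigned dst_mask; /* bit 0 = x, bit 1 = y, ... */
    int src_var;
    unsigned src_mask;
};

/* True if an instruction strictly between first and last may access the
 * components `mask` of `var` in one of the ways given by `how`. */
bool range_may_access(const struct instr *instrs, size_t first, size_t last,
        int var, unsigned mask, unsigned how)
{
    size_t i;

    assert(first <= last);
    for (i = first + 1; i < last; ++i)
    {
        const struct instr *ins = &instrs[i];

        if ((how & ACCESS_READ) && ins->src_var == var && (ins->src_mask & mask))
            return true;
        if ((how & ACCESS_WRITE) && ins->kind == INSTR_STORE
                && ins->dst_var == var && (ins->dst_mask & mask))
            return true;
    }
    return false;
}

/* Can the store at `first` be moved down and merged into the one at `second`? */
bool can_merge_down(const struct instr *instrs, size_t first, size_t second)
{
    const struct instr *a = &instrs[first], *b = &instrs[second];

    if (first >= second || a->kind != INSTR_STORE || b->kind != INSTR_STORE)
        return false;
    /* Compatible pair: same destination, same source, different components. */
    if (a->dst_var != b->dst_var || a->src_var != b->src_var)
        return false;
    if (a->dst_mask & b->dst_mask)
        return false;
    /* The destination of the first store is not read or written in between. */
    if (range_may_access(instrs, first, second, a->dst_var, a->dst_mask,
            ACCESS_READ | ACCESS_WRITE))
        return false;
    /* The source of the first store is not written in between. A constant
     * source can never be written, which is why (b) is the easier case. */
    if (a->src_var != NO_VAR && range_may_access(instrs, first, second,
            a->src_var, a->src_mask, ACCESS_WRITE))
        return false;
    return true;
}
```

The merge-up check would be symmetric, using the second store's components, plus the extra condition on the position of the `hlsl_ir_load`, which this toy model cannot express because it folds the load into the store.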
However, to have the same reach as copy-prop, we would also have to consider how to handle control flow with this pass. There may be ifs or loops in between, or one of the instructions may be inside a block where the other is not. Since I don't have clear answers on how to implement the latter cases, I assume (b) and (c) would only operate on pairs of instructions in the same block.