2/6 seems suspicious. Why are we running copyprop on out-of-bounds loads if we're going to delete them?
As we've discussed on IRC, since the loop check is stored within the loop body, on the last iteration we possibly run copyprop on OOB accesses.
Can't we DCE them first?
We need to run copyprop and trivial if removal first to simplify stuff like ```hlsl if (!broken) { ++i; if (!(i < 5)) break; /* loop body, potentially containing an OOB access */ } ```
The point of having UNRESOLVED_CONTINUE vs CONTINUE is that the IR should always have uniform semantics. 4/6 is breaking that rule. If we need to do this we should move the _entire_ resolution pass later, and change the jump type when we do.
I'll look into deferring that entire pass then, as it's needed due to:
That said, why do we need 3/6 and 4/6 at all? It seems suspicious.
Consider a loop like: ```hlsl Buffer<float> buf;
float a = 0.0f; for (i = 0; i < 4; ++i) { if (buf[i] < 0.1) break; if (buf[i] < 0.2) continue; a += buf[i]; } ```
without 3/6 and 4/6, this would unroll to something like ```hlsl if (!broken) { continued = false; if (!(i < 4)) broken = true; if (!broken && !continued) { if (buf[i] < 0.1) broken = true; if (!broken && !continued) { if (buf[i] < 0.2) { ++i; continued = true; } if (!broken && !continued) ++i; } } } ```
This is _correct_, but it leads to copy prop invalidating `i` and therefore not propagating it to subsequent iterations. 3/6 and 4/6 allows us to generate this (except for the first iteration): ```hlsl if (!broken) { ++i; /* not present in the first iteration */ continued = false; if (!(i < 4)) broken = true; if (!broken && !continued) { if (buf[i] < 0.1) broken = true; if (!broken && !continued) { if (buf[i] < 0.2) { continued = true; } if (!broken && !continued) { a += buf[i]; } } } } ```
In 5/6 please separate a patch to move evaluate_static_expression_as_uint() upwards.
Shall do.
Why loop_unrolling_find_unrollable_loop()? Usually we just iterate over all instructions, why are we not doing that here?
Because we need to clone the blocks every time we attempt an unroll: as soon as an unroll is successful we have to clone the result as future unrolls might fail, so looping through the entire block once isn't feasible without poking with `clone_block`'s `struct clone_instr_map` to remap `loop->next` from the original block to the cloned one.
Why are we cloning the loop before unrolling it?
loop unrolling can, critically, _fail_, and that should emit a warning and leave the program where it was. (This is also, I assume, why we are cloning the loop before unrolling.)
Yep, but also, this MR's current handling of unrolling failure isn't 100% accurate to native. Native will unroll by default, even without the attribute present, but if the attribute is present and unrolling fails that should trigger an actual error. I'll look into fixing this.
With all that said, I still don't think we should need to mess with copy prop. What we should be able to do is something more like this: [code snippet]
This is close to what this PR does, but it would run into the variable invalidation issue I mentioned above, and would also have a noticeable performance impact as copy prop would have to visit n² `if !broken`s.