I haven't reviewed in detail, but I have some high-level comments:
* 2/6 seems suspicious. Why are we running copyprop on out-of-bounds loads if we're going to delete them? Can't we DCE them first?
* The point of having UNRESOLVED_CONTINUE vs CONTINUE is that the IR should always have uniform semantics. 4/6 is breaking that rule. If we need to do this we should move the *entire* resolution pass later, and change the jump type when we do.
* That said, why do we need 3/6 and 4/6 at all? It seems suspicious. After all, a loop like
for (i = 0; i < n; ++i) { ...; }
is isomorphic to
i = 0; for (;;) { if (!(i < n)) break; ...; ++i; }
and we should be able to unroll both of them equally well.
* In 5/6, please split out a separate patch that moves evaluate_static_expression_as_uint() upwards.
* Why loop_unrolling_find_unrollable_loop()? Usually we just iterate over all instructions; why are we not doing that here?
* Why are we cloning the loop before unrolling it?
* Saving and restoring the copy propagation state works, but it's a bit unfortunate, and reaching into the copyprop internals seems scary and is not something I really want to have to think through.
Instead, here's an alternate approach. I haven't tried to implement it, but I think it should work:
In the top-level pass, every time we encounter a loop, we only unroll one iteration. Record that we have done so in the loop itself, probably by incrementing some counter, or maybe even by decrementing the "unroll_limit" field (not immediately sure if that causes problems). After unrolling an iteration, run DCE, remove_unreachable_code(), etc.
The upshot of this is that we don't have to explicitly check whether we can stop unrolling: if we've unconditionally broken after some number of iterations, then remove_unreachable_code() / DCE will simply remove the loop, and we no longer need to unroll it.
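To make the shape of that concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: struct toy_loop and every toy_*() function are hypothetical stand-ins, not the real IR or passes. The loop counter and bound are assumed to be statically known, so that "the peeled condition folded to an unconditional break" can be modelled as a plain comparison, and the cleanup step stands in for DCE / remove_unreachable_code().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop instruction; not the real IR. */
struct toy_loop
{
    unsigned int i, n;         /* statically known counter and bound (assumption) */
    unsigned int unroll_limit; /* iterations we are still allowed to peel */
    unsigned int peeled;       /* copies of the body emitted so far */
    bool unreachable;          /* a peeled copy ended in an unconditional break */
    bool removed;              /* the cleanup pass deleted the loop */
};

/* Peel exactly one iteration in front of the loop and record it in the loop. */
static void toy_unroll_one_iteration(struct toy_loop *loop)
{
    assert(loop->unroll_limit > 0);
    --loop->unroll_limit;

    if (!(loop->i < loop->n))
    {
        /* The peeled "if (!(i < n)) break;" is unconditional, so the
         * remaining loop can never be reached. */
        loop->unreachable = true;
        return;
    }

    ++loop->peeled; /* one more copy of the body */
    ++loop->i;      /* the peeled "++i", folded to a constant */
}

/* Stand-in for DCE / remove_unreachable_code(). */
static void toy_remove_unreachable(struct toy_loop *loop)
{
    if (loop->unreachable)
        loop->removed = true;
}

/* Top-level pass: unroll one iteration per loop, clean up, and repeat
 * until nothing changes. Returns true if every loop was removed. */
static bool toy_unroll_loops(struct toy_loop *loops, size_t count)
{
    bool progress;
    size_t k;

    do
    {
        progress = false;
        for (k = 0; k < count; ++k)
        {
            struct toy_loop *loop = &loops[k];

            if (loop->removed || !loop->unroll_limit)
                continue;

            toy_unroll_one_iteration(loop);
            toy_remove_unreachable(loop);
            progress = true;
        }
    } while (progress);

    for (k = 0; k < count; ++k)
    {
        if (!loops[k].removed)
            return false;
    }
    return true;
}
```

Note that nothing in toy_unroll_loops() asks "are we done unrolling?". A loop either disappears because the cleanup step removed it, or it survives with unroll_limit exhausted, at which point the caller can report an error or leave the loop in place.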