In the top-level pass, every time we encounter a loop, we unroll only one iteration. Record that we have done so in the loop itself, probably by incrementing some counter, or maybe even by decrementing the "unroll_limit" field (not immediately sure if that causes problems). After unrolling an iteration, run DCE, remove_unreachable_code(), etc.
The upshot of this is that we don't have to explicitly check whether we can stop unrolling: if we've unconditionally broken after some number of iterations, then remove_unreachable_code() / DCE will simply remove the loop, and we no longer need to unroll it.
As discussed on IRC, there is a problem with this approach. If the loop contains a (conditional) break, you need to guard out future iterations of that loop with a temporary variable. This means that every iteration of the loop ends up inside an if block nested one level deeper than the previous one. That's fine in itself, and it produces correct code, but it runs into the documented maximum of 64 levels of nesting. [It's also unkind to our current implementation of copy-prop, although I think we could potentially do things differently.]
For a realistic example, this shader can't be simplified after unrolling, but it should still unroll successfully:
```
Buffer<float> b;

float4 main() : sv_target
{
    [unroll]
    for (uint i = 0; i < 100; ++i)
    {
        if (!b[i])
            break;
    }
    return i;
}
```
(Obviously the "i < 100" part can be simplified, but the conditional break can't.)
What native does in this case is to create a structure like
```
if (!broken)
    body;
if (!broken)
    body;
if (!broken)
    body;
```
which is well and good, but can't exactly be constructed through incremental and independent lowering like I was proposing. So we will indeed need to construct this manually.
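To make the difference between the two shapes concrete, here is a minimal C sketch of the same loop unrolled both ways. It is only an illustration: the function names are made up, none of this is vkd3d code, and three iterations stand in for the real unroll count. The nested version models what one-iteration-at-a-time peeling would leave behind; the flat version models the guarded structure described above.

```c
#include <stdbool.h>

/* Reference: a loop with a conditional break, returning the final counter. */
static unsigned int loop_with_break(const float *b, unsigned int limit)
{
    unsigned int i;

    for (i = 0; i < limit; ++i)
    {
        if (!b[i])
            break;
    }
    return i;
}

/* Nested shape: each later iteration lives inside the guard left behind by
 * the previous one, so the guard depth grows with the iteration count. */
static unsigned int unrolled_nested_3(const float *b)
{
    bool broken = false;
    unsigned int i = 0;

    if (!b[i])
        broken = true;
    if (!broken)
    {
        ++i;
        if (!b[i])
            broken = true;
        if (!broken)
        {
            ++i;
            if (!b[i])
                broken = true;
            if (!broken)
                ++i;
        }
    }
    return i;
}

/* Flat shape: every iteration is guarded by the same "broken" flag, so the
 * nesting depth stays constant no matter how many iterations there are. */
static unsigned int unrolled_flat_3(const float *b)
{
    bool broken = false;
    unsigned int i = 0;
    unsigned int iteration;

    /* The outer loop is only shorthand for three textual copies of the body. */
    for (iteration = 0; iteration < 3; ++iteration)
    {
        if (!broken)
        {
            if (!b[i])
                broken = true;
            if (!broken)
                ++i;
        }
    }
    return i;
}
```

Both unrolled versions return the same value as `loop_with_break(b, 3)` for any three-element input. The difference is purely structural: in the nested version the last body sits two guards deep, and in this model a 100-iteration loop would put its last body 99 guards deep, well past the 64-level limit.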
I also realize that incremental lowering can't really work anyway, because loop unrolling can, critically, *fail*, and that should emit a warning and leave the program where it was. (This is also, I assume, why we are cloning the loop before unrolling.)
With all that said, I still don't think we should need to mess with copy-prop. What we should be able to do is something more like this:
```c
struct hlsl_ir_var *broken = hlsl_new_synthetic_var();
struct hlsl_block output;

hlsl_block_init(&output);

for (;;)
{
    struct hlsl_ir_node *load;
    struct hlsl_src src;

    // Pull out one body, guarded by "!broken".
    generate_iteration(&output, broken, ...);

    load = hlsl_new_var_load(broken);
    hlsl_block_add_instr(&output, load);
    hlsl_src_from_node(&src, load);

    // Run lowering passes on "output", including copy-prop, trivial branch simplification...
    ...;

    if (src.node->type == HLSL_IR_CONSTANT && /* and it's true... */)
    {
        /* successful unroll, break */
    }

    hlsl_src_remove(&src);
}
```
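For what it's worth, the control flow of that loop, including the failure case, can be modelled with a self-contained toy. Everything here is hypothetical: `toy_unroll` and `bodies_until_proven` are invented names, the lowering passes are replaced by a plain number, and the `unroll_limit` parameter only borrows its name from the field mentioned earlier without claiming to match its semantics.

```c
#include <stdbool.h>

/* Toy model, not vkd3d code. "bodies_until_proven" stands in for the
 * lowering passes: it is the number of generated bodies after which they
 * could fold the load of "broken" to a constant true. */
static bool toy_unroll(unsigned int bodies_until_proven, unsigned int unroll_limit,
        unsigned int *unrolled_count)
{
    unsigned int generated = 0;

    while (generated < unroll_limit)
    {
        /* Stands in for generating one guarded body and running the passes. */
        ++generated;

        /* Stands in for "the load of broken is a constant, and it's true". */
        if (generated >= bodies_until_proven)
        {
            *unrolled_count = generated;
            return true;
        }
    }

    /* Budget exhausted: report failure and leave the caller's state alone. */
    return false;
}
```

The point of the sketch is only that success is decided by the guard becoming provably true, and that on failure nothing the caller owns has been modified, which matches building into a separate output block and keeping the original loop intact until the unroll is known to have succeeded. How many bodies it takes for the guard to fold depends on how the loop condition is lowered, which this model makes no attempt to capture.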