Actually I think it looks pretty reasonable :-) I'm mildly surprised that the unallocated ones don't try to go back and fill in the gaps, but that does make our job easier.
They do go back and try to fill the gaps! I just failed to notice that `calculate_buffer_offset()` doesn't fill the gaps the way `allocate_numeric_registers_for_type()` does.
In my test

```
float4 a[3]; // will get register(c3)
float4 b[2] : register(c1);
```

`a` cannot fill the gap because it is too big.
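For illustration, here is a minimal first-fit sketch of why `a` ends up at `c3`. This is not the actual vkd3d code: `toy_allocator`, `toy_reserve()` and `toy_allocate()` are hypothetical names, and I'm assuming one flag per `float4` register and a fixed register count.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REGS 16 /* assumption: small fixed register file, for the sketch only */

/* Hypothetical, simplified allocator: one "used" flag per constant register. */
struct toy_allocator
{
    bool used[MAX_REGS];
};

/* Mark [start, start + count) as taken, as an explicit register(cN)
 * reservation would. Indices past MAX_REGS are ignored. */
static void toy_reserve(struct toy_allocator *a, unsigned int start, unsigned int count)
{
    for (unsigned int i = 0; i < count && start + i < MAX_REGS; ++i)
        a->used[start + i] = true;
}

/* First fit: return the lowest index at which `count` consecutive registers
 * are free and mark them used, or return -1 if no such run exists. */
static int toy_allocate(struct toy_allocator *a, unsigned int count)
{
    for (unsigned int start = 0; start + count <= MAX_REGS; ++start)
    {
        unsigned int i;

        for (i = 0; i < count; ++i)
        {
            if (a->used[start + i])
                break;
        }
        if (i == count)
        {
            toy_reserve(a, start, count);
            return (int)start;
        }
    }
    return -1;
}
```

With `b[2]` reserved via `toy_reserve(&alloc, 1, 2)`, calling `toy_allocate(&alloc, 3)` for `a[3]` skips the one-register gap at `c0` and returns 3, while a later single `float4` would still land in `c0`.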
So it is good to add your test. And this patch requires more work. I will see if it makes sense to use `struct register_allocator` here. Thanks!