Storing cloned instructions in a temporary array would be a bit awkward, because we would still need to allocate their parameters in the main array, or 'steal' them and append them to the parameter allocator's chain.
I did the simplest thing: store the index, instance_count and instruction_count for each phase, then memmove chunks into place starting at the tail. shader_normaliser_init() takes ownership of the passed instruction array, and unwanted instructions are NOP'ed rather than removed.
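For illustration, here is a minimal sketch of the tail-first memmove approach. It is not the actual normaliser code: `struct instruction`, `struct phase`, `enum opcode`, `flatten_phases()` and `nop_opcode()` are hypothetical stand-ins. It assumes the phases are sorted by index, don't overlap, each have instance_count >= 1, and that the array already has capacity for the expanded result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real instruction and phase types. */
enum opcode { OP_NOP, OP_MOV, OP_FORK_PHASE };

struct instruction
{
    enum opcode opcode;
    unsigned int id;
};

struct phase
{
    size_t index;                /* first instruction of the phase body */
    unsigned int instance_count; /* number of copies wanted, >= 1 */
    size_t instruction_count;    /* length of the phase body */
};

/* Expand each phase body in place to instance_count copies.
 * Working from the last phase backwards, every chunk moves towards the end
 * of the array, into space that is either unused or already copied
 * elsewhere; memmove() handles overlap between a chunk and its destination. */
static bool flatten_phases(struct instruction *ins, size_t *count, size_t capacity,
        const struct phase *phases, size_t phase_count)
{
    size_t extra = 0, src_end, dst_end, i;

    for (i = 0; i < phase_count; ++i)
    {
        assert(phases[i].instance_count >= 1);
        extra += (phases[i].instance_count - 1) * phases[i].instruction_count;
    }
    if (*count + extra > capacity)
        return false;

    src_end = *count;
    dst_end = *count + extra;
    for (i = phase_count; i-- > 0;)
    {
        const struct phase *p = &phases[i];
        size_t phase_end = p->index + p->instruction_count;
        size_t tail = src_end - phase_end;
        unsigned int j;

        /* Shift the instructions between this phase and the next one. */
        dst_end -= tail;
        memmove(&ins[dst_end], &ins[phase_end], tail * sizeof(*ins));

        /* Write the copies back to front; only the last one written can overlap the source. */
        for (j = p->instance_count; j-- > 0;)
        {
            dst_end -= p->instruction_count;
            memmove(&ins[dst_end], &ins[p->index], p->instruction_count * sizeof(*ins));
        }
        src_end = p->index;
    }

    *count += extra;
    return true;
}

/* Turn unwanted instructions into NOPs instead of removing them,
 * so nothing else has to be shifted. */
static void nop_opcode(struct instruction *ins, size_t count, enum opcode opcode)
{
    for (size_t i = 0; i < count; ++i)
    {
        if (ins[i].opcode == opcode)
            ins[i].opcode = OP_NOP;
    }
}
```

With phases {1, 2, 2} and {4, 3, 1} over six instructions, the array grows to ten: the first body appears twice, the second three times, and the surrounding instructions are shifted up. Each instruction outside a phase is moved at most once, which is the main advantage over inserting copies front to back.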