I'm going to have to push back on 2/4. Along the lines of 215, I don't want to spend too much effort doing this at the HLSL level when we really should be doing these kinds of transformations on a lower-level IR.
I don't like how 4/4 avoids the recursive struct; probably better to do it like vkd3d_shader_register.
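To illustrate the kind of recursive layout I mean, here's a rough sketch. All of the `example_*` names are hypothetical and this is not the actual vkd3d definition; it only assumes the general shape where a register index can optionally point at another source parameter, which in turn embeds a register:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the real vkd3d structures. */
struct example_src_param;

struct example_register_index
{
    /* NULL for an immediate index; otherwise the index is "rel_addr + offset". */
    const struct example_src_param *rel_addr;
    unsigned int offset;
};

struct example_register
{
    unsigned int type;
    struct example_register_index idx[3];
    unsigned int idx_count;
};

/* A source parameter embeds a register, which closes the recursion:
 * register -> index -> src_param -> register. */
struct example_src_param
{
    struct example_register reg;
    unsigned int swizzle;
};

/* Return how deeply relative addressing nests inside a register. */
static unsigned int example_register_rel_addr_depth(const struct example_register *reg)
{
    unsigned int i, d, depth = 0;

    assert(reg);
    assert(reg->idx_count <= 3);

    for (i = 0; i < reg->idx_count; ++i)
    {
        if (!reg->idx[i].rel_addr)
            continue;
        d = 1 + example_register_rel_addr_depth(&reg->idx[i].rel_addr->reg);
        if (d > depth)
            depth = d;
    }

    return depth;
}
```

The point is that the recursion goes through a pointer, so relative addressing costs nothing for the common immediate-index case, and nested addressing doesn't need a separate flattened representation.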
More broadly, is there a particular reason we couldn't start using the vkd3d_shader_instruction structures here? I didn't do a comprehensive review, but the sm4_instruction structures largely just look like subsets of the vkd3d_shader_instruction structures, as you'd expect.