I still have it in my backlog to introduce vsir as an intermediate step in the HLSL -> d3dbc translation. I think it may be preferable to write these transformations in a d3dbc-specific vsir pass, which could gradually absorb the ones currently done in HLSL IR (I'm looking at lower_nonfloat_exprs()), rather than translating one IR instruction into multiple bytecode instructions, which in some cases requires scaffolding for additional temporary registers.
Of course we want that anyway, but to be clear, it doesn't need to block getting rid of these passes either. E.g. what I want is something like the attached diff.
[scratch.diff](/uploads/13dbacdb5b8497b49f308e6d3b56c42c/scratch.diff)