I discovered a few problems with my implementation. Unfortunately, the assumption that each component of a register is 32 bits wide is baked into quite a few places. For example, using an IDXTEMP of type double will trigger an assertion, and in some cases writing to a TEMP of type double2 generates bad code.