Because of the change introduced in f21693b2, SM1 scalars and vectors were no longer getting the correct writemask when they were allocated; this is fixed.
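As a point of reference, here is a minimal C sketch of the writemask a value would carry when it occupies the first N components of a register: `.x` for a scalar, `.xyz` for a three-component vector, and so on. It assumes bit i of the mask stands for component i (x = bit 0) and that the value starts at component x. `writemask_from_component_count()` is a hypothetical name, not the vkd3d-shader helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writemask covering the first `count` components of a
 * register. Assumed encoding: bit i of the mask is component i (x = bit 0),
 * and the value is assumed to start at component x. */
static uint32_t writemask_from_component_count(unsigned int count)
{
    assert(count >= 1 && count <= 4);
    return (1u << count) - 1; /* 1 -> .x, 2 -> .xy, 3 -> .xyz, 4 -> .xyzw */
}
```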
Also, the mapping of SM1 source register swizzles is moved out of `write_sm1_instruction()`, since some instructions, notably dp2add, must not have their source swizzles mapped this way. dp2add was being emitted incorrectly as a result; this is now fixed.
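To illustrate why dp2add broke, here is a minimal C sketch of what mapping a source swizzle through the destination writemask might look like: the i-th component of the source swizzle is moved to the position of the i-th bit set in the writemask. Assumptions: swizzles pack four 2-bit component indices with component i at bits 2i..2i+1, writemask bit i is component i, and `map_swizzle()`/`swizzle_get()` are hypothetical names rather than the vkd3d-shader functions. The mapping suits component-wise instructions, but dp2add writes one component while reading two from each of its first two sources, so mapping `.xy` through the writemask `.x` keeps only `x`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: 2 bits per swizzle component, component i at bits 2i..2i+1. */
static unsigned int swizzle_get(uint32_t swizzle, unsigned int idx)
{
    return (swizzle >> (2 * idx)) & 3;
}

/* Hypothetical sketch: place the j-th source swizzle component at the
 * position of the j-th bit set in the destination writemask. Positions that
 * are not written are left as 0, i.e. x. */
static uint32_t map_swizzle(uint32_t swizzle, uint32_t writemask)
{
    uint32_t ret = 0;
    unsigned int i, j = 0;

    assert(writemask <= 0xf);
    for (i = 0; i < 4; ++i)
    {
        if (writemask & (1u << i))
            ret |= swizzle_get(swizzle, j++) << (2 * i);
    }
    return ret;
}
```

With a compact source swizzle `.xy` (0x4), a component-wise destination `.yz` (0x6) yields `.xxyx`, which is correct for that case, while dp2add's `.x` destination (0x1) yields `.xxxx`, dropping the `y` component.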
Before the last patch, we were writing the operation as:

```
dp2add r0.x, r1.x, r0.x, r2.x
```

and now it is:

```
dp2add r0.x, r1.xyxx, r0.xyxx, r2.x
```
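The `.xyxx` form is consistent with building the source swizzle from a two-component writemask and leaving the unused trailing slots as 0, i.e. `x`. A minimal C sketch under the same assumed encoding (2 bits per component, writemask bit i = component i); `swizzle_from_writemask()` and `format_swizzle()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: build a swizzle that reads, in order, the components
 * selected by `writemask`. Slots beyond the selected components stay 0 (x). */
static uint32_t swizzle_from_writemask(uint32_t writemask)
{
    uint32_t swizzle = 0;
    unsigned int i, j = 0;

    assert(writemask <= 0xf);
    for (i = 0; i < 4; ++i)
    {
        if (writemask & (1u << i))
            swizzle |= (uint32_t)i << (2 * j++);
    }
    return swizzle;
}

/* Write the swizzle as ".xyzw"-style text into out[6]. */
static void format_swizzle(uint32_t swizzle, char out[6])
{
    static const char names[] = "xyzw";
    unsigned int i;

    out[0] = '.';
    for (i = 0; i < 4; ++i)
        out[1 + i] = names[(swizzle >> (2 * i)) & 3];
    out[5] = '\0';
}
```

For the writemask `.xy` (0x3) this produces 0x4, printed as `.xyxx`.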
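dp2add now has its own function, `write_sm1_dp2add()`, since it seems to be the only instruction with this operand structure: a two-component dot product of the first two sources, plus a single component of the third.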
Ideally, we would use the default swizzles for the first two source arguments, since according to native's documentation these are supported for all shader models below 4:

```
dp2add r0.x, r1, r0, r2.x
```
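A small C model of dp2add's arithmetic shows why `.xyxx` and the default `.xyzw` are interchangeable for the first two sources, and why the old `.x` form was wrong: dp2add computes `src0.x * src1.x + src0.y * src1.y + src2` (with `src2` a single replicated component), so only the first two swizzled components of `src0` and `src1` matter. This is a sketch assuming the packed 2-bit swizzle encoding used above; `struct vec4`, `apply_swizzle()` and `dp2add()` are hypothetical illustration names.

```c
#include <assert.h>
#include <stdint.h>

struct vec4
{
    float c[4]; /* x, y, z, w */
};

/* Apply a packed swizzle (assumed: 2 bits per component, component i at
 * bits 2i..2i+1) to a register value. */
static struct vec4 apply_swizzle(struct vec4 v, uint32_t swizzle)
{
    struct vec4 r;
    unsigned int i;

    for (i = 0; i < 4; ++i)
        r.c[i] = v.c[(swizzle >> (2 * i)) & 3];
    return r;
}

/* dp2add: dst = src0.x * src1.x + src0.y * src1.y + src2.<component>.
 * Components z and w of src0 and src1 are never read. */
static float dp2add(struct vec4 src0, struct vec4 src1, struct vec4 src2, unsigned int src2_component)
{
    assert(src2_component < 4);
    return src0.c[0] * src1.c[0] + src0.c[1] * src1.c[1] + src2.c[src2_component];
}
```

With `.xyxx` (0x04) and `.xyzw` (0xe4) the results match, while `.x` (0x00) reads `x` twice and gives a different value.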
However, using default swizzles whenever possible, along with following the convention of repeating the last component of the swizzle when fewer than 4 components are specified, is a larger change in scope. It would probably involve modifying `hlsl_swizzle_from_writemask()` and `hlsl_map_swizzle()`.
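For comparison, a minimal C sketch of the repeat-last-component convention: the slots beyond the selected components repeat the last one, so `.xy` becomes `.xyyy` and can be printed in its short form, whereas `.xyxx` cannot be shortened below `.xyx`. A full mask yields `.xyzw`, which can be omitted entirely. This assumes the same packed encoding as above; `swizzle_from_writemask_repeat_last()` and `swizzle_print_length()` are hypothetical names, not proposed changes to the existing `hlsl_*` functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant: build a swizzle from a writemask, filling unused
 * trailing slots with the last selected component (.xy -> .xyyy). */
static uint32_t swizzle_from_writemask_repeat_last(uint32_t writemask)
{
    uint32_t swizzle = 0;
    unsigned int i, j = 0, last = 0;

    assert(writemask != 0 && writemask <= 0xf);
    for (i = 0; i < 4; ++i)
    {
        if (writemask & (1u << i))
        {
            swizzle |= (uint32_t)i << (2 * j++);
            last = i;
        }
    }
    for (; j < 4; ++j)
        swizzle |= (uint32_t)last << (2 * j);
    return swizzle;
}

/* Number of swizzle characters needed when trailing repeats of the last
 * component are implied; 0 means the default .xyzw swizzle can be omitted. */
static unsigned int swizzle_print_length(uint32_t swizzle)
{
    unsigned int len = 4;

    if (swizzle == 0xe4)
        return 0;
    while (len > 1 && ((swizzle >> (2 * (len - 1))) & 3) == ((swizzle >> (2 * (len - 2))) & 3))
        --len;
    return len;
}
```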