New approach: use the dst write mask for the instructions for which the write mask is correct. All other instructions use a fallback path.
However, the swizzle should not contain a component that is unused by the write mask, e.g. for write mask `6` (`.yz`) the swizzle will be something like `.yyzy`. The DXIL parser is written to follow this pattern too, but we don't validate it for TPF. If we can rely on this pattern, deduplication with `handled_mask` means we can always use `VKD3DSP_WRITEMASK_ALL` and do without any per-instruction handling.
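
To illustrate why the pattern would make the write mask redundant, here is a minimal standalone sketch. It is not vkd3d code: the `SWIZZLE()` macro, the two-bits-per-slot encoding, `swizzle_get_component()` and `collect_used_components()` are all hypothetical names made up for the example, and only `handled_mask` is taken from the discussion above. It assumes the pattern means that slots outside the write mask repeat a component already selected by an enabled slot.

```c
#include <stdint.h>

/* Stand-in for VKD3DSP_WRITEMASK_ALL in this sketch. */
#define WRITEMASK_ALL 0xfu

/* Hypothetical swizzle encoding: two bits per slot, slot 0 in the low bits.
 * Component indices: 0 = x, 1 = y, 2 = z, 3 = w. */
#define SWIZZLE(x, y, z, w) \
    ((uint32_t)((x) | ((y) << 2) | ((z) << 4) | ((w) << 6)))

static unsigned int swizzle_get_component(uint32_t swizzle, unsigned int slot)
{
    return (swizzle >> (2 * slot)) & 3u;
}

/* Return a mask of the source components read through the swizzle,
 * visiting only the slots enabled in write_mask. */
static uint32_t collect_used_components(uint32_t swizzle, uint32_t write_mask)
{
    uint32_t handled_mask = 0;
    unsigned int slot;

    for (slot = 0; slot < 4; ++slot)
    {
        uint32_t bit;

        if (!(write_mask & (1u << slot)))
            continue;

        bit = 1u << swizzle_get_component(swizzle, slot);
        /* Deduplicate: each source component is handled only once. */
        if (handled_mask & bit)
            continue;
        handled_mask |= bit;
    }

    return handled_mask;
}
```

With `.yyzy`, i.e. `SWIZZLE(1, 1, 2, 1)`, both write mask `6` and `WRITEMASK_ALL` yield the same component set, because the extra slots only repeat `y`. With a swizzle that breaks the pattern, such as `.xyzw`, the two masks give different results, which is the case that would need validation for TPF.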