With tests from !364, separated out from the HLSL changes there and updated. This MR can wait until 364 is upstream though.
It is apparently unnecessary to match the SM4/5 implementation, since the AMD Windows results differ. The RADV results are a bit wrong, but Proton uses the SPIR-V GLSL extension instructions too, and no workarounds have been implemented there.