I assume you mean the remaining 3 DWORDs. There's a good chance that parameters are internally handled at vec4 granularity (i.e. everything is made of a bunch of 4-element vectors) and individual vec4s are set "atomically".
That's probably correct given what the tests show and the Remarks section on MSDN, which says "all values expected to be matrix4x4s or float4s", "expected to be in column major order", and "ints and floats are cast into float4s."
Could it be that matrices are stored with reversed majority instead (so in the order ._11, ._12, ._13, ._14, ._21, ... while you're reading them as ._11, ._21, ._31, ._41, ._12, ...)?
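To make the row-major vs. column-major question concrete, here is a minimal sketch in C. The helper names are hypothetical and are not taken from Wine or the D3D headers. It only shows how the same 16 floats give different elements depending on which majority the reader assumes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only.
 * row and col are zero-based, so ._12 is (row 0, col 1). */

/* Storage order ._11, ._12, ._13, ._14, ._21, ... */
static float read_row_major(const float *m, size_t row, size_t col)
{
    assert(row < 4 && col < 4);
    return m[row * 4 + col];
}

/* Storage order ._11, ._21, ._31, ._41, ._12, ... */
static float read_col_major(const float *m, size_t row, size_t col)
{
    assert(row < 4 && col < 4);
    return m[col * 4 + row];
}
```

If the buffer holds 0, 1, 2, ..., 15, then ._12 reads back as 1 under the row-major assumption and as 4 under the column-major one, so a mismatch would show up as a transposed matrix rather than as garbage.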
What do you mean by reversed majority? Reversed byte order?
The current test fills the buffer with 2, writes a 3 with byte_offsets of 0 and 1, then a 2 with byte_offsets of 4, 5, 6, and 7. Then for the next test it writes a 3 with a byte_offset of 8, which results in the following array values:
{ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 }
{ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 }
{ (3 << 8) | 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
{ (3 << 8) | 3, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
{ (3 << 8) | 3, 0, 0, 0, 514, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
{ (3 << 8) | 3, 0, 0, 0, 131586, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
{ (3 << 8) | 3, 0, 0, 0, 33686018, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
{ (3 << 8) | 3, 0, 0, 0, 33686018, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0 }
FYI, the current Wine implementation fails these tests because, for some reason, it refuses to write the first byte of the fifth array element (so the first 2 never gets written to array[4]). Still not sure why, though...