Shouldn't the condition be "j < width" rather than "i < width"?
Yes. In fact, I failed to notice that 3 tests were not passing, because I only looked at valgrind's output...
Actually, for that matter, shouldn't "j < width" be an assertion?
No, because in some places we deliberately pass map_writemask with more than "width" bits set, usually by using `VKD3DSP_WRITEMASK_ALL`.
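
To illustrate why a bound check is needed here rather than an assertion, below is a minimal sketch, not the actual code from the patch. The helper name `sketch_clamp_writemask` and the constant `SKETCH_WRITEMASK_ALL` are hypothetical, and the value 0xf for a 4-component "all" mask is an assumption made for the example. It only shows the general shape: `j` counts the set bits seen so far, and the loop stops quietly once `width` of them have been consumed.

```c
#include <assert.h>

/* Assumed value for illustration only: all four component bits set. */
#define SKETCH_WRITEMASK_ALL 0xfu

/* Hypothetical helper: keep only the first `width` set bits of `writemask`. */
static unsigned int sketch_clamp_writemask(unsigned int writemask, unsigned int width)
{
    unsigned int i, j = 0, ret = 0;

    /* This part is safe to assert: there are at most 4 components. */
    assert(width <= 4);

    for (i = 0; i < 4; ++i)
    {
        if (!(writemask & (1u << i)))
            continue;

        /* Deliberately a check and not an assertion: callers may pass a
         * mask with more than `width` bits set, e.g. the "all" mask. */
        if (j >= width)
            break;

        ret |= 1u << i;
        ++j;
    }

    return ret;
}
```

With this shape, passing `SKETCH_WRITEMASK_ALL` together with a width of 2 just yields the two lowest components, whereas an assertion on "j < width" would trigger for exactly the callers described above.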