On my Nvidia GeForce GTX 1050 Ti this test fails because the numeric results differ considerably from the expected ones.
As Giovanni pointed out, this is because my GPU uses the fine derivative rather than the coarse derivative to implement ddx() and ddy().
Testing both ddx_coarse()|ddy_coarse() and ddx_fine()|ddy_fine() on the WARP driver shows that the two kinds of derivative give the same result at coordinates where both X and Y are even, i.e. at the first pixel of each 2x2 quad. The test is therefore modified to probe only these coordinates.
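To illustrate why the two kinds of derivative can agree only on the first pixel of a quad, here is a minimal CPU-side sketch in Python. It is not the test code and not how any driver is implemented. It assumes that the coarse derivative is taken from the quad's first row and first column (consistent with what was observed on WARP, but implementations are free to choose differently), and the function `f` and all helper names are made up for the example.

```python
# Hypothetical model of 2x2 quad derivatives; names and f() are illustrative only.

def quad_origin(x, y):
    """Return the coordinates of the first (top-left) pixel of the 2x2 quad."""
    return x & ~1, y & ~1

def ddx_fine(f, x, y):
    # Fine: horizontal difference taken in the pixel's own row.
    x0, _ = quad_origin(x, y)
    return f(x0 + 1, y) - f(x0, y)

def ddy_fine(f, x, y):
    # Fine: vertical difference taken in the pixel's own column.
    _, y0 = quad_origin(x, y)
    return f(x, y0 + 1) - f(x, y0)

def ddx_coarse(f, x, y):
    # Coarse (assumption): one value per quad, taken from the quad's first row.
    x0, y0 = quad_origin(x, y)
    return f(x0 + 1, y0) - f(x0, y0)

def ddy_coarse(f, x, y):
    # Coarse (assumption): one value per quad, taken from the quad's first column.
    x0, y0 = quad_origin(x, y)
    return f(x0, y0 + 1) - f(x0, y0)

def f(x, y):
    # Non-linear in both x and y, so rows and columns of a quad differ.
    return x * x * y + 3 * y * y

def agreeing_pixels(f, width, height):
    """List the pixels where the fine and coarse derivatives coincide."""
    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if ddx_fine(f, x, y) == ddx_coarse(f, x, y)
        and ddy_fine(f, x, y) == ddy_coarse(f, x, y)
    ]

if __name__ == "__main__":
    print(agreeing_pixels(f, 4, 4))
```

Under these assumptions, only pixels with even X and even Y are reported, which is why probing just those coordinates should make the expected values independent of which variant the GPU picks.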
The new expected values were obtained by running the test with the WARP driver, and the ulps were adjusted for my GPU.
However, this MR is marked as a draft because I would like to know if the test passes on other GPUs.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/199
TODO: Fall back to `#undef WITH_SIMD` build if nasm/yasm is not found.
JPEG decoding became much slower after a6ac035a7454c92bec367b2cd3021f8b98d4d807 because Wine no longer used the host libjpeg, which was usually provided by libjpeg-turbo. Use libjpeg-turbo in Wine instead of plain libjpeg.
This adds a dependency on nasm or yasm to compile the i386/x86_64 assembly code.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2956