On my Nvidia GeForce GTX 1050 Ti this test fails because the numeric results differ considerably from the expected ones.
As Giovanni pointed out, this is because my GPU implements ddx() and ddy() using the fine derivative rather than the coarse derivative.
Testing both ddx_coarse()/ddy_coarse() and ddx_fine()/ddy_fine() on the WARP driver shows that the two kinds of derivatives agree at coordinates where both X and Y are even, i.e. at the first pixel of each 2x2 quad. So the test was modified to probe only at these coordinates.
The new expected values were obtained by running the test with the WARP driver, and the ulps tolerances were adjusted for my GPU. However, this MR is marked as a draft because I would like to know whether the test passes on other GPUs.
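To illustrate why only even coordinates are safe to probe, here is a minimal Python sketch of one plausible derivative model. It assumes that quads are aligned to even pixel coordinates, that sv_position is sampled at pixel centers, that the coarse derivative always uses the quad's first row or column, and that the fine derivative uses the row or column of the current pixel. None of this is guaranteed by the language specification, and the helper names are made up for this example. Under these assumptions the sketch agrees with the WARP-derived values used in the tests.

```python
def f(px, py):
    """Value of `nonlinear` from the test shader at the center of pixel (px, py)."""
    x, y = (px + 0.5) / 10.0, (py + 0.5) / 10.0
    return x * y - x * (x + 0.5)


def quad_origin(px, py):
    """First pixel of the 2x2 quad containing (px, py); assumes even-aligned quads."""
    return px & ~1, py & ~1


def ddx_coarse(px, py):
    qx, qy = quad_origin(px, py)
    return f(qx + 1, qy) - f(qx, qy)      # always the first row of the quad


def ddy_coarse(px, py):
    qx, qy = quad_origin(px, py)
    return f(qx, qy + 1) - f(qx, qy)      # always the first column of the quad


def ddx_fine(px, py):
    qx, _ = quad_origin(px, py)
    return f(qx + 1, py) - f(qx, py)      # the row of the current pixel


def ddy_fine(px, py):
    _, qy = quad_origin(px, py)
    return f(px, qy + 1) - f(px, qy)      # the column of the current pixel


# Print fine minus coarse for every pixel of the quad starting at (10, 10).
for py in (10, 11):
    for px in (10, 11):
        dx = ddx_fine(px, py) - ddx_coarse(px, py)
        dy = ddy_fine(px, py) - ddy_coarse(px, py)
        print((px, py), round(dx, 6), round(dy, 6))
```

In this model the fine and coarse results coincide at the first pixel of each quad and differ by about 0.01 elsewhere, because the mixed second derivative of the test function is constant.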
From: Francisco Casas <fcasas@codeweavers.com>
Besides testing both types of derivatives, this test ensures that, for pixels at coordinates where both X and Y are even, the fine and coarse derivatives are equal.
While this is not enforced by the language specification, testing shows it is the case for the WARP driver, and it seems to be the case for different GPUs in practice.
If this holds true for all relevant GPUs, it is useful because it allows us to normalize the ddx() and ddy() tests, regardless of whether these functions are implemented as fine or coarse derivatives.
---
 tests/ddxddy.shader_test | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
diff --git a/tests/ddxddy.shader_test b/tests/ddxddy.shader_test
index 6efb5ab6..3c840f57 100644
--- a/tests/ddxddy.shader_test
+++ b/tests/ddxddy.shader_test
@@ -24,3 +24,25 @@ probe (10, 11) rgba (-0.420000076, -0.164999843, 0.104999900, 0.0) 8
 probe (11, 11) rgba (-0.574999928, -0.164999843, 0.104999900, 0.0) 8
 probe (12, 10) rgba (-0.874999881, -0.205000162, 0.124999881, 0.0) 8
 probe (150, 150) rgba (-7.52500916, -1.56500244, 1.50500488, 0.0) 40
+
+
+[require]
+shader model >= 5.0
+
+
+[pixel shader todo]
+float4 main(float4 pos : sv_position) : sv_target
+{
+    pos /= 10.0;
+    float nonlinear = pos.x * pos.y - pos.x * (pos.x + 0.5);
+    return float4(nonlinear, 10.0 + ddx_fine(nonlinear) - ddx_coarse(nonlinear),
+            10.0 + ddy_fine(nonlinear) - ddy_coarse(nonlinear), 0.0);
+}
+
+[test]
+todo draw quad
+probe (10, 10) rgba (-0.524999976, 10.0, 10.0, 0.0)
+probe (11, 10) rgba (-0.689999819, 10.0, 10.0100002, 0.0)
+probe (10, 11) rgba (-0.420000076, 10.0100002, 10.0, 0.0)
+probe (11, 11) rgba (-0.574999928, 10.0100002, 10.0100002, 0.0)
+probe (12, 12) rgba (-0.625000000, 10.0, 10.0, 0.0)
From: Francisco Casas <fcasas@codeweavers.com>
On my Nvidia GeForce GTX 1050 Ti this test fails because ddx() and ddy() are implemented using the fine derivative and not the coarse derivative, resulting in considerably different numeric results.
To make the results equal for both possible implementations, only pixels at even X and Y coordinates are probed.
The new expected values were obtained by running the test with the WARP driver, and the ulps tolerances were adjusted.
The minimum shader model for ddx() and ddy() is 2.1, so the [require] directive was added to properly run the cross tests on Windows.
---
 tests/ddxddy.shader_test | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tests/ddxddy.shader_test b/tests/ddxddy.shader_test
index 3c840f57..73429fdb 100644
--- a/tests/ddxddy.shader_test
+++ b/tests/ddxddy.shader_test
@@ -1,3 +1,8 @@
+% NOTE: Only allowed on shader model ps_2_1 or better.
+[require]
+shader model >= 4.0
+
+
 [pixel shader]
 float4 main(float4 pos : sv_position) : sv_target
 {
@@ -8,6 +13,13 @@ float4 main(float4 pos : sv_position) : sv_target
 draw quad
 probe all rgba (1.0, 1.0, 0.0, 0.0)
 
+
+% NOTE: For each pixel, the derivates are computed using only values from the 2x2 quad where
+% the pixel is located.
+% Both how the derivates are calculated, and how the quads are arranged is implementation dependent.
+% However, the observed behavior in tested GPUs and the WARP driver is that ddx() and ddy() use
+% either coarse or fine derivates, and that the coarse derivate is the same as the fine derivate
+% for the 1st pixel of each 2x2 quad, so we probe in even coordinates.
 [pixel shader]
 float4 main(float4 pos : sv_position) : sv_target
 {
@@ -18,11 +30,11 @@ float4 main(float4 pos : sv_position) : sv_target
 
 [test]
 draw quad
-probe (10, 10) rgba (-0.524999976, -0.164999843, 0.104999900, 0.0) 8
-probe (11, 10) rgba (-0.689999819, -0.164999843, 0.104999900, 0.0) 8
-probe (10, 11) rgba (-0.420000076, -0.164999843, 0.104999900, 0.0) 8
-probe (11, 11) rgba (-0.574999928, -0.164999843, 0.104999900, 0.0) 8
-probe (12, 10) rgba (-0.874999881, -0.205000162, 0.124999881, 0.0) 8
+probe (10, 10) rgba (-0.524999976, -0.164999843, 0.104999900, 0.0) 12
+probe (12, 10) rgba (-0.874999881, -0.205000162, 0.124999881, 0.0) 20
+probe (10, 12) rgba (-0.315000057, -0.144999862, 0.105000019, 0.0) 8
+probe (12, 12) rgba (-0.625000000, -0.185000181, 0.125000000, 0.0) 8
+probe (14, 10) rgba (-1.30499995, -0.245000362, 0.144999862, 0.0) 8
 probe (150, 150) rgba (-7.52500916, -1.56500244, 1.50500488, 0.0) 40
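For readers unfamiliar with the trailing integer on the probe lines, the following Python sketch shows one common way to measure the distance between two 32-bit floats in ulps. It assumes that the trailing number is a per-probe ulps tolerance, as the commit message suggests. The function names are invented for this example, and the comparison actually performed by the test runner may differ in its details.

```python
import struct


def float32_key(value):
    """Map a float, rounded to 32 bits, to an integer that preserves ordering."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    magnitude = bits & 0x7FFFFFFF
    # Negative floats sort in reverse bit order, so mirror them below zero.
    return -magnitude if bits & 0x80000000 else magnitude


def ulp_distance(a, b):
    """Number of representable 32-bit floats between a and b."""
    return abs(float32_key(a) - float32_key(b))


def within_ulps(actual, expected, max_ulps):
    """True if `actual` is at most `max_ulps` representable steps from `expected`."""
    return ulp_distance(actual, expected) <= max_ulps


# Example: distance between the exact value -0.165 and the recorded WARP result.
print(ulp_distance(-0.165, -0.164999843))
```

A tolerance expressed this way scales with the magnitude of the value, which is presumably why the probe at (150, 150) uses a larger tolerance than the probes near the origin.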
My feeling is that these tests are more obscure than they need to be. I'd rather have a test for each of the derivative variants (fine, coarse and "unspecified"), and test all four possible positions in a 2x2 tile in each of the tests. If the results vary wildly, use quantization. The point of these tests, I think, is not to check how precisely the video card computes derivatives, but rather to check that we're correctly passing derivative operations through (in both the HLSL compiler and the SMx -> SPIR-V compiler).
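As a rough illustration of the quantization idea, the sketch below snaps derivative values to a fixed grid so that the coarse result and the slightly different fine result land on the same expected value. The step of 0.05 and the function name are hypothetical choices for this example, and the sample values are the approximate ddx() results for the quad at (10, 10) implied by the probes in this MR. In the shader itself the same rounding would have to be written in HLSL.

```python
def quantize(value, step=0.05):
    """Snap `value` to the nearest multiple of `step` (hypothetical step size)."""
    return round(value / step) * step


# Approximate ddx() results for the quad at (10, 10), taken from the probes:
coarse_ddx = -0.165      # used for every pixel of the quad by a coarse implementation
fine_ddx_row0 = -0.165   # fine derivative on the first row of the quad
fine_ddx_row1 = -0.155   # fine derivative on the second row of the quad

for name, value in [("coarse", coarse_ddx),
                    ("fine, row 0", fine_ddx_row0),
                    ("fine, row 1", fine_ddx_row1)]:
    print(name, quantize(value))
```

The step has to be chosen so that no expected value sits near a bucket boundary; otherwise a small numeric difference between implementations can still push the result into a neighbouring bucket.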
On Mon Jun 5 21:41:56 2023 +0000, Giovanni Mascellani wrote:
> My feeling is that these tests are more obscure than they need to be. I'd rather have a test for each of the derivative variants (fine, coarse and "unspecified"), and test all four possible positions in a 2x2 tile in each of the tests. If the results vary wildly, use quantization. The point of these tests, I think, is not to check how precisely the video card computes derivatives, but rather to check that we're correctly passing derivative operations through (in both the HLSL compiler and the SMx -> SPIR-V compiler).
Got it. I did just that in !224.
Superseded by !224.
This merge request was closed by Francisco Casas.