---
See the added comment for details what is going on.
-- v3: d3d9/tests: Wait longer in test_occlusion_query for software renderers. d3d9/tests: The device window may restore behind our back in test_wndproc. d3d9/tests: Work around test_reset_fullscreen failing on gitlab CI.
From: Stefan Dösinger stefan@codeweavers.com
---
See the added comment for details what is going on. --- dlls/d3d9/tests/device.c | 68 ++++++++++++++++++++++++++++------------ 1 file changed, 48 insertions(+), 20 deletions(-)
diff --git a/dlls/d3d9/tests/device.c b/dlls/d3d9/tests/device.c index 8ab52c7d5e4..0d38f68bf3f 100644 --- a/dlls/d3d9/tests/device.c +++ b/dlls/d3d9/tests/device.c @@ -4752,11 +4752,14 @@ done: static void test_reset_fullscreen(void) { struct device_desc device_desc; - D3DDISPLAYMODE d3ddm, d3ddm2; unsigned int mode_count, i; IDirect3DDevice9 *device; + D3DDISPLAYMODE d3ddm; WNDCLASSEXA wc = {0}; IDirect3D9 *d3d; + BOOL fvwm2_bad; + D3DFORMAT fmt; + RECT r1, r2; HRESULT hr; ATOM atom; static const struct message messages[] = @@ -4795,49 +4798,74 @@ static void test_reset_fullscreen(void) goto cleanup; }
+ IDirect3D9_GetAdapterDisplayMode(d3d, D3DADAPTER_DEFAULT, &d3ddm); + fmt = d3ddm.Format; + mode_count = IDirect3D9_GetAdapterModeCount(d3d, D3DADAPTER_DEFAULT, fmt); + for (i = 0; i < mode_count; ++i) + { + hr = IDirect3D9_EnumAdapterModes(d3d, D3DADAPTER_DEFAULT, fmt, i, &d3ddm); + ok(SUCCEEDED(hr), "Failed to enumerate display mode, hr %#lx.\n", hr); + + if (d3ddm.Width != registry_mode.dmPelsWidth || d3ddm.Height != registry_mode.dmPelsHeight) + break; + } + if (i == mode_count) + { + skip("Could not find a suitable display mode.\n"); + goto cleanup; + } + /* * Switch to fullscreen mode. * This will force the window to be shown and will cause the WM_ACTIVATEAPP * message to be sent. */ - device_desc.width = registry_mode.dmPelsWidth; - device_desc.height = registry_mode.dmPelsHeight; + device_desc.width = d3ddm.Width; + device_desc.height = d3ddm.Height; device_desc.device_window = device_window; device_desc.flags = CREATE_DEVICE_FULLSCREEN; ok(SUCCEEDED(reset_device(device, &device_desc)), "Failed to reset device.\n");
+ /* FVWM2 doesn't update its knowledge about the screen size on mode changes. It + * will therefore try to resize our fullscreen window to the original size. We + * don't learn about this until we process messages, but then we receive a + * WM_SIZE message after d3d9 stopped filtering. To dodge the issue, switch to + * the original mode when testing for WM_SIZE, but double check that the window + * size was changed. */ + GetWindowRect(device_window, &r1); + flush_events(); ok(expect_messages->message == 0, "Expected to receive message %#x.\n", expect_messages->message); expect_messages = NULL;
- IDirect3D9_GetAdapterDisplayMode(d3d, D3DADAPTER_DEFAULT, &d3ddm); - mode_count = IDirect3D9_GetAdapterModeCount(d3d, D3DADAPTER_DEFAULT, d3ddm.Format); - for (i = 0; i < mode_count; ++i) - { - hr = IDirect3D9_EnumAdapterModes(d3d, D3DADAPTER_DEFAULT, d3ddm.Format, i, &d3ddm2); - ok(SUCCEEDED(hr), "Failed to enumerate display mode, hr %#lx.\n", hr); - - if (d3ddm2.Width != d3ddm.Width || d3ddm2.Height != d3ddm.Height) - break; - } - if (i == mode_count) - { - skip("Could not find a suitable display mode.\n"); - goto cleanup; - } + /* This is what FVWM2 does to us: Resize the window behind our back. */ + GetWindowRect(device_window, &r2); + fvwm2_bad = !EqualRect(&r1, &r2); + todo_wine_if(fvwm2_bad) + ok(EqualRect(&r1, &r2), "Window rect changed during event flush: %s -> %s.\n", + wine_dbgstr_rect(&r1), wine_dbgstr_rect(&r2));
wm_size_received = 0;
/* Fullscreen mode change. */ - device_desc.width = d3ddm2.Width; - device_desc.height = d3ddm2.Height; + device_desc.width = registry_mode.dmPelsWidth; + device_desc.height = registry_mode.dmPelsHeight; device_desc.device_window = device_window; device_desc.flags = CREATE_DEVICE_FULLSCREEN; ok(SUCCEEDED(reset_device(device, &device_desc)), "Failed to reset device.\n");
flush_events(); + + /* And some FVWM2 versions still insist on resizing in the flush_events() above - + * my fvwm 2.7.0 resizes the window back to the size before the mode change and + * then back to the new mode. This doesn't affect the gitlab machines afaics. */ + todo_wine_if (fvwm2_bad && wm_size_received) ok(!wm_size_received, "Received unexpected WM_SIZE message.\n");
+ GetWindowRect(device_window, &r2); + ok(!EqualRect(&r1, &r2), "Window rectangles before and after mode change are equal: %s\n", + wine_dbgstr_rect(&r1)); + cleanup: if (device) IDirect3DDevice9_Release(device); IDirect3D9_Release(d3d);
From: Stefan Dösinger stefan@codeweavers.com
d3d8 is not affected in this way because it does not call flush_events() between focus loss and re-gain. --- dlls/d3d9/tests/device.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/dlls/d3d9/tests/device.c b/dlls/d3d9/tests/device.c index 0d38f68bf3f..2948d030d12 100644 --- a/dlls/d3d9/tests/device.c +++ b/dlls/d3d9/tests/device.c @@ -4228,10 +4228,24 @@ static void test_wndproc(void) } flush_events();
+ if (!(tests[i].create_flags & CREATE_DEVICE_NOWINDOWCHANGES)) + { + /* The window manager on Gitlab CI (FVWM2) insists on restoring the device window for some + * reason. In this case we might not receive WM_WINDOWPOSCHANGING on the device window + * because nothing is being changed on focus gain. Don't bother checking for messages in + * this case - depending on FVWM2's mood (and most likely the location of the mouse pointer, + * thanks to focus follows mouse) we may be activated or deactivated one or more times. + * + * Wine's internal status is being updated while processing messages (the flush_events above). */ + ret = IsIconic(device_window); + todo_wine_if (!ret) + ok(ret, "Expected the device window to still be iconic.\n"); + } + /* I have to minimize and restore the focus window, otherwise native d3d9 fails * device::reset with D3DERR_DEVICELOST. This does not happen when the window * restore is triggered by the user. */ - expect_messages = tests[i].reactivate_messages; + expect_messages = ret ? tests[i].reactivate_messages : NULL; ShowWindow(focus_window, SW_MINIMIZE); ShowWindow(focus_window, SW_RESTORE); /* Set focus twice to make KDE and fvwm in focus-follows-mouse mode happy. */ @@ -4239,8 +4253,11 @@ static void test_wndproc(void) flush_events(); SetForegroundWindow(focus_window); flush_events(); /* WM_WINDOWPOSCHANGING etc arrive after SetForegroundWindow returns. */ - ok(!expect_messages->message, "Expected message %#x for window %#x, but didn't receive it, i=%u.\n", - expect_messages->message, expect_messages->window, i); + if (expect_messages) + { + ok(!expect_messages->message, "Expected message %#x for window %#x, but didn't receive it, i=%u.\n", + expect_messages->message, expect_messages->window, i); + } expect_messages = NULL;
hr = IDirect3DDevice9_TestCooperativeLevel(device);
From: Stefan Dösinger stefan@codeweavers.com
The issue can be reproduced with LIBGL_ALWAYS_SOFTWARE=1 on Mesa. The Gitlab machines are affected. --- dlls/d3d9/tests/device.c | 4 +++- dlls/d3d9/tests/utils.h | 11 ++++++++++- 2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/dlls/d3d9/tests/device.c b/dlls/d3d9/tests/device.c index 2948d030d12..01a3023c8bf 100644 --- a/dlls/d3d9/tests/device.c +++ b/dlls/d3d9/tests/device.c @@ -6625,7 +6625,9 @@ static void test_occlusion_query(void)
hr = IDirect3DDevice9_BeginScene(device); ok(hr == D3D_OK, "Failed to begin scene, hr %#lx.\n", hr); - for (i = 0; i < 50000; ++i) + /* This loop can take some time on Mesa's software renderer. Don't push + * it too high or it might time out on Gitlab CI. */ + for (i = 0; i < 30000; ++i) { hr = IDirect3DQuery9_Issue(query, D3DISSUE_BEGIN); ok(hr == D3D_OK, "Got unexpected hr %#lx.\n", hr); diff --git a/dlls/d3d9/tests/utils.h b/dlls/d3d9/tests/utils.h index 248e07eb521..9b932df4ef9 100644 --- a/dlls/d3d9/tests/utils.h +++ b/dlls/d3d9/tests/utils.h @@ -27,12 +27,21 @@ static inline void wait_query_(const char *file, unsigned int line, IDirect3DQue unsigned int i; HRESULT hr;
- for (i = 0; i < 500; ++i) + /* Some of the larger loops in test_occlusion_query() can take quite some time on + * software Mesa (i.e., the gitlab CI machines). */ + for (i = 0; i < 1000; ++i) { if ((hr = IDirect3DQuery9_GetData(query, NULL, 0, D3DGETDATA_FLUSH)) == S_OK) break; Sleep(10); } + + if (hr == S_OK) + { + todo_wine_if (i >= 500) + ok_(file, line)(i < 500, "Query took %u ms to finish.\n", i * 10); + } + ok_(file, line)(hr == S_OK, "Got unexpected hr %#lx.\n", hr); }
I updated the commits and removed the draft marker. With this version there's a chance we can enable d3d9:device: https://gitlab.winehq.org/stefan/wine/-/pipelines/13809
Although I have seen some failures previously, not sure if they are race conditions that still happen (the query test might be influenced by other tasks hogging CPU) or me not pushing commits to the CI pipeline properly.
Zebediah Figura (@zfigura) commented about dlls/d3d9/tests/device.c:
- }
- if (i == mode_count)
- {
skip("Could not find a suitable display mode.\n");
goto cleanup;
- }
- /* * Switch to fullscreen mode. * This will force the window to be shown and will cause the WM_ACTIVATEAPP * message to be sent. */
- device_desc.width = registry_mode.dmPelsWidth;
- device_desc.height = registry_mode.dmPelsHeight;
- device_desc.width = d3ddm.Width;
- device_desc.height = d3ddm.Height;
This bothers me, because now we're not testing the same thing that we were testing anymore. Maybe the difference doesn't matter, though...
Zebediah Figura (@zfigura) commented about dlls/d3d9/tests/device.c:
- device_desc.width = registry_mode.dmPelsWidth;
- device_desc.height = registry_mode.dmPelsHeight;
device_desc.width = d3ddm.Width;
device_desc.height = d3ddm.Height; device_desc.device_window = device_window; device_desc.flags = CREATE_DEVICE_FULLSCREEN; ok(SUCCEEDED(reset_device(device, &device_desc)), "Failed to reset device.\n");
/* FVWM2 doesn't update its knowledge about the screen size on mode changes. It
* will therefore try to resize our fullscreen window to the original size. We
* don't learn about this until we process messages, but then we receive a
* WM_SIZE message after d3d9 stopped filtering. To dodge the issue, switch to
* the original mode when testing for WM_SIZE, but double check that the window
* size was changed. */
GetWindowRect(device_window, &r1);
I'm finding this comment a bit difficult to understand, sorry. The problem—what's making the test fail—is just that it tries to resize our window, right? I.e. for the purposes of the test, it doesn't matter that it's also to the wrong size?
This doesn't really seem like it's "dodging" the issue, either. The whole point of 37daa3573 is to test that WM_SIZE isn't sent, and here, it is. I suppose we can detect fvwm2 by checking that it's resizing to the wrong size, but it does kind of cripple the test.
Why does changing mode twice make a difference?
Zebediah Figura (@zfigura) commented about dlls/d3d9/tests/device.c:
} flush_events();
if (!(tests[i].create_flags & CREATE_DEVICE_NOWINDOWCHANGES))
{
/* The window manager on Gitlab CI (FVWM2) insists on restoring the device window for some
* reason. In this case we might not receive WM_WINDOWPOSCHANGING on the device window
* because nothing is being changed on focus gain. Don't bother checking for messages in
* this case - depending on FVWM2's mood (and most likely the location of the mouse pointer,
* thanks to focus follows mouse) we may be activated or deactivated one or more times.
*
* Wine's internal status is being updated while processing messages (the flush_events above). */
ret = IsIconic(device_window);
todo_wine_if (!ret)
ok(ret, "Expected the device window to still be iconic.\n");
}
Can we minimize the window ourselves to get it into a consistent state, instead of skipping tests?
Zebediah Figura (@zfigura) commented about dlls/d3d9/tests/utils.h:
HRESULT hr;
- for (i = 0; i < 500; ++i)
- /* Some of the larger loops in test_occlusion_query() can take quite some time on
* software Mesa (i.e., the gitlab CI machines). */
- for (i = 0; i < 1000; ++i) { if ((hr = IDirect3DQuery9_GetData(query, NULL, 0, D3DGETDATA_FLUSH)) == S_OK) break; Sleep(10); }
- if (hr == S_OK)
- {
todo_wine_if (i >= 500)
ok_(file, line)(i < 500, "Query took %u ms to finish.\n", i * 10);
I don't think that a todo_wine makes sense here? It's not like it's invalid behaviour to take a long time to render.
How long does rendering take, in seconds? If it's too long we may as well just get rid of the test (or skip it if the device is a software renderer); it's nice to have but not at the expense of slowing down the test suite.
I'm kind of unhappy about these windowing patches. I know that this is blocking us from turning on d3d tests in the CI, and I agree that's a bad thing. On the other hand, I don't like the way these are working around the failures, because they're *not* working around them—they're crippling the tests.
Contrary to what Alexandre said a few weeks ago, our d3d windowing tests aren't failing because they're testing things too zealously—they're failing because, by all appearances, the *core assumptions that are being tested* are being violated by fvwm2. That means that the applications that depend on this behaviour are going to be broken on fvwm2. So what I would really like to see is that this bug just gets fixed upstream.
A first step towards that, as Henri mentioned, is testing with fvwm3 instead, because it may already be fixed upstream. I don't know how stable fvwm3 is in general, but the fact that fvwm2's README says "All users are hereforth encouraged to use fvwm3, and to report any bugs" suggests that it's what we should be targeting at this point *anyway*.
I'm sure this is not what you want to spend your time doing, and if necessary I can put my time where my mouth is and try to push this upstream myself.
d3d8 is not affected in this way because it does not call flush_events() between focus loss and re-gain.
That seems undesirable. I don't think we want the same test to differ between d3d8 and d3d9 without a reason. In this case, if the flush_events() call is catching something, we should probably add it to d3d8.
Going forward, if all else fails, and we do start piling up workarounds for upstream bugs (which is likely unavoidable in general), I really want to see a way to mark those bugs that (a) lets us know when the bugs get fixed upstream, so we can remove them, and (b) doesn't hide it if the feature is also, independently, broken in Wine. This patch series, and other patches like 8bd9fcbde, just use "todo_wine_if (!a) ok (a)", which does neither.
vkd3d has a "bug" keyword, but this doesn't really address either concern. As far as I can tell "bug_if (a) ok(b)" is effectively shorthand for "todo_wine_if (a && !b) ok(b)" [and also reporting the bug count separately from the todo count, but practically I fail to see the point in this.]
What I'd propose instead, just from brainstorming, is that we tag bugs like quirks, and run the tests with a list of bugs (as an env var probably) that we expect to affect any given platform, hardcoded into the CI or any given developer's machine. E.g. we could annotate tests like
bug("fvwm2-restore") ok(IsIconic(device_window), ...);
or alternatively
todo_wine_if (bug("fvwm2-restore")) ok(IsIconic(device_window), ...);
and run the tests with
WINE_TEST_BUGS=fvwm2-restore,llvmpipe-something,et-cetera
Thoughts?
To be clear I'm happy to write up this proposal myself, if it sees consensus; I'm just commenting on this patch series because it's the straw that broke the camel's back.
I'm kind of unhappy about these windowing patches. I know that this is blocking us from turning on d3d tests in the CI, and I agree that's a bad thing. On the other hand, I don't like the way these are working around the failures, because they're *not* working around them—they're crippling the tests.
We have been nitpicking around a hand full of test failures for years. Be it on Windows or Wine. While we bother about 5 failing ok lines 158636 (number from the final device.ok output) are staying unused. By delaying enabling the tests on a CI machine that matters for future patches more broken tests will be added and the problem moves on.
I don't care how we take care of those handful of failing tests. As you say, my patch changes the test to work around fvwm bugs but, as much as possible, keeping the essence of what we want tested. I am fine to do any of the others:
-> Remove the tests -> Mark them flaky -> todo_wine_if(fail) ok(!fail) everything -> Switch fvwm to click to focus. Afaics that eliminates most of the difference to how Windows behaves, but it won't change its inability to handle mode switches. -> Run them in a virtual desktop -> Switch to KDE (that once passed, but with KDE updates different failures were introduced. I found kde / kwin the most sensible WM when I wrote those windowing tests)
But whatever choice, I think we want the other good 158k tests enabled this month, not next year.
Now re switching to some other WM, last time I proposed this a lot of user32 tests were failing elsewhere because the user32 tests were written to make Alexandre's fvwm2 happy. Not something that makes me happy, but keeping Alexandre's preferred WM in his preferred settings is the path of least resistence.
I'll reply to the more detailed technical questions separately.
On Mon Aug 21 23:42:21 2023 +0000, Zebediah Figura wrote:
I don't think that a todo_wine makes sense here? It's not like it's invalid behaviour to take a long time to render. How long does rendering take, in seconds? If it's too long we may as well just get rid of the test (or skip it if the device is a software renderer); it's nice to have but not at the expense of slowing down the test suite.
I think the host machine is running multiple containers in parallel. The 50k query test did randomly take more than 50 seconds. At some point we will run into the test timeout (afaik 120s)
I am open to removing this test. I am not exactly sure what it is/was good for. Showing that starting and stopping a query doesn't randomly add samples rendered?
On Mon Aug 21 23:42:21 2023 +0000, Zebediah Figura wrote:
Can we minimize the window ourselves to get it into a consistent state, instead of skipping tests?
I could try, but it was just minimized by d3d earlier and restored by fvwm, I don't expect minimizing it again will do any good.
And those tests are fragile - minimizing it redundantly to make fvwm happy might break Windows, kwin, virtual desktop, etc.
On Mon Aug 21 23:42:20 2023 +0000, Zebediah Figura wrote:
I'm finding this comment a bit difficult to understand, sorry. The problem—what's making the test fail—is just that it tries to resize our window, right? I.e. for the purposes of the test, it doesn't matter that it's also to the wrong size? This doesn't really seem like it's "dodging" the issue, either. The whole point of 37daa3573 is to test that WM_SIZE isn't sent, and here, it is. I suppose we can detect fvwm2 by checking that it's resizing to the wrong size, but it does kind of cripple the test. Why does changing mode twice make a difference?
In the current test WM_SIZE(*) gets sent twice below the d3d level - once when d3d resizes the window, and this message is filtered by d3d. The test tests that this is not supposed to be delivered.
The second one is when fvwm assumes that the fullscreen window isn't fullscreen and resizes it. At this point d3d is long done changing the window size, stopped filtering, and WM_SIZE gets delivered.
By changing the second mode switch to switch back to the original size fvwm never thinks it needs to change the window size because after d3d is done changing modes and resizing the window both the screen mode and window size match what fvwm knows about the screen size. Thus the first (ultimately filtered) WM_SIZE is generated by user32 when wined3d switches windows, and there is never an incorrect second one sent.
(*) It is actually WM_WINDOWPOSCHANGING + WM_WINDOWPOSCHANGED, the WM_SIZE is just the response of DefWindowProc to WM_WINDOWPOSCHANGED.
On Tue Aug 22 06:34:49 2023 +0000, Stefan Dösinger wrote:
I'm kind of unhappy about these windowing patches. I know that this is
blocking us from turning on d3d tests in the CI, and I agree that's a bad thing. On the other hand, I don't like the way these are working around the failures, because they're *not* working around them—they're crippling the tests. We have been nitpicking around a hand full of test failures for years. Be it on Windows or Wine. While we bother about 5 failing ok lines 158636 (number from the final device.ok output) are staying unused. By delaying enabling the tests on a CI machine that matters for future patches more broken tests will be added and the problem moves on. I don't care how we take care of those handful of failing tests. As you say, my patch changes the test to work around fvwm bugs but, as much as possible, keeping the essence of what we want tested. I am fine to do any of the others:
- Remove the tests
- Mark them flaky
- todo_wine_if(fail) ok(!fail) everything
- Switch fvwm to click to focus. Afaics that eliminates most of the
difference to how Windows behaves, but it won't change its inability to handle mode switches.
- Run them in a virtual desktop
- Switch to KDE (that once passed, but with KDE updates different
failures were introduced. I found kde / kwin the most sensible WM when I wrote those windowing tests) But whatever choice, I think we want the other good 158k tests enabled this month, not next year. Now re switching to some other WM, last time I proposed this a lot of user32 tests were failing elsewhere because the user32 tests were written to make Alexandre's fvwm2 happy. Not something that makes me happy, but keeping Alexandre's preferred WM in his preferred settings is the path of least resistence. I'll reply to the more detailed technical questions separately.
One more option I can think of: Move every test that changes modes to a new file, e.g. d3d9/tests/modeset.c, skip it on gitlab and enable the rest.
vkd3d has a "bug" keyword, but this doesn't really address either concern. As far as I can tell "bug_if (a) ok(b)" is effectively shorthand for "todo_wine_if (a && !b) ok(b)" [and also reporting the bug count separately from the todo count, but practically I fail to see the point in this.]
For what it's worth, the idea at the time was that the process would go something like this:
- We investigate a test failure, and it turns out it's an upstream driver bug. - We report/fix that bug upstream, and a driver fix gets committed. - We add a "bug_if(is_<vendor>_device(device, "<version>"))" to the tests and wait for the fixed version to be sufficiently propagated. In practice that would probably be until it's in Debian stable. - We remove the bug_if() again.
Of course I can't say that's been terribly successful in practice, and there are various reasons for that. One of those is that at the time there was much closer cooperation between upstream Wine/vkd3d and Mesa than there currently is. Most of that cooperation was based on people being motivated to make things better in their free time though, and between Mesa switching to Gitlab and the Valve forks, some of the underlying motivation essentially just evaporated.
On Tue Aug 22 06:44:53 2023 +0000, Stefan Dösinger wrote:
In the current test WM_SIZE(*) gets sent twice below the d3d level - once when d3d resizes the window, and this message is filtered by d3d. The test tests that this is not supposed to be delivered. The second one is when fvwm assumes that the fullscreen window isn't fullscreen and resizes it. At this point d3d is long done changing the window size, stopped filtering, and WM_SIZE gets delivered. By changing the second mode switch to switch back to the original size fvwm never thinks it needs to change the window size because after d3d is done changing modes and resizing the window both the screen mode and window size match what fvwm knows about the screen size. Thus the first (ultimately filtered) WM_SIZE is generated by user32 when wined3d switches windows, and there is never an incorrect second one sent. (*) It is actually WM_WINDOWPOSCHANGING + WM_WINDOWPOSCHANGED, the WM_SIZE is just the response of DefWindowProc to WM_WINDOWPOSCHANGED.
Ah, I think that makes sense. So we resize the window from the registry mode to the new mode, but fvwm thinks that the registry mode is still the screen size, notices that the window is maximized with the "wrong" size, and tries to resize it back to the registry mode?
I could try, but it was just minimized by d3d earlier and restored by fvwm, I don't expect minimizing it again will do any good.
Maybe, yeah, but maybe not. After all, if it's a bug that's causing the window to be restored, then it might not trigger every time. It's not like fvwm spuriously tries to restore every window we ever minimize, after all. Seems like it's worth a try, at least.
And those tests are fragile - minimizing it redundantly to make fvwm happy might break Windows, kwin, virtual desktop, etc.
Well, I was thinking we'd re-minimize it only if it's not iconic, so we'd only affect the broken fvwm case.
On Tue Aug 22 06:37:25 2023 +0000, Stefan Dösinger wrote:
I think the host machine is running multiple containers in parallel. The 50k query test did randomly take more than 50 seconds. At some point we will run into the test timeout (afaik 120s) I am open to removing this test. I am not exactly sure what it is/was good for. Showing that starting and stopping a query doesn't randomly add samples rendered?
It's from 2dc3d2454, for bug 45932. So it's basically a regression test, and as such does have some real value. But if it's going to take a ridiculous amount of time then I think it's just not worth that cost.
Ideally we'd just get rid of it on llvmpipe where it takes too long to render. We don't have a good way to detect llvmpipe, but we could add one.
But if tweaking the numbers is enough for now then we can live with that instead. I just don't think the todo_wine makes sense.
On Wed Aug 23 03:45:29 2023 +0000, Zebediah Figura wrote:
I could try, but it was just minimized by d3d earlier and restored by
fvwm, I don't expect minimizing it again will do any good. Maybe, yeah, but maybe not. After all, if it's a bug that's causing the window to be restored, then it might not trigger every time. It's not like fvwm spuriously tries to restore every window we ever minimize, after all. Seems like it's worth a try, at least.
And those tests are fragile - minimizing it redundantly to make fvwm
happy might break Windows, kwin, virtual desktop, etc. Well, I was thinking we'd re-minimize it only if it's not iconic, so we'd only affect the broken fvwm case.
Fair enough, I'll give that a try
On Wed Aug 23 03:52:42 2023 +0000, Zebediah Figura wrote:
It's from 2dc3d2454, for bug 45932. So it's basically a regression test, and as such does have some real value. But if it's going to take a ridiculous amount of time then I think it's just not worth that cost. Ideally we'd just get rid of it on llvmpipe where it takes too long to render. We don't have a good way to detect llvmpipe, but we could add one. But if tweaking the numbers is enough for now then we can live with that instead. I just don't think the todo_wine makes sense.
Hmm, I see your point. I wanted to have some sort of notice in the test output that things took longer than expected. Would a trace work for you? But we wouldn't see that on test.winehq.org by default I think.
What I want is some sort of message that says "hey, this took really long. If the test times out down the line, this might be why"
On Wed Aug 23 03:42:21 2023 +0000, Zebediah Figura wrote:
Ah, I think that makes sense. So we resize the window from the registry mode to the new mode, but fvwm thinks that the registry mode is still the screen size, notices that the window is maximized with the "wrong" size, and tries to resize it back to the registry mode?
Yes, exactly.
Thinking about it, we might retain approximately as much value by checking for !wm_size_received after reset() returns but before we flush events. I guess what matters to the games is that it doesn't get WM_SIZE or WM_WINDOWPOSCHANGING on the focus window *while* in reset - presumably because it calls reset() from the WM_SIZE message handler, which would cause endless recursion.
Mode changing games would be broken on fvwm regardless of message handling, since fvwm just increases the size to something wrong we'll stretch the output in Present.
On Wed Aug 23 08:11:19 2023 +0000, Stefan Dösinger wrote:
Hmm, I see your point. I wanted to have some sort of notice in the test output that things took longer than expected. Would a trace work for you? But we wouldn't see that on test.winehq.org by default I think. What I want is some sort of message that says "hey, this took really long. If the test times out down the line, this might be why"
Yeah, I think a trace is a good way to accomplish that.
On Tue Aug 22 08:53:11 2023 +0000, Stefan Dösinger wrote:
One more option I can think of: Move every test that changes modes to a new file, e.g. d3d9/tests/modeset.c, skip it on gitlab and enable the rest.
I get it. What bothers me is this: suppose we have d3d9 running on CI, and a test breaks for some reason. We have four options:
(a) investigate and try to fix it (b) mark it as todo_wine and move on (c) disable the whole test unit on CI (d) ignore it, let it fail
I feel obliged to at least try to push for (a), especially in a case where we know that a game is going to be broken, but it's pretty clear by now that if we have time for investigating bugs in general, it's limited and can't be depended on. And it's also evident that (c) is seen as preferable to (d), although I don't know (or have forgotten) why this is the case for d3d and not anything else. And evidently (b) is seen as preferable to (c).
The upshot of all of this is that our broken tests are just going to pile up. What's worse is, if the failure is intermittent (as I understand these failures are, but maybe I'm mistaken?), the tests don't just get broken, triggering spurious failures for any developer *not* affected by the bug—they get crippled, which means that at worst the test is completely useless.
And I still have doubts that getting all of the other tests running under CI is worth making making some completely useless. Currently, we can say for sure that that test passes legitimately, even if it's only true of one machine in the world and that makes things worse for everyone. If we cripple the test, that no longer becomes true.
Anyway, with all of that said, I think specifically tagging bugs, as I was thinking of, is a way we can make the tests pass for everyone without crippling them, and it's something I can live with. It still means that bugs are going to pile up, but I guess that's unavoidable, and importantly I think it alleviates my other worries.
But, before all that...
Now re switching to some other WM, last time I proposed this a lot of user32 tests were failing elsewhere because the user32 tests were written to make Alexandre's fvwm2 happy. Not something that makes me happy, but keeping Alexandre's preferred WM in his preferred settings is the path of least resistence.
Mostly what I'm wondering at this point is, can we *upgrade* the WM? If fvwm2 is EOL, and its authors prescribe fvwm3, then we should probably be using fvwm3 *anyway*. Of course, a major version upgrade has its caveats, but it seems at least worth a try.
For what it's worth, the idea at the time was that the process would go something like this:
- We investigate a test failure, and it turns out it's an upstream driver bug.
- We report/fix that bug upstream, and a driver fix gets committed.
- We add a "bug_if(is_<vendor>_device(device, "<version>"))" to the tests and wait for the fixed version to be sufficiently propagated. In practice that would probably be until it's in Debian stable.
- We remove the bug_if() again.
The workflow makes sense, but I kind of fail to understand why to use "bug_if(...)" instead of "todo_wine_if(...)", except for I guess a bit of extra code clarity.
But my other concern about bug(), as it is, is that before step 4 happens, the bug will be forgotten about. And because the tests now succeed either way, nobody will notice when the fixed version *does* hit repositories. My idea is to introduce some more stringent tracking that will, among other things, make sure that tests will start "failing" once the bug is fixed.
The workflow makes sense, but I kind of fail to understand why to use "bug_if(...)" instead of "todo_wine_if(...)", except for I guess a bit of extra code clarity.
But my other concern about bug(), as it is, is that before step 4 happens, the bug will be forgotten about. And because the tests now succeed either way, nobody will notice when the fixed version *does* hit repositories. My idea is to introduce some more stringent tracking that will, among other things, make sure that tests will start "failing" once the bug is fixed.
The idea was that the driver version check would take care of that. We don't have that in current vkd3d, but it's not too hard to add. We also have VKD3D_TEST_BUG=0 to prevent these from passing. Note that I'm not necessarily advocating for adopting that scheme in Wine; just trying to provide some context.
Mostly what I'm wondering at this point is, can we *upgrade* the WM? If fvwm2 is EOL, and its authors prescribe fvwm3, then we should probably be using fvwm3 *anyway*. Of course, a major version upgrade has its caveats, but it seems at least worth a try.
It's available in Debian bookworm/stable, at least.
Fwiw if we decide we need to change the CI WM, I'd suggest using OpenBox. It seems to generally do a better job than fvwm and any other WM tested (see https://test.winehq.org/data/patterns.html).
I'm not testing fvwm3 there but as far as I can see it still suffers from the same issue fvwm2, where it might steal input focus from our windows when WM_TAKE_FOCUS is used (https://github.com/fvwmorg/fvwm/pull/101).
Fwiw if we decide we need to change the CI WM, I'd suggest using OpenBox. It seems to generally do a better job than fvwm and any other WM tested (see https://test.winehq.org/data/patterns.html).
I'm not testing fvwm3 there but as far as I can see it still suffers from the same issue fvwm2, where it might steal input focus from our windows when WM_TAKE_FOCUS is used (https://github.com/fvwmorg/fvwm/pull/101).
Sure. To be clear I'm not suggesting that we should switch to fvwm3 because it (might) fix all of our problems. I'm suggesting we should switch to it primarily because fvwm2 is EOL and won't be receiving any new fixes. And if upgrading fixes *some* of our problems along the way, so much the better.
Of course, if there's a WM out there that works perfectly, I think it's a good idea to use it, but historically there's been hesitation around that.
Switch fvwm to click to focus. Afaics that eliminates most of the difference to how Windows behaves, but it won't change its inability to handle mode switches.
I think fvwm2's default SloppyFocus policy is so different from Windows', and also from other window managers (GNOME, KDE), that it does not make sense to run the tests in that configuration. The [TestBot uses ClickToFocus](https://wiki.winehq.org/Wine_TestBot_VMs#Unix_configuration) for that reason and to me the GitLab CI should too. That's moot if switching to another WM of course but if not it's a GitLab must-fix.
Switch fvwm to click to focus. Afaics that eliminates most of the difference to how Windows behaves, but it won't change its inability to handle mode switches.
I think fvwm2's default SloppyFocus policy is so different from Windows', and also from other window managers (GNOME, KDE), that it does not make sense to run the tests in that configuration. The [TestBot uses ClickToFocus](https://wiki.winehq.org/Wine_TestBot_VMs#Unix_configuration) for that reason and to me the GitLab CI should too. That's moot if switching to another WM of course but if not it's a GitLab must-fix.
Yeah. I don't know why we didn't do this right from the beginning. Or apply any of the other accumulated knowledge from the TestBot, for that matter.
Yeah. I don't know why we didn't do this right from the beginning. Or apply any of the other accumulated knowledge from the TestBot, for that matter.
We can certainly tweak things, but I see no evidence that the Testbot runs consistently have better results than the Gitlab ones.
And honestly, if the tests can only succeed in a carefully tuned environment, that's not much better than when they only succeeded on my box. We need to accept that Wine runs in various environments with various upstream bugs, and deal with that. If upstream bugs have a high impact we need to work around them in the code; if we decide that the impact is low enough not to bother, then it should be OK to ignore the failures in the tests.
There may be some unusual configurations that we decide not to support, and maybe fvwm focus is one of them, but that should be the last resort. We need to stop using upstream bugs as an excuse to not fix the tests.
On Mon Aug 28 09:28:04 2023 +0000, Stefan Dösinger wrote:
Fair enough, I'll give that a try
Re-minimizing the window seems to work both on gitlab CI and locally. I'll include this change in the next revision of this MR.
https://gitlab.winehq.org/stefan/wine/-/merge_requests/6 has patches with and without curse words and two successful d3d9:device runs.
The usual caveat applies that at least locally this is a flaky failure, so maybe we reduce the likelihood of failures from 5% to 0.1% and it still bites us later.