https://bugs.winehq.org/show_bug.cgi?id=54064
Bug ID: 54064 Summary: ntdll:threadpool - test_tp_io() sometimes fails & crashes in Wine (GitLab CI) Product: Wine Version: unspecified Hardware: x86-64 OS: Linux Status: NEW Severity: normal Priority: P2 Component: ntdll Assignee: wine-bugs@winehq.org Reporter: fgouget@codeweavers.com Distribution: ---
ntdll:threadpool - test_tp_io() sometimes fails & crashes in Wine (GitLab CI):
ntdll:threadpool start dlls/ntdll/tests/threadpool.c wine: Unhandled page fault on read access to 0000000C at address 7BC644D1 (thread 07c4), starting debugger... threadpool.c:641: Test marked todo: TpSimpleTryPost unexpectedly returned status 0 threadpool.c:1595: Test marked todo: expected that timers are merged threadpool.c:2034: Test failed: got 0xc0000017 threadpool.c:2035: Test failed: expected non-NULL TP_IO Unhandled exception: page fault on read access to 0x0000000c in 32-bit code (0x7bc644d1). ... Backtrace: =>0 0x7bc644d1 impl_from_TP_IO+0x3(io=<internal error>) [Z:\builds\zfigura\wine\dlls\ntdll\threadpool.c:351] in ntdll (0x0077fd08) 1 0x7bc644d1 TpWaitForIoCompletion+0x11(io=00000000, cancel_pending=0) [Z:\builds\zfigura\wine\dlls\ntdll\threadpool.c:348] in ntdll (0x0077fd08) 2 0x004e7a58 in ntdll_test (+0xe7a58) (0x0077fde8) ...
Strangely enough this failure never happened in the nightly WineTest runs but it did impact at least 5 merge requests so far:
* 2022-11-30 MR!1615 * 2022-11-29 MR!1545 * 2022-11-25 MR!1537 * 2022-11-23 MR!1508 * 2022-11-10 MR!1326
https://bugs.winehq.org/show_bug.cgi?id=54064
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |source, testcase
https://bugs.winehq.org/show_bug.cgi?id=54064
--- Comment #1 from François Gouget fgouget@codeweavers.com --- Sometimes TpWaitForIoCompletion() times out instead of crashing which is probably the same issue as above:
ntdll:threadpool start dlls/ntdll/tests/threadpool.c threadpool.c:641: Test marked todo: TpSimpleTryPost unexpectedly returned status 0 threadpool.c:1595: Test marked todo: expected that timers are merged threadpool.c:2043: Test failed: TpWaitForIoCompletion() should not return threadpool.c:2059: Test failed: wait timed out 07c0:threadpool: 4692 tests executed (2 marked as todo, 0 as flaky, 2 failures), 0 skipped.
This variant happened at least twice on the GitLab CI (impacting 2 merge requests) but also happened once in the nightly WineTest runs (only once in 6 months!):
* 2022-11-20 MR!1450 * 2022-11-10 MR!1326 * 2022-06-15 on debiant-wow32
Note that, as far as I can tell, all these failures happened in the wow32 configuration.
https://bugs.winehq.org/show_bug.cgi?id=54064
Eric Pouech eric.pouech@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |eric.pouech@gmail.com
--- Comment #2 from Eric Pouech eric.pouech@gmail.com --- (In reply to François Gouget from comment #0)
ntdll:threadpool - test_tp_io() sometimes fails & crashes in Wine (GitLab CI):
ntdll:threadpool start dlls/ntdll/tests/threadpool.c wine: Unhandled page fault on read access to 0000000C at address 7BC644D1 (thread 07c4), starting debugger... threadpool.c:641: Test marked todo: TpSimpleTryPost unexpectedly returned status 0 threadpool.c:1595: Test marked todo: expected that timers are merged threadpool.c:2034: Test failed: got 0xc0000017
0xc0000017 is STATUS_NO_MEMORY
clearly this test (like many many others) don't bother on OOM conditions perhaps it could be a good idea to bump memory resources of VM or Docker image (whatever it runs on)
https://bugs.winehq.org/show_bug.cgi?id=54064
--- Comment #3 from François Gouget fgouget@codeweavers.com --- I don't know what the GitLab CI's Docker configuration is: I don't see anything related to memory in either of .gitlab-ci.yml, tools/gitlab/image.yml or tools/gitlab/image.docker. On the TestBot the Windows 10+ VMs have 4 GB of RAM [1] which seems sufficient (the Wine ones have 6 GB but just because you need at least that to compile with 16 threads :-).
But isn't it possible that some part of the test has a bug causing it to allocate too much memory, thus causing the memory issue? If so that would be something to fix. But maybe it's just me being too suspicious of the "just add more memory" kind of fix.
[1] At some point they may actually have been running with only 3 GB because QEmu's memory ballooning is not dynamic at all.
https://bugs.winehq.org/show_bug.cgi?id=54064
--- Comment #4 from Eric Pouech eric.pouech@gmail.com ---
But isn't it possible that some part of the test has a bug causing it to allocate too much memory, thus causing the memory issue? If so that would be something to fix. But maybe it's just me being too suspicious of the "just add more memory" kind of fix.
likely both issues are possible ;-)
looking at yesterday and todays 64 bit windows test results: - dxgi succeeds today on w864, while it failed yesterday (at least dxgi and d3d12 failed yesterday and today on w7)
so something is bringing different memory conditions on the tests, making them not simply reproducible
this could be (non exhaustive, and potentially cumulative) - previous tests stressing memory - OS not releasing memory & resources immediately - other (non winetest) processes using resources while running the tests - ...
I've seen (I don't remember which <g>) other tests failing because of missing resources
https://bugs.winehq.org/show_bug.cgi?id=54064
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
--- Comment #5 from Zeb Figura z.figura12@gmail.com --- Yeah, I don't think this test should be running out of memory. (Also note that "bump memory resources" probably isn't a solution since it's going to run out of address space, not physical memory.)
https://bugs.winehq.org/show_bug.cgi?id=54064
--- Comment #6 from Zeb Figura z.figura12@gmail.com --- Oh, well, the problem is that test_tp_multi_wait() ends up creating a very large number of threads, and those seem to eat up the entire address space simply from their stacks.
I'd diagnose this as a failure to actually pool, but I can get Windows to not pool either if I let the threads take a little longer. So perhaps we should just cut down on the number of threads we're spawning.
https://bugs.winehq.org/show_bug.cgi?id=54064
--- Comment #7 from François Gouget fgouget@codeweavers.com --- I found it strange that the crash would happen after test_tp_multi_wait() completes. But TpReleasePool() is asynchronous: it signals the threads to exit and returns immediately. So there may still be a lot of threads when test_tp_io() runs.
I don't know how many threads are needed in order for test_tp_multi_wait() to be meaningful. But maybe another approach would be to minimize their stack size with something like this: https://gitlab.winehq.org/wine/wine/-/merge_requests/2133
https://bugs.winehq.org/show_bug.cgi?id=54064
--- Comment #8 from François Gouget fgouget@codeweavers.com --- ntdll:threadpool now tries to minimize its stack usage (9e5d54eac408) but that did not fix the issue. So this issue is again open for any takers.
https://bugs.winehq.org/show_bug.cgi?id=54064
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Fixed by SHA1| |9e5d54eac4085fa90049a6f5274 | |9ae78b86b05b6 Status|NEW |RESOLVED Resolution|--- |FIXED
--- Comment #9 from François Gouget fgouget@codeweavers.com --- I'm not sure why I said this was not fixed in my previous comment (#c8 2023-03-17): * The stack minimization patch was committed on 2023-02-27 (9e5d54eac408). * The last known crash was in MR!2286's GitLab CI run. That got merged on 2023-02-27 too but the GitLab CI run with the crash was on 2023-02-25 so it did not have the fix. * The last crash in the nightly WineTest runs is a bit older: 2023-02-17. * Since then there has been one crash in threadpoolwinrt:threadpool but it happened after I said this was not fixed so it could not be the source of my confusion.
So, as far as I can tell now, there has been no crash since the stack minimization patch was committed. So I think this is fixed after all.
https://bugs.winehq.org/show_bug.cgi?id=54064
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #10 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 8.15.