https://bugs.winehq.org/show_bug.cgi?id=53963
Bug ID: 53963
Summary: d3d8:device & d3d9:device sometimes break threading and WineTest
Product: Wine
Version: unspecified
Hardware: x86-64
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: d3d
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
d3d8:device & d3d9:device sometimes break threading, causing all the tests that follow to get stuck. The typical symptom is a series of tests returning 258:
d3d9:d3d9ex:080c done (0) in 43s 9467B
d3d9:device start dlls/d3d9/tests/device.c
d3d9:device:0278 done (258) in 120s 0B
d3d9:stateblock start dlls/d3d9/tests/stateblock.c
d3d9:stateblock:0288 done (258) in 120s 0B
d3d9:visual start dlls/d3d9/tests/visual.c
d3d9:visual:0290 done (258) in 120s 0B
See https://test.winehq.org/data/patterns.html
When this happens all the tests that follow get stuck which makes it impossible to run the full WineTest suite, hence the lack of WineTest results on some machines in the patterns page above.
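The 258 status equals the Win32 WAIT_TIMEOUT value, which fits the 120s durations shown in the report. As a rough sketch for spotting where a report starts going wrong, the following scans "done" lines and returns the first run of consecutive tests that ended with 258. The function name, the regular expression and the `min_run` threshold are illustrative assumptions and not part of WineTest; the line format is inferred only from the excerpt above.

```python
import re

# Matches WineTest "done" lines such as:
#   d3d9:device:0278 done (258) in 120s 0B
# The format is inferred from the excerpt in this report, not from a spec.
DONE_RE = re.compile(
    r"^(?P<test>\S+?):(?P<pid>[0-9a-f]{4}) done \((?P<status>-?\d+)\) in (?P<secs>\d+)s"
)

WAIT_TIMEOUT = 258  # Win32 WAIT_TIMEOUT


def first_timeout_run(lines, min_run=2):
    """Return the test names in the first run of at least min_run
    consecutive tests that finished with status 258, or [] if none."""
    run = []
    for line in lines:
        m = DONE_RE.match(line.strip())
        if not m:
            continue  # "start" lines and test output are ignored
        if int(m.group("status")) == WAIT_TIMEOUT:
            run.append(m.group("test"))
        else:
            if len(run) >= min_run:
                return run
            run = []
    return run if len(run) >= min_run else []
```

The first name in the returned list is the test worth looking at; given the notes in this report, the actual culprit may be the test that ran just before it.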
Notes:
* This can also happen in d3d8:device. Consequently the first test that gets stuck is either d3d8:stateblock or d3d9:stateblock.
* winedbg also gets stuck as soon as it displays the prompt :-(
* This only impacts dual-screen configurations. This is why this breaks the debian11 and debiant (dual-screen) VMs but not debian11b and fgtbdebian11 (single-screen).
Note that in the run above there is no d3d9:device trace, which means it did not run correctly, but there is no other indication of what went wrong. Fortunately on one run d3d9:device produced a proper crash dump implicating RtlCreateUserThread(), which would make sense given the symptoms:
d3d9:device start dlls/d3d9/tests/device.c
device.c:780: Test marked todo: Test 0: Got unexpected FVF 0, expected 0x2.
device.c:780: Test marked todo: Test 1: Got unexpected FVF 0, expected 0x4.
device.c:1119: Tests skipped: Multisampling not supported for D3DFMT_X8R8G8B8, skipping test.
device.c:4821: Test failed: Received unexpected WM_SIZE message.
device.c:2852: Test failed: Failed to create a D3D object.
Unhandled exception: page fault on read access to 0x00000000 in 32-bit code (0x00434da7).
Register dump:
 CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
 EIP:00434da7 ESP:0071f7b0 EBP:0071f868 EFLAGS:00010246( R- -- I Z- -P- )
 EAX:00000000 EBX:00000000 ECX:0071f750 EDX:00000000
 ESI:005a0068 EDI:0071f824
Stack dump:
0x0071f7b0: 00000000 004de71c 004df984 00cf0000
0x0071f7c0: 00000064 00000064 000000a0 000000a0
0x0071f7d0: 00000000 00000000 00000000 00000000
0x0071f7e0: 00000000 00000002 00000012 005a0068
0x0071f7f0: 688e32f0 00001897 00000000 00000000
0x0071f800: 00000000 004df9fc 00000000 00000007
Backtrace:
=>0 0x00434da7 in d3d9_test (+0x34da7) (0x0071f868)
  1 0x00459367 in d3d9_test (+0x59367) (0x0071fde8)
  2 0x004d8151 in d3d9_test (+0xd8151) (0x0071fee8)
  3 0x004d7b3f in d3d9_test (+0xd7b3f) (0x0071ff30)
  4 0x7b62a1f0 in kernel32 (+0x2a1f0) (0x0071ff48)
  5 0x7bc5d727 in ntdll (+0x5d727) (0x0071ff5c)
  6 0x7bc5df50 RtlCreateUserThread(entry=004D7AC0, arg=7FFD1000) [Z:\home\winetest\tools\testbot\var\wine\dlls\ntdll\thread.c:306] in ntdll (0x0071ffec)
0x00434da7 d3d9_test+0x34da7: movl 0x0(%ebx),%eax
And on stderr:
0658:err:d3d:wined3d_adapter_create_output Failed to initialise output L"\\.\DISPLAY1", hr 0x80070057.
wine: Unhandled page fault on read access to 00000000 at address 00434DA7 (thread 0658), starting debugger...
The million dollar question: how to reproduce the failure?
* First you need a dual-screen test configuration.
* The crash is not systematic. It can be necessary to run the test 4 - 8 maybe 10 times to get a crash. It's hard to tell.
* After some runs d3d9:device gets an underflow in GenerateRampFromGamma() and crashes from that. I don't have a log of that crash but I remember that the gamma was around 0.00003 (plus or minus a few zeroes). So it is likely that the test does not restore the gamma correctly, resulting in it running with this nonsensically low gamma value after a while. Regardless, this seems to prevent d3d*:device from crashing in the way we want which means after a few runs it may be necessary to restart the X server :-(
* Also sometimes, again after running the tests a number of times, one of the screens 'disappears' (virt-viewer says something like 'connecting to display' or it will show a single abnormally wide desktop) and then the issue cannot be reproduced anymore. This also requires restarting the X server :-(
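Since the crash only shows up after several runs, a small retry loop can save some typing. This is only a sketch: `run_until_failure` is a hypothetical helper, the command that runs d3d9:device must be supplied by the caller for their own setup, and the 120s default merely mirrors the timeout seen in the report above. It does not handle the gamma or disappearing-screen problems, which still require restarting the X server.

```python
import subprocess


def run_until_failure(cmd, max_runs=10, timeout=120):
    """Run cmd repeatedly and stop at the first bad run.

    Returns (run_number, outcome) where outcome is "timeout" if the
    process exceeded `timeout` seconds, otherwise its non-zero exit
    code. Returns None if all max_runs runs exited with 0.
    """
    for attempt in range(1, max_runs + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this
            return attempt, "timeout"
        if result.returncode != 0:
            return attempt, result.returncode
    return None
```

Call it with the argument list that launches the test in your environment. A result such as `(6, "timeout")` would mean the sixth run got stuck.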
This failure started on 2022-11-17 but because it is such a pain to reproduce and even more to bisect I was not able to pinpoint the commit causing it. However Rémi points out that it first appeared in the following merge request:
https://gitlab.winehq.org/wine/wine/-/merge_requests/1399
And when that got merged into Wine all hell broke loose: all debian11 test runs started timing out. It's not clear how it's related to threading though. Also it only touches programs/explorer so it's strange that it would cause threading issues in d3d9:device. So it's possible it only triggers a preexisting bug somehow.
François Gouget fgouget@codeweavers.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Severity|normal  |critical
Keywords|        |source, testcase
--- Comment #1 from François Gouget fgouget@codeweavers.com ---
Another symptom of this is when "winetest.exe d3d8 d3d9" gets stuck on exit. It's not systematic though.
Austin English austinenglish@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |austinenglish@gmail.com
--- Comment #2 from François Gouget fgouget@codeweavers.com ---
The RtlCreateUserThread() thing is a red herring: it's the function that initializes a new thread so it appears in the backtrace of every Wine thread.
Zhiyi Zhang zzhang@codeweavers.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |zzhang@codeweavers.com
--- Comment #3 from Zhiyi Zhang zzhang@codeweavers.com ---
Hopefully, aeb43dce75f45e44ebf08d73ddf9a66f043e4fbe will fix this.
--- Comment #4 from François Gouget fgouget@codeweavers.com ---
That commit did fix the immediate issue and make it possible to run the full test suite in a pure 32-bit Wine environment again.
That said I don't understand how having d3d[89]:device timing out could cause all the other processes to get stuck, including things like winedbg --command "bt all"!
But then, paradoxically since this consistently broke the 32-bit tests on debian11, this issue is such a pain to reproduce that maybe it's not worth pursuing (maybe bug 54014 would be a simpler approach, for the same issue?)
Close this bug?
Zhiyi Zhang zzhang@codeweavers.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |rbernon@codeweavers.com
--- Comment #5 from Zhiyi Zhang zzhang@codeweavers.com ---
I am in favor of closing this. Rémi should know more.