https://bugs.winehq.org/show_bug.cgi?id=50448
Bug ID: 50448 Summary: Recent changes to ntdll-NtAlertThreadByThreadId causes that threads in some applications might look up causing periodical lookups Product: Wine-staging Version: 6.0-rc5 Hardware: x86-64 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: -unknown Assignee: wine-bugs@winehq.org Reporter: bjorn.bidar@thaodan.de CC: leslie_alistair@hotmail.com, z.figura12@gmail.com Distribution: ---
After `b8ca0eae9f47491ba257c422a2bc03fc37d13c22` that changed ntdll-NtAlertThreadByThreadId the threads of some applications might lookup up causing periodical freezes for 1s-3s in some applications like Wotld of Warcraft or Deus Ex. The rest of my session still works and does not freeze when that happens.
In the case of Wow I especialy noticed that the input will be open again sooner than the graphics/sound.
The log will look like this: 02cc:err:sync:RtlpWaitForCriticalSection section <same address> "?" wait timed out in thread 02cc, blocked by 0000, retrying (60 sec)
I noticed that in the case of Wow the critical section that looks is almost every time the same.
Please see this wine-tkg bug report as a reference: https://github.com/Frogging-Family/wine-tkg-git/issues/254
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #1 from Zebediah Figura z.figura12@gmail.com --- Thanks for the report. Can you please attach a log with WINEDEBUG=+seh,+pid,+sync,+timestamp, provided that the problem can be reproduced with those channels?
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #2 from Bjoern Bidar bjorn.bidar@thaodan.de --- (In reply to Zebediah Figura from comment #1)
Thanks for the report. Can you please attach a log with WINEDEBUG=+seh,+pid,+sync,+timestamp, provided that the problem can be reproduced with those channels?
Can you limited that to something smaller? This would require at least some minutes of logging creating a very big log.
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #3 from Zebediah Figura z.figura12@gmail.com --- (In reply to Bjoern Bidar from comment #2)
(In reply to Zebediah Figura from comment #1)
Thanks for the report. Can you please attach a log with WINEDEBUG=+seh,+pid,+sync,+timestamp, provided that the problem can be reproduced with those channels?
Can you limited that to something smaller? This would require at least some minutes of logging creating a very big log.
Removing +seh might help, but I suspect +sync is the culprit, and I kind of need that to be able to track down a deadlock.
There's a *chance* I might be able to figure out the error with something like +seh,+pid,+module,+loaddll,+server,+thread,+timestamp, though.
https://bugs.winehq.org/show_bug.cgi?id=50448
Maciej Stanczew maciej.stanczew+b@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |maciej.stanczew+b@gmail.com
--- Comment #4 from Maciej Stanczew maciej.stanczew+b@gmail.com --- Created attachment 69081 --> https://bugs.winehq.org/attachment.cgi?id=69081 Starcraft lockups log
This also impacts Warcraft III and Starcraft. In both cases there are lockups when launching and closing the game, and with Starcraft there is additionally one just after the main menu is shown.
I'm attaching Starcraft log with '+seh,+pid,+sync,+module,+loaddll,+server,+thread,+timestamp'; W3 produced way too large log to fit into attachments.
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #5 from Zebediah Figura z.figura12@gmail.com --- Created attachment 69082 --> https://bugs.winehq.org/attachment.cgi?id=69082 wake all threads in RtlpUnWaitCriticalSection
Thanks for the log.
[The following is a detailed analysis. For those not interested: I've attached a patch which works around this bug, but I don't think it's the correct solution. Please test it regardless, to confirm whether my analysis is correct.]
This is worrying (trimmed):
1734.988:00f8:010c:trace:sync:RtlWaitOnAddress addr 000000007BC66558 cmp 000000007BC756B8 size 0x4 timeout fffffffffd050f80 1734.988:00f8:010c:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80 00fc: suspend_thread( handle=0090 ) 010c: *sent signal* signal=10 00fc: suspend_thread() = 0 { count=0 } 010c: select( flags=2, cookie=13ffcecec, timeout=0, size=0, prev_apc=0000, result={}, data={}, context={...} ) 010c: select() = PENDING { call={APC_NONE}, apc_handle=0000, context={} } ... 1736.645:00f8:00fc:trace:sync:RtlWaitOnAddress addr 000000007BC66558 cmp 000000007BC756B8 size 0x4 timeout fffffffffd050f80 1736.645:00f8:00fc:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80 ... 1736.646:00f8:0108:trace:sync:RtlWakeAddressSingle 000000007BC66558 1736.646:00f8:0108:trace:sync:NtAlertThreadByThreadId 0x10c ... 1741.629:00f8:00fc:err:sync:RtlpWaitForCriticalSection section 000000007BC66540 "../wine-staging/dlls/ntdll/loader.c: loader_section" wait timed out in thread 00fc, blocked by 0000, retrying (60 sec)
010c tries to grab the loader lock and is suspended while sleeping on it. It never wakes up, for one reason or another. Meanwhile, 00fc tries to grab the loader lock while 0108 has it. 0108 eventually releases it, but we only release a single thread (after all, only one thread can hold a CS at a time) and the first available thread is 010c, which is suspended. 00fc doesn't get woken up. It doesn't deadlock forever, because we break out of the wait every 5 seconds to print a message and find that, oh hey! the value has changed since the last sleep, and nobody's holding the lock anymore. But it deadlocks for 5 seconds, which does a lot to explain the "periodic freezes" mentioned.
This isn't a problem with the current upstream implementation, because a suspended thread won't actually be sleeping on a futex (it'll have had its context changed, etc), and so the kernel will only ever wake up a running thread.
I wrote a minimal test case to reproduce this. On Windows 7 this pattern works and the non-suspended thread always gets woken up. On Windows 10 it doesn't work, and the non-suspended thread doesn't always get woken up (it likely depends on the order in which the threads were queued—if I queue two threads and then suspend the first, the second doesn't wake up; if I queue two threads and then suspend the second, the first wakes up.) My guess is that Windows 10 is using RtlWakeAddressSingle() internally [it hangs similarly if I use win32 futexes instead] whereas win7 is using something else, probably a semaphore [RtlWakeAddressSingle() not having been introduced yet].
The problem is, this makes it quite unclear how Windows does in fact avoid this race. The application is effectively doing this from inside its DLL_PROCESS_ATTACH:
thread = CreateThread(...); WaitForSingleObject(thread, 5); while (SuspendThread(thread) >= 0);
It must be depending on the thread never grabbing the loader lock, or releasing the loader lock by the time the wait returns, both of which seem more than a little suspicious.
I've attached a patch that ensures all eligible threads get woken up when releasing a critical section. It should fix this bug, but also cause a thundering herd problem. I don't think it's the right answer, but it'd be good to confirm that this analysis is correct (and hopefully the only bug). Please test it.
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #6 from Zebediah Figura z.figura12@gmail.com --- Actually, I just realized an even easier race. It's not quite documented, but two simultaneous calls to RtlWakeAddressSingle() apparently should always wake at least two threads, and my patches violated that. I've pushed a fix as https://github.com/wine-staging/wine-staging/commit/dfddef9654d4a24642c61c287a93f22eb7a51e12.
I think that this indirectly ends up fixing the race described in comment 5 as well. At least one other thread will end up waking 010c before 0108 does (while 010c is the only thread actually waiting on the lock) and so 0108 wakes 00fc instead.
Please test with current wine-staging (and without the patch from comment 5); I think that should end up fixing this bug completely.
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #7 from Bjoern Bidar bjorn.bidar@thaodan.de --- (In reply to Zebediah Figura from comment #6)
Actually, I just realized an even easier race. It's not quite documented, but two simultaneous calls to RtlWakeAddressSingle() apparently should always wake at least two threads, and my patches violated that. I've pushed a fix as https://github.com/wine-staging/wine-staging/commit/ dfddef9654d4a24642c61c287a93f22eb7a51e12.
I think that this indirectly ends up fixing the race described in comment 5 as well. At least one other thread will end up waking 010c before 0108 does (while 010c is the only thread actually waiting on the lock) and so 0108 wakes 00fc instead.
Please test with current wine-staging (and without the patch from comment 5); I think that should end up fixing this bug completely.
It is better now but not fixed. Before I had a lag when the first 3D scene was rendered (login-screen) that is now gone. But after some time I still get the freeze.
02cc:err:sync:RtlpWaitForCriticalSection section 000000002501BAE0 "?" wait timed out in thread 02cc, blocked by 0000, retrying (60 sec)
When I logged with the suggested options I did not got a freeze.
Here is the log: https://thaodan.de/public/archlinux/wow_wine_6.0_rc5.log.xz (WINEDEBUG=+seh,+pid,+sync,+timestamp,)
https://bugs.winehq.org/show_bug.cgi?id=50448
Paul Gofman pgofman@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |pgofman@codeweavers.com
--- Comment #8 from Paul Gofman pgofman@codeweavers.com --- (In reply to Zebediah Figura from comment #5)
It must be depending on the thread never grabbing the loader lock, or releasing the loader lock by the time the wait returns, both of which seem more than a little suspicious.
Windows (modern) loader implementation is known to take loader lock in much less number of cases than Wine currently does (see my patches [1] with an attempt to move that way which I made in regards to the other random lock up I came through without relation to the patchset concerned here). Do you know which functions lead to taking loader lock here? E. g., GetProcAddress() and GetModuleHandle() take the loader lock now in Wine but not on Windows. However, there are much more functions with excessive locking than are currently concerned by the linked patchset.
1. https://www.winehq.org/pipermail/wine-devel/2020-November/177695.html
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #9 from Zebediah Figura z.figura12@gmail.com --- In comment 5, 010c blocks trying to acquire the lock in LdrInitializeThunk(), while 00fc has it since it's in DLL_PROCESS_ATTACH. Later, I think 0108 has it due to LdrInitializeThunk(), while 00fc wants use LdrLoadDll(), but I'm not fully sure.
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #10 from Zebediah Figura z.figura12@gmail.com --- (In reply to Bjoern Bidar from comment #7)
It is better now but not fixed. Before I had a lag when the first 3D scene was rendered (login-screen) that is now gone. But after some time I still get the freeze.
02cc:err:sync:RtlpWaitForCriticalSection section 000000002501BAE0 "?" wait timed out in thread 02cc, blocked by 0000, retrying (60 sec)
When I logged with the suggested options I did not got a freeze.
Here is the log: https://thaodan.de/public/archlinux/wow_wine_6.0_rc5.log.xz (WINEDEBUG=+seh,+pid,+sync,+timestamp,)
Does it help if you apply the patch from comment 5?
https://bugs.winehq.org/show_bug.cgi?id=50448
ibrokemypie@outlook.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |ibrokemypie@outlook.com
--- Comment #11 from ibrokemypie@outlook.com --- I tried the patch on 6.0rc5 compiled just now, still getting the occasional multi second freezes in Guild Wars 2
https://bugs.winehq.org/show_bug.cgi?id=50448
soredake gi85qht0z@relay.firefox.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |gi85qht0z@relay.firefox.com
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #12 from Zebediah Figura z.figura12@gmail.com --- (In reply to Rylee Randall from comment #11)
I tried the patch on 6.0rc5 compiled just now, still getting the occasional multi second freezes in Guild Wars 2
Do you mean the patch from comment 5? Is that with https://github.com/wine-staging/wine-staging/commit/dfddef9654d4a24642c61c287a93f22eb7a51e12 also?
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #13 from Rylee Randall ibrokemypie@outlook.com --- (In reply to Zebediah Figura from comment #12)
(In reply to Rylee Randall from comment #11)
I tried the patch on 6.0rc5 compiled just now, still getting the occasional multi second freezes in Guild Wars 2
Do you mean the patch from comment 5? Is that with https://github.com/wine-staging/wine-staging/commit/ dfddef9654d4a24642c61c287a93f22eb7a51e12 also?
Yes the patch from comment 5, yes with the latest wine staging commits as of when I wrote the comment
https://bugs.winehq.org/show_bug.cgi?id=50448
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|Recent changes to |ntdll-NtAlertThreadByThread |ntdll-NtAlertThreadByThread |Id causes intermittent |Id causes that threads in |temporary hangs in multiple |some applications might |applications |look up causing periodical | |lookups |
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #14 from Zebediah Figura z.figura12@gmail.com --- I can reproduce frequent intermittent hangs locally in the game Rift (or the Glyph launcher; I don't know). I don't see an effect in the game, but I see frequent err:sync messages with various critical sections, including at least one Wine-internal CS (the wined3d CS) implying that some thread is missing a wakeup when the CS is released. I managed, *once*, to reproduce this with +sync, and I got this (trimmed):
188567.658:014c:01a0:trace:sync:RtlWaitOnAddress addr 00007FF30C2074D8 cmp 000000007BC72298 size 0x4 timeout fffffffffd050f80 188567.658:014c:01a0:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80 188567.658:014c:01bc:trace:sync:RtlWakeAddressSingle 00007FF30C2074D8 188567.658:014c:01bc:trace:sync:NtAlertThreadByThreadId 0x1a0 188567.658:014c:01bc:trace:sync:RtlWakeAddressSingle 00007FF30C2074D8
188567.667:014c:01c0:trace:sync:RtlWaitOnAddress addr 00007FF30C2074D8 cmp 000000007BC72298 size 0x4 timeout fffffffffd050f80
188572.660:014c:01c0:trace:sync:RtlWaitOnAddress addr 00007FF30C2074D8 cmp 000000007BC72298 size 0x4 timeout ffffffffdc3cba00
The first few lines (at .658) are within expectations; nothing to see there. But the last two lines are very weird, because *there is no call to RtlWakeAddressSingle*. I don't understand how this can even happen; it would imply that any thread releasing the lock between those two points saw a LockCount of 0, which shouldn't be possible if 01c0 is waiting on it.
https://bugs.winehq.org/show_bug.cgi?id=50448
braiamp@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |braiamp@gmail.com
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #15 from Zebediah Figura z.figura12@gmail.com --- Created attachment 69112 --> https://bugs.winehq.org/attachment.cgi?id=69112 add explicit memory barriers
This patch seems to fix the deadlocks in RIFT for me. Can you please test it on top of current wine-staging?
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #16 from Rylee Randall ibrokemypie@outlook.com --- (In reply to Bjoern Bidar from comment #2)
(In reply to Zebediah Figura from comment #1)
Thanks for the report. Can you please attach a log with WINEDEBUG=+seh,+pid,+sync,+timestamp, provided that the problem can be reproduced with those channels?
Can you limited that to something smaller? This would require at least some minutes of logging creating a very big log.
This patch solves the periodic freezing in guildwars for me, played about half an hour with no problems
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #17 from Rylee Randall ibrokemypie@outlook.com --- (In reply to Rylee Randall from comment #16)
(In reply to Bjoern Bidar from comment #2)
(In reply to Zebediah Figura from comment #1)
Thanks for the report. Can you please attach a log with WINEDEBUG=+seh,+pid,+sync,+timestamp, provided that the problem can be reproduced with those channels?
Can you limited that to something smaller? This would require at least some minutes of logging creating a very big log.
This patch solves the periodic freezing in guildwars for me, played about half an hour with no problems
Sorry my reply went a bit wonky, I meant the patch from comment 15
https://bugs.winehq.org/show_bug.cgi?id=50448
Taufan Rezzafri rezzafri@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |rezzafri@gmail.com
--- Comment #18 from Taufan Rezzafri rezzafri@gmail.com --- After "add explicit memory barriers patch" no more periodic freeze and not a single crash when playing Fenyx Rising for about 3.5 hours.
https://bugs.winehq.org/show_bug.cgi?id=50448
aersaud@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |aersaud@gmail.com
--- Comment #19 from aersaud@gmail.com --- I can confirm this fixes the freezes and lock ups for me as well in multiple games.
https://bugs.winehq.org/show_bug.cgi?id=50448
Mel pmargeti34@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |pmargeti34@gmail.com
--- Comment #20 from Mel pmargeti34@gmail.com --- Another confirmation, tested with Warframe & Guild wars 2.
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #21 from braiamp@gmail.com --- Using dxvk was causing stalls every 5 seconds on Subnautica. Applying comment #5 and #15 patches over staging, stopped the stalling.
https://bugs.winehq.org/show_bug.cgi?id=50448
Kevin kwh@fastmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |kwh@fastmail.com
https://bugs.winehq.org/show_bug.cgi?id=50448
Jesse losetowin@protonmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |losetowin@protonmail.com
https://bugs.winehq.org/show_bug.cgi?id=50448
Lam winehq@lam.pl changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |winehq@lam.pl
--- Comment #22 from Lam winehq@lam.pl --- wine-staging 6.0, Fedora 33 (wine-6.0-1.fc33.x86_64)
Quake Champions had horrible freezes every few seconds (even the menu was practically unusable, like 1-2 seconds freeze after 3-5 seconds of working).
This actually started about 6.0-rc4, but as usual, I was waiting for it to fix itself ;)
After patching with the patch from comment #15, the freezes are gone. Unlike the previous commenter, I've ignored comment #5 - sounded like that was a quick and dirty brute force attempt and the new patch obsoletes that, am I right? Anyways - 1 minute of testing confirms it helps, thanks Zebediah and make sure to commit for 6.1 ;)
https://bugs.winehq.org/show_bug.cgi?id=50448
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |RESOLVED Fixed by SHA1| |f8ce6cbb21e09c80320514770c1 | |02a1a8e9cfde6 Resolution|--- |FIXED
--- Comment #23 from Zebediah Figura z.figura12@gmail.com --- Fix pushed as f8ce6cbb21e09c80320514770c102a1a8e9cfde6; sorry for the delay.
https://bugs.winehq.org/show_bug.cgi?id=50448
Robert Walker bob.mt.wya@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |bob.mt.wya@gmail.com
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #24 from Bjoern Bidar bjorn.bidar@thaodan.de --- (In reply to Zebediah Figura from comment #23)
Fix pushed as f8ce6cbb21e09c80320514770c102a1a8e9cfde6; sorry for the delay.
So the only thing missing now is to fix the performance regression from this?
https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #25 from Zebediah Figura z.figura12@gmail.com --- (In reply to Bjoern Bidar from comment #24)
(In reply to Zebediah Figura from comment #23)
Fix pushed as f8ce6cbb21e09c80320514770c102a1a8e9cfde6; sorry for the delay.
So the only thing missing now is to fix the performance regression from this?
Broadly speaking, yes. It's not unlikely that b8ca0eae9 already helped, though, and a similar patch on the PE side could also help.
https://bugs.winehq.org/show_bug.cgi?id=50448
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |llenort@aol.com
--- Comment #26 from Zebediah Figura z.figura12@gmail.com --- *** Bug 50566 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=50448
Alexis Peypelut iroalexis@outlook.fr changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |iroalexis@outlook.fr
https://bugs.winehq.org/show_bug.cgi?id=50448
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #27 from Zebediah Figura z.figura12@gmail.com --- Closing bugs fixed in wine-staging 6.1.
https://bugs.winehq.org/show_bug.cgi?id=50448
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |shagooserver@gmail.com
--- Comment #28 from Zeb Figura z.figura12@gmail.com --- *** Bug 50455 has been marked as a duplicate of this bug. ***