https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #5 from Zebediah Figura z.figura12@gmail.com --- Created attachment 69082 --> https://bugs.winehq.org/attachment.cgi?id=69082 wake all threads in RtlpUnWaitCriticalSection
Thanks for the log.
[The following is a detailed analysis. For those not interested: I've attached a patch which works around this bug, but I don't think it's the correct solution. Please test it regardless, to confirm whether my analysis is correct.]
This is worrying (trimmed):
1734.988:00f8:010c:trace:sync:RtlWaitOnAddress addr 000000007BC66558 cmp 000000007BC756B8 size 0x4 timeout fffffffffd050f80 1734.988:00f8:010c:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80 00fc: suspend_thread( handle=0090 ) 010c: *sent signal* signal=10 00fc: suspend_thread() = 0 { count=0 } 010c: select( flags=2, cookie=13ffcecec, timeout=0, size=0, prev_apc=0000, result={}, data={}, context={...} ) 010c: select() = PENDING { call={APC_NONE}, apc_handle=0000, context={} } ... 1736.645:00f8:00fc:trace:sync:RtlWaitOnAddress addr 000000007BC66558 cmp 000000007BC756B8 size 0x4 timeout fffffffffd050f80 1736.645:00f8:00fc:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80 ... 1736.646:00f8:0108:trace:sync:RtlWakeAddressSingle 000000007BC66558 1736.646:00f8:0108:trace:sync:NtAlertThreadByThreadId 0x10c ... 1741.629:00f8:00fc:err:sync:RtlpWaitForCriticalSection section 000000007BC66540 "../wine-staging/dlls/ntdll/loader.c: loader_section" wait timed out in thread 00fc, blocked by 0000, retrying (60 sec)
010c tries to grab the loader lock and is suspended while sleeping on it. It never wakes up, for one reason or another. Meanwhile, 00fc tries to grab the loader lock while 0108 has it. 0108 eventually releases it, but we only release a single thread (after all, only one thread can hold a CS at a time) and the first available thread is 010c, which is suspended. 00fc doesn't get woken up. It doesn't deadlock forever, because we break out of the wait every 5 seconds to print a message and find that, oh hey! the value has changed since the last sleep, and nobody's holding the lock anymore. But it deadlocks for 5 seconds, which does a lot to explain the "periodic freezes" mentioned.
This isn't a problem with the current upstream implementation, because a suspended thread won't actually be sleeping on a futex (it'll have had its context changed, etc), and so the kernel will only ever wake up a running thread.
I wrote a minimal test case to reproduce this. On Windows 7 this pattern works and the non-suspended thread always gets woken up. On Windows 10 it doesn't work, and the non-suspended thread doesn't always get woken up (it likely depends on the order in which the threads were queued—if I queue two threads and then suspend the first, the second doesn't wake up; if I queue two threads and then suspend the second, the first wakes up.) My guess is that Windows 10 is using RtlWakeAddressSingle() internally [it hangs similarly if I use win32 futexes instead] whereas win7 is using something else, probably a semaphore [RtlWakeAddressSingle() not having been introduced yet].
The problem is, this makes it quite unclear how Windows does in fact avoid this race. The application is effectively doing this from inside its DLL_PROCESS_ATTACH:
thread = CreateThread(...); WaitForSingleObject(thread, 5); while (SuspendThread(thread) >= 0);
It must be depending on the thread never grabbing the loader lock, or releasing the loader lock by the time the wait returns, both of which seem more than a little suspicious.
I've attached a patch that ensures all eligible threads get woken up when releasing a critical section. It should fix this bug, but also cause a thundering herd problem. I don't think it's the right answer, but it'd be good to confirm that this analysis is correct (and hopefully the only bug). Please test it.