[Bug 50448] Recent changes to ntdll-NtAlertThreadByThreadId causes that threads in some applications might look up causing periodical lookups

5 Jan 2021


      https://bugs.winehq.org/show_bug.cgi?id=50448
--- Comment #5 from Zebediah Figura z.figura12@gmail.com ---
Created attachment 69082
  --> https://bugs.winehq.org/attachment.cgi?id=69082
wake all threads in RtlpUnWaitCriticalSection
Thanks for the log.
[The following is a detailed analysis. For those not interested: I've attached
a patch which works around this bug, but I don't think it's the correct
solution. Please test it regardless, to confirm whether my analysis is
correct.]
This is worrying (trimmed):
1734.988:00f8:010c:trace:sync:RtlWaitOnAddress addr 000000007BC66558 cmp
000000007BC756B8 size 0x4 timeout fffffffffd050f80
1734.988:00f8:010c:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80
00fc: suspend_thread( handle=0090 )
010c: *sent signal* signal=10
00fc: suspend_thread() = 0 { count=0 }
010c: select( flags=2, cookie=13ffcecec, timeout=0, size=0, prev_apc=0000,
result={}, data={}, context={...} )
010c: select() = PENDING { call={APC_NONE}, apc_handle=0000, context={} }
...
1736.645:00f8:00fc:trace:sync:RtlWaitOnAddress addr 000000007BC66558 cmp
000000007BC756B8 size 0x4 timeout fffffffffd050f80
1736.645:00f8:00fc:trace:sync:NtWaitForAlertByThreadId (nil) fffffffffd050f80
...
1736.646:00f8:0108:trace:sync:RtlWakeAddressSingle 000000007BC66558
1736.646:00f8:0108:trace:sync:NtAlertThreadByThreadId 0x10c
...
1741.629:00f8:00fc:err:sync:RtlpWaitForCriticalSection section 000000007BC66540
"../wine-staging/dlls/ntdll/loader.c: loader_section" wait timed out in thread
00fc, blocked by 0000, retrying (60 sec)
010c tries to grab the loader lock and is suspended while sleeping on it. It
never wakes up, for one reason or another. Meanwhile, 00fc tries to grab the
loader lock while 0108 has it. 0108 eventually releases it, but we only release
a single thread (after all, only one thread can hold a CS at a time) and the
first available thread is 010c, which is suspended. 00fc doesn't get woken up.
It doesn't deadlock forever, because we break out of the wait every 5 seconds
to print a message and find that, oh hey! the value has changed since the last
sleep, and nobody's holding the lock anymore. But it deadlocks for 5 seconds,
which does a lot to explain the "periodic freezes" mentioned.
This isn't a problem with the current upstream implementation, because a
suspended thread won't actually be sleeping on a futex (it'll have had its
context changed, etc), and so the kernel will only ever wake up a running
thread.
I wrote a minimal test case to reproduce this. On Windows 7 this pattern works
and the non-suspended thread always gets woken up. On Windows 10 it doesn't
work, and the non-suspended thread doesn't always get woken up (it likely
depends on the order in which the threads were queued—if I queue two threads
and then suspend the first, the second doesn't wake up; if I queue two threads
and then suspend the second, the first wakes up.) My guess is that Windows 10
is using RtlWakeAddressSingle() internally [it hangs similarly if I use win32
futexes instead] whereas win7 is using something else, probably a semaphore
[RtlWakeAddressSingle() not having been introduced yet].
The problem is, this makes it quite unclear how Windows does in fact avoid this
race. The application is effectively doing this from inside its
DLL_PROCESS_ATTACH:
thread = CreateThread(...);
WaitForSingleObject(thread, 5);
while (SuspendThread(thread) >= 0);
It must be depending on the thread never grabbing the loader lock, or releasing
the loader lock by the time the wait returns, both of which seem more than a
little suspicious.
I've attached a patch that ensures all eligible threads get woken up when
releasing a critical section. It should fix this bug, but also cause a
thundering herd problem. I don't think it's the right answer, but it'd be good
to confirm that this analysis is correct (and hopefully the only bug). Please
test it.
-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

[Bug 50448] Recent changes to ntdll-NtAlertThreadByThreadId causes that threads in some applications might look up causing periodical lookups