https://bugs.winehq.org/show_bug.cgi?id=54979
--- Comment #3 from Zeb Figura z.figura12@gmail.com ---
(In reply to Zeb Figura from comment #1)
> However, we can probably avoid RtlWaitOnAddress for critical sections.
This is harder than I thought. I thought it would be relatively easy to implement a lock-free wait queue on the stacks of the waiting threads, but after thinking it through I don't think it's actually possible at all.
In theory we could try supplying hints as to which thread to wake (say, with a bounded ring buffer). The problem with this is that we only have a pointer's worth of space to work with, and we can't allocate more because we underlie the heap implementation.
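To give a rough idea of how tight that space is, here is a minimal sketch of a hint buffer packed into a single word. It is purely illustrative and not anything Wine does: the layout (four 16-bit slots in a 64-bit word, assuming 64-bit pointers), the names hint_push and hint_pop, and the convention that 0 means "empty" are all assumptions made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: four 16-bit hint slots in one 64-bit word.
 * Slot 0 holds the oldest hint; a slot value of 0 means "empty",
 * so hints themselves must be nonzero. */
#define HINT_BITS  16
#define HINT_SLOTS 4
#define HINT_MASK  0xffffu

/* Store a hint in the lowest free slot. Returns false if the hint is 0
 * or all four slots are already taken. */
static bool hint_push(_Atomic uint64_t *word, uint16_t hint)
{
    uint64_t old = atomic_load(word), desired;

    if (!hint) return false;
    do
    {
        int slot = 0;
        while (slot < HINT_SLOTS && ((old >> (slot * HINT_BITS)) & HINT_MASK))
            slot++;
        if (slot == HINT_SLOTS) return false;
        desired = old | ((uint64_t)hint << (slot * HINT_BITS));
    } while (!atomic_compare_exchange_weak(word, &old, desired));
    return true;
}

/* Remove and return the oldest hint, shifting the others down.
 * Returns 0 if the buffer is empty. */
static uint16_t hint_pop(_Atomic uint64_t *word)
{
    uint64_t old = atomic_load(word);

    do
    {
        if (!(old & HINT_MASK)) return 0;
    } while (!atomic_compare_exchange_weak(word, &old, old >> HINT_BITS));
    return (uint16_t)(old & HINT_MASK);
}
```

Even in this toy form the word is completely used up by four truncated hints, with no bits left over for the lock state itself.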
(In reply to John Nagle from comment #2)
> There are three nested locks here, two futexes and a hard spinlock. The one that looks most suspicious is the middle one at wait_semaphore. That loop repeatedly calls RtlWaitOnAddress, which is probably why control is usually in the innermost spinlock on a debugger break. If wait_semaphore didn't do that, the problem might go away.
I don't think this is suspicious. We use RtlWakeAddressSingle(), and win32 futexes are fair, so in theory a given waiting thread should only actually be woken once. At worst I think it can have one stale buffered wakeup from a previous alert.
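For reference, the general shape of a wait loop like this can be sketched as follows. This is not the Wine code: wait_on_address and wake_address_single are hypothetical stand-ins built on a pthread condition variable so that the example runs outside of Windows, and unlike the real primitives they make no fairness guarantee. The other names (semaphore_wait, semaphore_release, run_demo) are likewise invented for the example. The point is only that the waiter re-checks the count after every return from the wait, so a stale wakeup costs one extra pass through the loop and nothing more.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-ins for RtlWaitOnAddress / RtlWakeAddressSingle. Assumption:
 * one global mutex and condition variable shared by every address. */
static pthread_mutex_t futex_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t futex_cond = PTHREAD_COND_INITIALIZER;

/* Sleep if *addr still equals expected. May return spuriously,
 * so callers must re-check their condition. */
static void wait_on_address(atomic_int *addr, int expected)
{
    pthread_mutex_lock(&futex_lock);
    if (atomic_load(addr) == expected)
        pthread_cond_wait(&futex_cond, &futex_lock);
    pthread_mutex_unlock(&futex_lock);
}

/* Wake at least one sleeping waiter, if there is one. */
static void wake_address_single(atomic_int *addr)
{
    (void)addr;
    pthread_mutex_lock(&futex_lock);
    pthread_cond_signal(&futex_cond);
    pthread_mutex_unlock(&futex_lock);
}

/* Take one unit from the semaphore, sleeping while the count is zero. */
static void semaphore_wait(atomic_int *count)
{
    for (;;)
    {
        int c = atomic_load(count);
        while (c > 0)
        {
            /* On failure c is reloaded and the condition re-checked. */
            if (atomic_compare_exchange_weak(count, &c, c - 1))
                return;
        }
        /* Count was zero: sleep, then loop and check again. */
        wait_on_address(count, 0);
    }
}

/* Add one unit and wake a single waiter. */
static void semaphore_release(atomic_int *count)
{
    atomic_fetch_add(count, 1);
    wake_address_single(count);
}

#define WAITERS 4
static atomic_int sem_count;
static atomic_int acquired;

static void *waiter(void *arg)
{
    (void)arg;
    semaphore_wait(&sem_count);
    atomic_fetch_add(&acquired, 1);
    return NULL;
}

/* Start WAITERS threads, release the semaphore once per thread, and
 * return how many threads got through. */
static int run_demo(void)
{
    pthread_t threads[WAITERS];
    int i;

    atomic_store(&sem_count, 0);
    atomic_store(&acquired, 0);
    for (i = 0; i < WAITERS; i++)
        pthread_create(&threads[i], NULL, waiter, NULL);
    for (i = 0; i < WAITERS; i++)
        semaphore_release(&sem_count);
    for (i = 0; i < WAITERS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&acquired);
}
```

With one release per waiter, every thread eventually gets through regardless of the order in which the wakeups land.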