https://bugs.winehq.org/show_bug.cgi?id=54979
--- Comment #2 from John Nagle nagle@animats.com ---
I don't see an easy way to eliminate that innermost spinlock, either.
It's not a long lock. The slowest thing inside is searching a linear list of waiting threads, which ought to take maybe a microsecond. So it may be that the outer loops are asking for that lock too often.
There are three nested locks here: two futexes and a hard spinlock. The one that looks most suspicious is the middle one, in wait_semaphore. That loop repeatedly calls RtlWaitOnAddress, which is probably why a debugger break usually finds control in the innermost spinlock. If wait_semaphore didn't do that, the problem might go away.
The outermost lock has

    if (crit->LockCount > 0) break; /* more than one waiter, don't bother spinning */

so it's protected against this kind of problem. If that lock starts to build up a backlog, the thread blocks rather than spinning. But wait_semaphore doesn't have similar protection.
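As a rough sketch of what that kind of protection looks like in isolation: the spin loop gives up as soon as it sees a backlog, and the caller then goes to a blocking wait. Everything here (demo_lock, try_spin_acquire, the held and waiters fields, the spin limit) is a hypothetical stand-in for illustration. It is not Wine's code and it doesn't use Wine's LockCount encoding.

```c
#include <stdatomic.h>

/* Hypothetical lock state, for illustration only (not RTL_CRITICAL_SECTION). */
struct demo_lock {
    atomic_int held;     /* 0 = free, 1 = owned by some thread */
    atomic_int waiters;  /* number of threads already queued on the lock */
};

enum acquire_result { ACQUIRED, MUST_BLOCK };

/*
 * Try to take the lock by spinning for at most max_spins iterations.
 * If any waiter is already queued, stop spinning immediately and tell
 * the caller to block instead, so a backlog never turns into a crowd
 * of threads hammering the same lock.
 */
static enum acquire_result try_spin_acquire(struct demo_lock *lock, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load(&lock->waiters) > 0)
            break;  /* backlog exists, don't bother spinning */

        int expected = 0;
        if (atomic_compare_exchange_strong(&lock->held, &expected, 1))
            return ACQUIRED;
    }
    return MUST_BLOCK;  /* caller should fall back to a blocking wait */
}
```

A caller that gets MUST_BLOCK would bump waiters and do a real blocking wait. The point is only that the spin path checks for a backlog before each attempt, which is the check wait_semaphore appears to be missing.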
Thanks for looking at this.