https://bugs.winehq.org/show_bug.cgi?id=54979
Bug ID: 54979
Summary: Futexes which protect memory allocation congest, causing 100x performance degradation
Product: Wine
Version: 8.0.1
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@winehq.org
Reporter: nagle@animats.com
Distribution: ---
Created attachment 74518 --> https://bugs.winehq.org/attachment.cgi?id=74518 Debugger output with multiple threads stuck in spinlocks.
SUMMARY
Under the wrong circumstances, a program doing many memory allocations from multiple threads can experience more than 100x performance degradation.
I'm seeing this in my own Rust program, which has 16 sometimes-compute-bound threads on a 12 CPU machine. This works fine on Microsoft Windows and when compiled for Linux. Under Wine, it can be over 100x slower.
Multiple threads are banging on a spinlock at
_InterlockedCompareExchange [Z:\usr\src\packages\BUILD\include\winnt.h:6630] in ntdll
spin_lock [Z:\usr\src\packages\BUILD\dlls\ntdll\sync.c:937] in ntdll
This is the spin lock around the queue for pausing threads.
DISCUSSION
This looks like futex congestion. That's a known kind of problem, but doesn't seem to have been seen in Wine before. It may be that some programs just run really slow under Wine, and nobody knows why.
The code in NTDLL is clearly intended to handle the fast case without taking an OS-level lock. Under light load, acquiring the lock is O(1). But under heavy load, with multiple threads pounding on spinlocks, each waiter spins longer, and the number of waiters grows too, so total work becomes O(N^2). That means huge slowdowns.
See discussion at https://forum.winehq.org/viewtopic.php?t=37688 which has snippets of the code that's causing the problem. I can copy all that info over here if necessary. Let me know.
Zeb Figura z.figura12@gmail.com changed:
CC added: z.figura12@gmail.com
--- Comment #1 from Zeb Figura z.figura12@gmail.com --- I haven't been able to find a way to avoid the spinlock for RtlWaitOnAddress. That primitive needs to be fair *and* support timeouts *and* be as fast as possible, and I haven't found a faster approach than user-space spinlocks.
However, we can probably avoid RtlWaitOnAddress for critical sections.
--- Comment #2 from John Nagle nagle@animats.com --- I don't see an easy way to eliminate that innermost spinlock, either.
It's not a long lock. The slowest thing inside is searching a linear list of waiting threads, which ought to take maybe a microsecond. So it may be that the outer loops are asking for that lock too often.
There are three nested locks here, two futexes and a hard spinlock. The one that looks most suspicious is the middle one at wait_semaphore. That loop repeatedly calls RtlWaitOnAddress, which is probably why control is usually in the innermost spinlock on a debugger break. If wait_semaphore didn't do that, the problem might go away.
The outermost lock has
if (crit->LockCount > 0) break; /* more than one waiter, don't bother spinning */
so it's protected against this kind of problem. If that lock starts to build up a backlog, the thread blocks rather than spinning. But wait_semaphore doesn't have similar protection.
Thanks for looking at this.
--- Comment #3 from Zeb Figura z.figura12@gmail.com --- (In reply to Zeb Figura from comment #1)
> However, we can probably avoid RtlWaitOnAddress for critical sections.
This is harder than I thought. I thought it would be relatively easy to implement a lock-free wait queue on the stacks of the waiting threads, but after thinking it through I don't think it's actually possible at all.
In theory we could try supplying hints as to which thread to wake (say, with a bounded ring buffer). The problem with this is that we only have a pointer's worth of space to work with, and we can't allocate more because we underlie the heap implementation.
(In reply to John Nagle from comment #2)
> There are three nested locks here, two futexes and a hard spinlock. The one that looks most suspicious is the middle one at wait_semaphore. That loop repeatedly calls RtlWaitOnAddress, which is probably why control is usually in the innermost spinlock on a debugger break. If wait_semaphore didn't do that, the problem might go away.
I don't think this is suspicious. We use RtlWakeAddressSingle(), and win32 futexes are fair, so in theory a given waiting thread should only actually be woken once. At worst I think it can have one stale buffered wakeup from a previous alert.
mqudsi mqudsi@neosmart.net changed:
CC added: mqudsi@neosmart.net
--- Comment #4 from John Nagle nagle@animats.com --- Any progress on this?
What usually sets this off is growing an array. That calls realloc, which first attempts to grow the buffer in place. If that's not possible due to a lack of free space after the buffer, the realloc code allocates a new buffer, copies the data, and releases the old buffer. The bottleneck seems to be at the point where the old buffer is released. That's somewhat unexpected.
A test case might involve a large number of compute-bound threads, all growing arrays slowly.
--- Comment #5 from Zeb Figura z.figura12@gmail.com --- (In reply to John Nagle from comment #4)
> Any progress on this?
No, I don't have any ideas for optimizing critical sections or Win32 futexes.
--- Comment #6 from John Nagle nagle@animats.com --- Hm. Real Windows does not seem to have this problem. What does Microsoft do?
--- Comment #7 from Zeb Figura z.figura12@gmail.com --- (In reply to John Nagle from comment #6)
> Hm. Real Windows does not seem to have this problem. What does Microsoft do?
I don't know. Unfortunately I don't think it'll be easy to find out; note that because of copyright we can't go poking around in Windows internals.
--- Comment #8 from John Nagle nagle@animats.com --- Any progress on this?
You're not likely to get as good a bug report from anyone else. If this happens in some game, they won't have source code. I do, and was able to see the problem in GDB. There's probably some "My Game Hangs" bug report that's really this problem, but can't be diagnosed.
--- Comment #9 from Zeb Figura z.figura12@gmail.com --- (In reply to John Nagle from comment #8)
> Any progress on this?
> You're not likely to get as good a bug report from anyone else. If this happens in some game, they won't have source code. I do, and was able to see the problem in GDB. There's probably some "My Game Hangs" bug report that's really this problem, but can't be diagnosed.
I understand, and I'm thankful to you for providing a clear bug report. The thing is, this bug is not going to be fixed until someone has an idea for how to write a fast user-space wait queue that doesn't allocate and whose "head" fits in a pointer.
--- Comment #10 from John Nagle nagle@animats.com --- Right.
You're three deep in futexes at that point, and I think they're interacting badly. I think it's the one in the middle that's part of the problem. But I'm not sure. GDB gives me a snapshot of the state but not the dynamics that got us there.
Maybe if the middle spinlock had a quick quit to a hard block...
--- Comment #11 from Zeb Figura z.figura12@gmail.com --- (In reply to John Nagle from comment #10)
> Maybe if the middle spinlock had a quick quit to a hard block...
Sorry, what do you mean by "the middle spinlock"? I guess you probably mean futex_queue.lock, since it's the only spinlock.
We can't block there, because we have no fast way to wake up a waiter. [And frankly, even if we could block, I worry about the performance implications.]
--- Comment #12 from John Nagle nagle@animats.com --- There is an outer level of spinlock.
See summary here: https://forum.winehq.org/viewtopic.php?t=37688
As far as I can tell, this is not fine-grained locking. It's just one big lock for the whole heap. That should be a short spinlock, which, if it times out, causes a kernel-level lock and block.
The bottleneck is that when you realloc a buffer, the old buffer's contents may have to be copied to a new one, and the heap stays locked for the whole copy.
--- Comment #13 from Zeb Figura z.figura12@gmail.com --- (In reply to John Nagle from comment #12)
> There is an outer level of spinlock.
> See summary here: https://forum.winehq.org/viewtopic.php?t=37688
Ah. That's not really a spinlock, though. The CS is a hybrid lock, but not a true spinlock, and we don't use the spin count for the heap anyway, so it's not even a hybrid lock.
> As far as I can tell, this is not fine-grained locking. It's just one big lock for the whole heap.
That's true, if making heap locking more fine-grained would be enough then that's an alternate solution. I don't really have the time/interest to implement that myself, though.
> That should be a short spinlock, which, if it times out, causes a kernel-level lock and block.
It is. In fact, it's not even the spinlock; we just block immediately. The problem is that in order to block we need to spin-lock the internal wait queue.
There are other ways to do a kernel wait, but they come with a significant performance cost.
--- Comment #14 from John Nagle nagle@animats.com --- Also reported as a Rend3 bug: https://github.com/BVE-Reborn/rend3/issues/570
--- Comment #15 from Zeb Figura z.figura12@gmail.com --- (In reply to John Nagle from comment #14)
> Also reported as a Rend3 bug: https://github.com/BVE-Reborn/rend3/issues/570
That seems to be a different bug?
--- Comment #16 from John Nagle nagle@animats.com --- Wrong bug. Sorry.