https://bugs.winehq.org/show_bug.cgi?id=48408
Bug ID: 48408 Summary: mixthread monopolizes buffer_list_lock if WaitForSingleObject returns quickly, causing livelock and game freeze Product: Wine Version: 5.0-rc3 Hardware: x86 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: directx-dsound Assignee: wine-bugs@winehq.org Reporter: florian.will@gmail.com Distribution: ---
Created attachment 66179 --> https://bugs.winehq.org/attachment.cgi?id=66179 C source for reproducing this issue, livelocks on my system
The default WaitForSingleObject implementation is not fast enough to trigger this, but the bug manifests itself when starting wine-staging with WINEESYNC=1. I originally reported this at https://github.com/ValveSoftware/Proton/issues/3387 , but I am now able to reproduce this on wine-staging. The problem is in dsound though.
The dsound mixthread locks the buffer_list_lock, then calls PerformMix(), then releases the lock, then waits for something to happen using WaitForSingleObject, then locks the buffer_list_lock again, in a loop.
This works fine when not using esync (I guess Valve's fsync also triggers the same issue), since the default WaitForSingleObject implementation is slow enough to allow other threads to acquire the buffer_list_lock while the mixthread executes WaitForSingleObject. However, when esync is enabled AND the mixthread has a lot of work to do (because there are many secondary sound buffers), it apparently returns from WaitForSingleObject almost immediately, and very quickly re-acquires the lock after releasing it. The lock implementation does not guarantee fairness, and on my system, this results in the mixthread monopolizing the buffer_list_lock.
If the main thread then wants to add a new secondary sound buffer to the list, it attempts to acquire the buffer_list_lock exclusively, but it can take a very long time to acquire the lock. This leads to a game freeze in Zusi 3 and maybe other games / applications if there are lots of sound effects at the same time.
I have added a Sleep(1) in the mixthread before acquiring the lock, and it fixed the freezes (but it will obviously cause audio stuttering to kick in earlier than without the sleep). Disabling esync also fixes the freeze.
I came up with a simple reproducer, C source attached. Output is like this with esync enabled: [..] Added buffer No. 283, took 0.056744 sec! Added buffer No. 284, took 0.616793 sec! Added buffer No. 285, took 16.775508 sec! Added buffer No. 286, took 11.163868 sec! Added buffer No. 287, took 26.308287 sec!
Obviously on more powerful systems with better single-thread CPU performance, more buffers will be needed before it "almost-livelocks" like this. This output is from my Phenom II X4 955. (If sound starts stuttering instead of the freeze/lock, the bug did not manifest itself.)
With esync disabled (or when using my patched dsound.dll with "Sleep(1)"), it never takes longer than 0.1 sec to add a buffer.
https://bugs.winehq.org/show_bug.cgi?id=48408
Andrew Eikum aeikum@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |aeikum@codeweavers.com
https://bugs.winehq.org/show_bug.cgi?id=48408
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
--- Comment #1 from Zebediah Figura z.figura12@gmail.com --- Created attachment 66211 --> https://bugs.winehq.org/attachment.cgi?id=66211 dsound: Use an SRW lock for buffer_list_lock.
Since the problem is essentially unfairness, and RTL resources are unfair (I wrote some tests for this), I think the best thing to do is change to using SRW locks, which are fair, are already more performant without out-of-tree patches, and are also documented.
The attached patch does this. I get no worse performance with it than with RTL resources, and it seems to be better at avoiding hiccups that appear for whatever reason.
If you can, please test with the affected real-world application.
https://bugs.winehq.org/show_bug.cgi?id=48408
--- Comment #2 from florian.will@gmail.com --- Thank you very much! This patch indeed fixes those long (eternal?) freezes in the real-world application, Zusi 3.
4f0212c4fd (current git master): Total game freeze when too many secondary buffers are added.
With your patch applied, the game continues when too many secondary buffers are added. (Obviously, sound starts stuttering at that point.)
Compared to my Sleep(1) hack, the sound performance is (of course) better when using your patch, i.e. sound starts stuttering a bit later in the test scenario.
However, using my Sleep(1) hack, I guess the game's main thread is able to remove/add a bunch of secondary sound buffers in quick succession while the mixthread Sleep(1)-s. Using the SRW lock, there is now a freeze for ~5-10 seconds during gameplay when the sound-buffer-heavy game objects (long freight trains) are loaded and unloaded IF the mixthread is unable to keep up with its work. It's probably like this: RemoveBuffer releases lock -> mixthread acquires lock and executes 1 PerformMix iteration (which needs maybe >20 milliseconds) -> RemoveBuffer acquires lock, removes another buffer, releases lock -> mixthread acquires lock again and does another >20ms mix round… and so on. With Sleep(1), this is much smoother, as it probably allows the game thread to remove all those buffers within just 1 millisecond instead of n*20+ milliseconds.
Note that I'm not sure about the explanation in the last paragraph, I just observed the ~5-10 seconds freeze that does not occur with Sleep(1). That freeze occurs at the time when trains go out of "sound range" and sound becomes smooth again (because the number of secondary sound buffers shrinks below ~90).
In "normal" gameplay when the mixthread manages to keep up with mixing and therefore regularly waits for more work, everything seems to work fine with SRWLock.
And in any case, your patch is a HUGE improvement over the old RTL-based mechanism so I'd be happy if it was applied.
https://bugs.winehq.org/show_bug.cgi?id=48408
Alistair Leslie-Hughes leslie_alistair@hotmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |patch CC| |leslie_alistair@hotmail.com
https://bugs.winehq.org/show_bug.cgi?id=48408
Alistair Leslie-Hughes leslie_alistair@hotmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |STAGED Ever confirmed|0 |1 Staged patchset| |https://github.com/wine-sta | |ging/wine-staging/tree/mast | |er/patches/dinput-DIPROP_BU | |FFERSIZE
https://bugs.winehq.org/show_bug.cgi?id=48408
Gijs Vermeulen gijsvrm@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Staged patchset|https://github.com/wine-sta |https://github.com/wine-sta |ging/wine-staging/tree/mast |ging/wine-staging/tree/mast |er/patches/dinput-DIPROP_BU |er/patches/dsound-SRW |FFERSIZE |
https://bugs.winehq.org/show_bug.cgi?id=48408
--- Comment #3 from Zebediah Figura z.figura12@gmail.com --- That would make sense, and unfortunately I'm not sure I see an easy solution.
The fact that mixthread holds the lock for so long is not ideal, but probably not easy to correct.
About how many buffers does the application end up creating? For that matter I might like to see a log with WINEDEBUG=+dsound,+timestamp.
https://bugs.winehq.org/show_bug.cgi?id=48408
--- Comment #4 from florian.will@gmail.com --- Created attachment 66231 --> https://bugs.winehq.org/attachment.cgi?id=66231 grepped dsound trace log with freezes at 48531-48541 and 48578-48591
I agree it's difficult to come up with a solution that doesn't sacrifice mixing performance. In a related issue, #30639, Erich E. Hoover shared his plans in comment #44 to move the actual mixing code (or parts of it) out of the buffer_list_lock. I have looked at it and can't find a way to do this without copying the secondary buffers to a new memory location first, because the buffers can be freed / realloced while working on them if the lock is not held.
I use a special Zusi 3 "six long freight trains in one place" test scenario. It was created to hunt down performance issues on Windows related to that situation, so I guess there might be a similar issue even on Windows (although current consensus appears to be that it's related to freight train animations, but sound has been mentioned in that context). While that test scenario is artificial, similar situations exist in "real" Zusi 3 scenarios at major freight yards and the freezes and/or sound stuttering in wine are quite common there in my experience. (mixer performance seems to be better in Windows for me, I have not encountered the sound stuttering there. I will test the test scenario using Windows when I find some time to boot it, to see if it causes stuttering and/or freezes.)
That said, in the test scenario, I get up to 381 secondary sound buffers. I have a full dsound trace log while using your SRW patch, but it's 340 MB. I guess the interesting part is grep "AddBuffer|RemoveBuffer", so I'm attaching that. Let me know if you want more log parts or the full log. Major freezes happen between timestamps 48531-48541 (first freeze) and 48578-48591 (second freeze).
The logs show that RemoveBuffer sometimes needs to wait 60ms or more to acquire the lock (first line like "(019A17A0, 18FBDBD0)" is start of function execution, second line with buffer count is end of function). With the Sleep(1)s added, it manages to do 5-10 AddBuffer calls before another PerformMix() round starts in the mixthread and the >60ms delay kicks in.
Note that I'm always using Proton 4.11-11 for these tests in the real game (not sure how to convince the Steam DRM to launch the game in upstream wine), and not using a clean wine prefix (the game requires some winetricks). I just copy over my dllsound.dll.so and .fake to Protons dist/lib/wine dir and remove Proton's dsound.dll.
I'd say your patch fixes this bug report. Game freezes that last forever (or many minutes) are replaced with relatively short freezes. If a fix for #30639 can be found, like the "fast mixer" patch from wine-staging, experience should be quite good in most scenarios, and maybe similar to Windows. According to my reproducer, my 10 year old CPU (Phenom II X4 955) can handle ~250 secondary buffers using the fast wine-staging mixer. So the possible short freezes would be limited to areas with lots of freight trains then. Using the current "not so fast, but higher quality" mixer, which only supports up to ~90 sound buffers for me, sometimes 1 freight train is enough to cause some sound issues and lags.
https://bugs.winehq.org/show_bug.cgi?id=48408
--- Comment #5 from florian.will@gmail.com --- Created attachment 66232 --> https://bugs.winehq.org/attachment.cgi?id=66232 grepped dsound trace log with Sleep(1) in mixthread
I also added Sleep(1) after mixthread's lock release and ran the same test scenario, see this log. There is now only a relatively short freeze (like 1 second) at ~49310 and ~49341 each.
https://bugs.winehq.org/show_bug.cgi?id=48408
Alistair Leslie-Hughes leslie_alistair@hotmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|STAGED |RESOLVED Fixed by SHA1| |e09190399904c6ae13ce256aaa2 | |efbf9360ac8c9
--- Comment #6 from Alistair Leslie-Hughes leslie_alistair@hotmail.com --- https://source.winehq.org/git/wine.git/?a=commit;h=13087008437d80542123fc308...
https://source.winehq.org/git/wine.git/?a=commit;h=e09190399904c6ae13ce256aa...
https://bugs.winehq.org/show_bug.cgi?id=48408
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #7 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 5.2.