https://bugs.winehq.org/show_bug.cgi?id=53372
Bug ID: 53372
Summary: Total War Shogun 2 spews RtlLeaveCriticalSection() section is not acquired errors in 3D scenes.
Product: Wine
Version: 7.13
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: directx-d3d
Assignee: wine-bugs@winehq.org
Reporter: hibbsncc1701@gmail.com
Distribution: ---
Created attachment 72757 --> https://bugs.winehq.org/attachment.cgi?id=72757 wine-7.13 console log
As of wine-7.13, Total War Shogun 2 has started spewing "section <address> is not acquired" errors in the console while the game is rendering 3D scenes. These errors were not present in wine-7.12.
These errors can also cause the game to crash with GL out-of-memory errors after a battle, during the return to the campaign map, although it doesn't happen every time like in previous bugs.
Regression testing reveals 66f37aae7e25740857aeb8ea9c2f0395781389f5 as the first bad commit. I've tried reverting this commit in current git and the reversion fixes the bug and the crashing it causes.
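For readers unfamiliar with how a "first bad commit" is found: the search is a binary search over the commit history, which `git bisect` automates. The sketch below only illustrates the idea and is not the tooling used in this report. The function name `first_bad` and the `is_bad` callback are hypothetical, and it assumes a linear history in which every commit after the first bad one is also bad and the newest commit is known to be bad.

```python
def first_bad(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is True.

    Assumptions (hypothetical helper, not part of git or Wine):
    - commits is ordered oldest to newest,
    - the newest commit is bad,
    - once a commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]
```

Each call to `is_bad` stands for one build-and-test cycle, so a range of N commits needs roughly log2(N) test runs. With an intermittent failure, a single clean run can mislabel a bad commit as good, which is one way a bisect can land on the wrong commit.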
Patrick Hibbs hibbsncc1701@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
    Regression SHA1|                            |66f37aae7e25740857aeb8ea9c2f0395781389f5
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |regression
Zeb Figura z.figura12@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |z.figura12@gmail.com
--- Comment #1 from Zeb Figura z.figura12@gmail.com --- Thanks for the report, I've submitted https://gitlab.winehq.org/wine/wine/-/merge_requests/483/diffs.
--- Comment #2 from Patrick Hibbs hibbsncc1701@gmail.com --- Created attachment 72774 --> https://bugs.winehq.org/attachment.cgi?id=72774 MA applied OOM No crash console log
I applied the MR locally and retested it. Although the critical section errors no longer occur in the logs, the game is still unstable.
With the MR applied, the game occasionally won't even start, deadlocking around D3D device init before the application window is created.
When the game does start, it can occasionally crash at random (OOM error) where it didn't before. (Title screen, Campaign map before a battle, During a battle)
Additionally, I've had four test attempts that generated a GL OOM error during the return to the campaign map (a common point of failure) but did *not* cause a crash. Instead, the game successfully reached the campaign map, seemingly without issue (no UI / graphical glitches), and the current session could even be saved normally. The only indication of a problem was the console spewing GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT errors on every frame. (I've attached the complete console log from one of these four attempts.)
I've tried to get a d3d log for the non-fatal OOM error, but the game is very slow while doing so and I've yet to succeed. The last attempt ended with a crash during a battle and wound up being 311.7MB compressed / 30.3GB uncompressed. I could upload it if desired.
--- Comment #3 from Zeb Figura z.figura12@gmail.com --- I can't reproduce the OOM errors. I was at least able to reproduce a different error also present in the log, for which I've sent https://gitlab.winehq.org/wine/wine/-/merge_requests/495/diffs. That might be related, though; are you able to reproduce the OOM errors with those patches also applied?
--- Comment #4 from Patrick Hibbs hibbsncc1701@gmail.com --- Created attachment 72790 --> https://bugs.winehq.org/attachment.cgi?id=72790 MR483+495 Deadlock at start no app window. Console log +d3d
OK, applied both MRs.
For those wondering how these tests are being performed:
I have a save file that's set for an ambush battle upon start, which I always play the same way because I know the positions of the enemy force. (It never changes.) Two of the player's units are moved to the opposite side of the map, and the remaining units are moved to make them visible to the enemy AI. The player wins the match, chooses to end the battle instead of running down the AI, and then saves the battle replay before returning to the campaign map. There (if the game succeeds in getting that far) the game displays the results screen and renders a video for first-time unit construction. The player then saves the game and exits to the desktop.
This is more or less consistent. The only real room for error is the placement of the player's units (their placement can cause the enemy AI to change its behavior) and the randomness of units dying and routing off of the map. (Not really fixable.)
The results of testing are below:
3 attempts (No extra debugging, No log redirection) = OOM crash during return to campaign map after battle
3 attempts (No extra debugging, log redirection to file) = deadlock on startup before app window generation
2 attempts (No extra debugging, log redirection to file) = OOM crash during return to campaign map after battle
1 attempt (WINEDEBUG="+d3d", log redirection to file) = deadlock on startup before app window generation
1 attempt (WINEDEBUG="+d3d", log redirection to file) = Non-fatal OOM error after battle successful return to campaign map, successful save of game, and normal app exit
Overall, the crashes are more consistent with both MRs on my system, in that they always occur (with the one exception noted above).
I did finally get my d3d log for a non-fatal OOM if it's wanted. Although fair warning, it's 64.9GiB uncompressed / 1,229.2MiB compressed.
--- Comment #5 from Zeb Figura z.figura12@gmail.com --- That attached log is very weird, it's hanging (or crashing?) almost as soon as the CS starts, and before anything that any of the linked or blamed patches should modify. You're sure none of this was happening before 66f37aae7e2?
If you can reproduce that early deadlock again, can you please attach a log with WINEDEBUG=+d3d,+d3d9,+opengl,+wgl,+seh?
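One way to capture such a log is sketched below. It relies only on the `WINEDEBUG` environment variable and on Wine writing its debug output to stderr. The helper names `wine_debug_env` and `run_with_log`, and the executable name in the usage note, are hypothetical.

```python
import os
import subprocess


def wine_debug_env(channels, base_env=None):
    """Return a copy of the environment with WINEDEBUG enabling the given channels."""
    env = dict(os.environ if base_env is None else base_env)
    # Each channel is switched on with a leading "+", separated by commas.
    env["WINEDEBUG"] = ",".join("+" + name for name in channels)
    return env


def run_with_log(command, channels, log_path):
    """Run command with the debug channels enabled, redirecting stderr to log_path."""
    with open(log_path, "wb") as log:
        result = subprocess.run(command, env=wine_debug_env(channels), stderr=log)
    return result.returncode
```

For example, `run_with_log(["wine", "game.exe"], ["d3d", "d3d9", "opengl", "wgl", "seh"], "deadlock.log")`, where `game.exe` is a placeholder. As noted elsewhere in this report, such logs can reach tens of gigabytes and slow the game considerably.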
This is probably something specific to nvidia drivers; unfortunately I don't currently have access to an nvidia machine to test with.
--- Comment #6 from Patrick Hibbs hibbsncc1701@gmail.com --- Created attachment 72793 --> https://bugs.winehq.org/attachment.cgi?id=72793 MR483+495 Deadlock with WINEDEBUG=+d3d,+d3d9,+opengl,+wgl,+seh
Here's the log, got it first try.
Yes, the deadlocking in particular is a new behavior that didn't occur with my previous build of Wine. (Based off of commit 6c465ae8efeecc7580629dc8a9c26198b743c2da.)
--- Comment #7 from Zeb Figura z.figura12@gmail.com --- Yeah, that's the same thing, it makes no sense. It crashes inside of glXCreateContextAttribsARB(), and there's no mapping or drawing done.
My suspicion is that we're just running out of VA space for some *other* reason. I don't know why 66f37aae7e would consistently break it, though...
--- Comment #8 from Patrick Hibbs hibbsncc1701@gmail.com --- Created attachment 72806 --> https://bugs.winehq.org/attachment.cgi?id=72806 wine-7.13 crash on Intel Iris Plus Graphics (Ice Lake) with WINEDEBUG=+d3d,+d3d9,+opengl,+wgl,+seh
For laughs, I tried running the game under vanilla wine-7.13 on a Surface Pro 7. (Intel iGPU Intel Corporation Iris Plus Graphics G4 (Ice Lake) (rev 07). FYI, this gets misdetected by wine due to an unknown device ID. I should probably send a patch for the gpu list...)
The game has never worked for me under wine on this device. It always crashes on the splash screen, even before wine-7.13. (I'll open another bug for this one, but I'll attach the log here for completeness' sake. It has WINEDEBUG=+d3d,+d3d9,+opengl,+wgl,+seh flags set.)
However, I thought it might give some clues and I think I got a couple. The game crashes due to an EXCEPTION_ACCESS_VIOLATION error on this device, but it also echoes something else at the last second: "** Downsizing Textures, Over Budget by : 115 Mb"
I've never seen that message before on my other systems, so I'm not sure why it's showing up here. (Maybe because it's not an nvidia GPU?) For fun, I tried running the game under wine-7.11, and there the message appears right before the D3DXLoadSurfaceFromMemory() unhandled filter fixme spam, instead of after it as on wine-7.13. So it seems like we have a thread race / sync issue.
Looking at the actual fixme complaint indicates that the unimplemented filter is D3DX_FILTER_BOX. MSDN seems to indicate that it is meant for down-sampling a texture. (If averaging a group of pixels is anything to go by. Admittedly, 3D work is not my realm of expertise.) Looking at the wine code https://source.winehq.org/source/dlls/d3dx9_36/surface.c#2120 we see that the filter is indeed unimplemented. This fixme gets printed a lot even during normal operation of the game, despite the lack of the "downsizing textures" message on my other systems. So perhaps if the filter were implemented, the VA exhaustion issue might get a little better.
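To illustrate what "averaging a group of pixels" means for a box filter, here is a minimal sketch that halves an image by averaging each 2x2 block. It is not Wine's code and makes no claim to match the exact D3DX_FILTER_BOX semantics (rounding, pixel formats, sRGB handling). The function name `box_downsample` and the representation of an image as a list of rows of RGBA tuples are assumptions made for the example.

```python
def box_downsample(pixels, factor=2):
    """Shrink a 2D grid of RGBA tuples by averaging each factor x factor block.

    Hypothetical helper for illustration only; integer division is used for
    the average, which real implementations may round differently.
    """
    height, width = len(pixels), len(pixels[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            # Collect the source pixels covered by this destination pixel.
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            count = len(block)
            # Average each of the four channels independently.
            row.append(tuple(sum(p[c] for p in block) // count for c in range(4)))
        out.append(row)
    return out
```

Each halving step cuts the pixel count to a quarter, which is why a game that believes it is over its texture budget would want this path to work.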
--- Comment #9 from Zeb Figura z.figura12@gmail.com --- I managed to get access to an NVidia machine and was able to reproduce the same deadlock on startup.
As described above, the changes to d3d don't make a lot of sense, so I tried double-checking, and I think I found a different commit that's to blame. It's hard to be sure, because the deadlock is tetchy and sometimes won't reproduce until the dozenth run, but I think the offending commit is:
commit 18ae96e5fb3cbbd53f1a022ba81203de6b431228
Author: Zhiyi Zhang zzhang@codeweavers.com
Date: Mon Apr 25 17:22:16 2022 +0800
winex11.drv: Lock display when expecting error events.
If the display is not locked, another thread could take the error event and handle it with the default error handlers and thus not handled by the current thread with the specified error handlers.
Fix Cladun X2 crash at start.
Signed-off-by: Zhiyi Zhang zzhang@codeweavers.com
More interestingly, if I look at the process state when the game is hung, I notice that, while the main thread is locked at 100% CPU (waiting for the CS thread), the CS thread is sleeping, and another CS thread which was already shut down is also sleeping. Further tracing shows that the "old" CS thread is terminated and not running any more win32 code, but hasn't actually exited. And I'm unable (after a few tries) to reproduce with csmt=0.
My suspicion, although I have no way to verify this, is that the NVidia driver is deadlocking because of a lock ordering problem. I am guessing that it does thread cleanup with pthread_cleanup_push(), and that inside of that it grabs some internal lock (a GLX context lock?) and then calls XLockDisplay(), and that glXCreateContext() grabs the same lock, resulting in a lock inversion when the latter is called while already in XLockDisplay().
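The kind of lock inversion suspected here can be shown with a small sketch. This only illustrates the general pattern and says nothing verified about the NVidia driver or libX11. `display_lock` stands in for the lock taken by XLockDisplay(), and `driver_lock` is a hypothetical driver-internal lock. Timeouts are used so the sketch reports the inversion instead of hanging.

```python
import threading


def simulate_lock_inversion(timeout=0.1):
    """Two threads take the same two locks in opposite order.

    Returns a dict mapping each thread's role to whether it obtained its
    second lock. Both locks and both role names are illustrative only.
    """
    display_lock = threading.Lock()  # stands in for XLockDisplay()
    driver_lock = threading.Lock()   # hypothetical driver-internal lock
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def worker(role, first, second):
        with first:
            # Wait until each thread holds its first lock.
            both_hold_first.wait()
            # A real deadlock would block here forever; the timeout lets
            # the sketch observe the failure and return.
            got = second.acquire(timeout=timeout)
            results[role] = got
            # Keep holding the first lock until both attempts are over.
            both_tried_second.wait()
            if got:
                second.release()

    threads = [
        # Context creation called while the display is already locked.
        threading.Thread(target=worker,
                         args=("create_context", display_lock, driver_lock)),
        # Thread cleanup taking the driver lock, then the display lock.
        threading.Thread(target=worker,
                         args=("thread_cleanup", driver_lock, display_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Neither thread obtains its second lock, because each is waiting on a lock the other holds. If both threads took the locks in the same order, one would simply wait for the other to finish.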
If I'm right, I don't know whose bug this really is. XLockDisplay() is part of libx11, not the X11 protocol, and while libx11 is documented, the behaviour of threading like this doesn't seem to be specified. If I had to give a reading, though, I'd say that since there's nothing in the documentation preventing us from calling glXCreateContext() with a locked display (and since we have a good reason to do so), this is NVidia's bug.
Patrick, does reverting 18ae96e5fb help? I'd expect it to at least get rid of the deadlock (although it's possible that I haven't sufficiently tested and that my analysis is wrong) but it may not get rid of the OOM errors—those may have a separate cause (e.g. the streaming buffer from 66f37aae7e2 is somehow growing too large, which would make a lot more sense...)
And if not, does turning off CSMT help?
I was able to reproduce some OOM errors, but they went away after applying both of the merge requests I linked earlier. (Which are, by the way, both upstream by now.)
Probably the hang on startup should be split out to a different bug in any case.
Julian Rüger jr98@gmx.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jr98@gmx.net
--- Comment #10 from Patrick Hibbs hibbsncc1701@gmail.com --- Reverting 18ae96e5fb does seem to have fixed the deadlocking. I've yet to encounter it after reverting the patch, and I've started the game around ten times. (As with the previous tests.)
I'll go ahead and split out that issue to a separate bug.
As for the OOM errors, I'm still getting those on current git with 18ae96e5fb reverted. Including the non-fatal OOM error. Should I split those issues into a separate bug given that the CriticalSection error has been fixed upstream?
Zeb Figura z.figura12@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Fixed by SHA1|                            |dd5c511c9fe4bef86f46d16aedb40ad65f2dd3e0
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #11 from Zeb Figura z.figura12@gmail.com --- (In reply to Patrick Hibbs from comment #10)
> As for the OOM errors, I'm still getting those on current git with 18ae96e5fb reverted. Including the non-fatal OOM error. Should I split those issues into a separate bug given that the CriticalSection error has been fixed upstream?
Yes, I think that would be a good idea. Thanks!
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
--- Comment #12 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 7.14.
--- Comment #13 from Zeb Figura z.figura12@gmail.com --- (In reply to Patrick Hibbs from comment #10)
> Reverting 18ae96e5fb does seem to have fixed the deadlocking. I've yet to encounter it after reverting the patch, and I've started the game around ten times. (As with the previous tests.)
> I'll go ahead and split out that issue to a separate bug.
For reference, this is bug 53428.