https://bugs.winehq.org/show_bug.cgi?id=50680
Bug ID: 50680 Summary: Detroit: Become Human crashes at launch with ntdll-NtAlertThreadByThreadId Product: Wine-staging Version: 6.1 Hardware: x86-64 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: -unknown Assignee: wine-bugs@winehq.org Reporter: lvb.crd@protonmail.com CC: leslie_alistair@hotmail.com, z.figura12@gmail.com Distribution: ---
Created attachment 69406 --> https://bugs.winehq.org/attachment.cgi?id=69406 wine-6.2 with ntdll-NtAlertThreadByThreadId +seh,+pid,+timestamp,+tid output
The game won't launch on Wine-Staging 6.1 and 6.2.
I've done some tests and found that using Wine + `ntdll-NtAlertThreadByThreadId` from Staging is enough to reproduce this problem. Disabling this patchset on Staging or using "vanilla" Wine helps.
And the most annoying thing is that "slowing down Wine" (or whatever) with `WINEDEBUG="+seh,+pid,+sync,+timestamp` - fixes the problem on my local system/hardware. So I can't reproduce it at collection of logs.
I was able to attach a log with these debug channels (+seh,+pid,+timestamp,+tid), but it looks like the game can create several kinds of error. Perhaps it depends on which part of the code/thread is the first to be executed (it looks like something related to parallel programming, I don't know how it is called correctly).
(Sorry, it looks like this time I was not able to collect any useful information to create a report, but I believe that it would be wrong not to create it at all.)
https://bugs.winehq.org/show_bug.cgi?id=50680
Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- See Also| |https://bugs.winehq.org/sho | |w_bug.cgi?id=41102
https://bugs.winehq.org/show_bug.cgi?id=50680
Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- See Also| |https://bugs.winehq.org/sho | |w_bug.cgi?id=50448
https://bugs.winehq.org/show_bug.cgi?id=50680
Olivier F. R. Dierick o.dierick@piezo-forte.be changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |o.dierick@piezo-forte.be
--- Comment #1 from Olivier F. R. Dierick o.dierick@piezo-forte.be --- Hello,
Seems related to bug 50605 and bug 50614.
Regards.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #2 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- Created attachment 69412 --> https://bugs.winehq.org/attachment.cgi?id=69412 wine-staging-6.2 (Fedora 33) terminal output
(In reply to Olivier F. R. Dierick from comment #1)
Seems related to bug 50605 and bug 50614.
Thanks for sharing this information
I did quick installs of Mint 20 and Fedora 33 (same hardware, replaced only the physical HDD with a similar one) and got identical behavior of the issue with the ready-made wine-staging packages (6.2 at the moment) from the WineHQ repository. wine-devel (6.2) also successfully launches this game (only tested up to the beginning of the first level). Some info about the hardware: -Intel i4790k (4c/8t) (without OC) (2x8GB DDR3) -Proprietary Nvidia driver (GTX 1080Ti)
https://bugs.winehq.org/show_bug.cgi?id=50680
Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- See Also| |https://bugs.winehq.org/sho | |w_bug.cgi?id=49712
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #3 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- Since this is a set of patches, I did some bissecting tests (not sure how correct it is) and found that preventing the applying https://github.com/wine-staging/wine-staging/blob/9aeea5d12e008266c0aff42c48... (and the following 0013, because it can't apply properly) - helps.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #4 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- (In reply to Dmitry Skvortsov (Iglu47) from comment #3)
Since this is a set of patches, I did some bissecting tests (not sure how correct it is) and found that preventing the applying https://github.com/wine-staging/wine-staging/blob/ 9aeea5d12e008266c0aff42c48e13d5545956478/patches/ntdll- NtAlertThreadByThreadId/0012-ntdll-Get-rid-of-the-direct-futex-path-path- dition (and the following 0013, because it can't apply properly) - helps.
sorry, problem with copying link https://github.com/wine-staging/wine-staging/blob/9aeea5d12e008266c0aff42c48...
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #5 from Zebediah Figura z.figura12@gmail.com --- 3724.381:0020:0024:err:sync:RtlLeaveCriticalSection section 00000000859D5BEE is not acquired 3724.418:0020:0024:err:sync:RtlLeaveCriticalSection section 00000000859D5BBE is not acquired 3724.444:0020:0024:err:sync:RtlLeaveCriticalSection section 00000000859D5B8E is not acquired
Is this line (repeated or not) always present?
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #6 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- Created attachment 69494 --> https://bugs.winehq.org/attachment.cgi?id=69494 wine-6.3 terminal output (three times)
(In reply to Zebediah Figura from comment #5)
3724.381:0020:0024:err:sync:RtlLeaveCriticalSection section 00000000859D5BEE is not acquired 3724.418:0020:0024:err:sync:RtlLeaveCriticalSection section 00000000859D5BBE is not acquired 3724.444:0020:0024:err:sync:RtlLeaveCriticalSection section 00000000859D5B8E is not acquired
Is this line (repeated or not) always present?
Yes. I am attaching the terminal output (WINEDEBUG="+timestamp") of three launches of the game on wine-6.3 (not Staging). The game loads successfully and continues to run successfully (I tried the main menu and the first few minutes of gameplay).
https://bugs.winehq.org/show_bug.cgi?id=50680
Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #69494|wine-6.3 terminal output |wine-6.3 terminal output description|(three times) |(four times)
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #7 from Zebediah Figura z.figura12@gmail.com --- Created attachment 69495 --> https://bugs.winehq.org/attachment.cgi?id=69495 forcibly align condition variables...
Not particularly optimistic, but does the attached patch help?
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #8 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- Created attachment 69500 --> https://bugs.winehq.org/attachment.cgi?id=69500 wine-6.3 with ntdll-NtAlertThreadByThreadId and attachment 69495 output (two times)
(In reply to Zebediah Figura from comment #7)
Not particularly optimistic, but does the attached patch help?
It looks like with this patch the application window now always does not appear before the error and `WINEDEBUG="+seh,+pid,+sync,+timestamp` now does not fix the problem (I can post it later if necessary).
https://bugs.winehq.org/show_bug.cgi?id=50680
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #69495|0 |1 is obsolete| |
--- Comment #9 from Zebediah Figura z.figura12@gmail.com --- Created attachment 69519 --> https://bugs.winehq.org/attachment.cgi?id=69519 forcibly align condition variables... (take 2)
Sorry, there was an error in that diff. Can you please try this one instead?
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #10 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- Created attachment 69520 --> https://bugs.winehq.org/attachment.cgi?id=69520 wine-6.3 with ntdll-NtAlertThreadByThreadId and attachment 69519 output
(In reply to Zebediah Figura from comment #9)
Created attachment 69519 [details] forcibly align condition variables... (take 2)
I did only a few tests, it looks like with this version of the patch the behavior repeats (or very closely) the initially described behavior. I attach terminal output with WINEDEBUG="+seh,+pid,warn+sync,+timestamp".
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #11 from Zebediah Figura z.figura12@gmail.com --- Created attachment 69521 --> https://bugs.winehq.org/attachment.cgi?id=69521 another guess
Okay, can you please try the attached patch instead? No need to attach a new log.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #12 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- Sorry for the late response.
(In reply to Zebediah Figura from comment #11)
Created attachment 69521 [details]
Did not help (silent termination without any messages even in the terminal before the window appears, "endless" loading anamation, freezing on loading animation and crash with backtrace). Enabling +sync still helps as previously.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #13 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- I haven't done as many detailed tests as in past times, but it looks like https://github.com/wine-staging/wine-staging/commit/a1a2d654886fe71af49f9f64210e21d743976ffe and https://github.com/wine-staging/wine-staging/commit/4eaa5b69b894ac13ffc7b38ef738c02b800728d1 don't help me.
also I only have the Epic Games Store version of the game, so I don't know if there is such a problem in the Steam version of the game.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #14 from Zebediah Figura z.figura12@gmail.com --- Created attachment 70877 --> https://bugs.winehq.org/attachment.cgi?id=70877 guess #3
It occurs to me that there's something else I haven't tried, namely, applying patch 10 by itself. Can you please test this diff (which is just a rebase of patch 10) on top of a plain Wine tree?
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #15 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- (In reply to Zebediah Figura from comment #14)
Created attachment 70877 [details] guess #3
It occurs to me that there's something else I haven't tried, namely, applying patch 10 by itself. Can you please test this diff (which is just a rebase of patch 10) on top of a plain Wine tree?
The game does not start. It seems that there are several results and all are unsuccessful: the appearance of a wine window indicating that an error has occurred, or a silent program termination, but, unlike previous tests, in these tests everything ends almost always before the game window appears.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #16 from Zebediah Figura z.figura12@gmail.com --- Created attachment 70925 --> https://bugs.winehq.org/attachment.cgi?id=70925 try always waking INT_MAX?
I'm really guessing here, but what happens if this patch is applied on top of upstream wine?
https://bugs.winehq.org/show_bug.cgi?id=50680
soredake gi85qht0z@relay.firefox.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |gi85qht0z@relay.firefox.com
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #17 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- (In reply to Zebediah Figura from comment #16)
Created attachment 70925 [details] try always waking INT_MAX?
I'm really guessing here, but what happens if this patch is applied on top of upstream wine?
I tried on Wine from 29.oct.2021 (5f93c683ab0163cb34482fe18549cf249b8b074b) - I started the game several times and everything was successful, probably the performance is ok (for the option without Esync).
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #18 from Zebediah Figura z.figura12@gmail.com --- (In reply to Dmitry Skvortsov (Iglu47) from comment #17)
(In reply to Zebediah Figura from comment #16)
Created attachment 70925 [details] try always waking INT_MAX?
I'm really guessing here, but what happens if this patch is applied on top of upstream wine?
I tried on Wine from 29.oct.2021 (5f93c683ab0163cb34482fe18549cf249b8b074b) - I started the game several times and everything was successful, probably the performance is ok (for the option without Esync).
Damn, okay, that mostly rules out one theory...
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #19 from Zebediah Figura z.figura12@gmail.com --- Created attachment 70964 --> https://bugs.winehq.org/attachment.cgi?id=70964 try nuking all of the wait and wake code
How about this one, on top of upstream wine?
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #20 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- (In reply to Zebediah Figura from comment #19)
Created attachment 70964 [details] try nuking all of the wait and wake code
How about this one, on top of upstream wine?
such changes are already too many - the game does not start, even the game window/fullscreen does not appear, but crashes.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #21 from Zebediah Figura z.figura12@gmail.com --- Created attachment 71014 --> https://bugs.winehq.org/attachment.cgi?id=71014 patch 0010 + increased futex hash list size
So, after off-bug discussion and some manual testing on my end, I have a pretty good theory as to what's going on now.
I'd like to ask for one more test, though. This test does a lot to confirm my theory on *my* machine, but I'd like to get a second opinion.
Please try the attached patch, on top of upstream wine (current git). This is basically a rebased version of patch 0010 by itself, *but* it also increases the addr_wait hash list size. If my theory is right, this will fix the game consistently. It does for me in something like 10/10 runs.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #22 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- (In reply to Zebediah Figura from comment #21)
Created attachment 71014 [details] patch 0010 + increased futex hash list size
So, after off-bug discussion and some manual testing on my end, I have a pretty good theory as to what's going on now.
I'd like to ask for one more test, though. This test does a lot to confirm my theory on *my* machine, but I'd like to get a second opinion.
Please try the attached patch, on top of upstream wine (current git). This is basically a rebased version of patch 0010 by itself, *but* it also increases the addr_wait hash list size. If my theory is right, this will fix the game consistently. It does for me in something like 10/10 runs.
I tried on 97d6cfd7059fbe55fdd24a04e8d9133848328d4e. yes, it works for my machine too. I did some tests with the game launches with this patch - the game behavior continues to be similar to what is on vannila Wine.
https://bugs.winehq.org/show_bug.cgi?id=50680
--- Comment #23 from Dmitry Skvortsov (Iglu47) lvb.crd@protonmail.com --- I checked on the Wine-Staging with updated patchset (https://github.com/wine-staging/wine-staging/commit/45230b51db703933dc6108c5...). The initial described issue is fully gone. Again, thanks for your hard work.
(maybe there are some problems specific to the EGS version of the game, but it need to be checked additionally and for this will need to create the separate topic)
https://bugs.winehq.org/show_bug.cgi?id=50680
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Fixed by SHA1| |45230b51db703933dc6108c526a | |9eb872416981a Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
--- Comment #24 from Zebediah Figura z.figura12@gmail.com --- Yeah.
Okay, so here is my working theory:
The game misuses condition variables. I've disassembled three or four different call sites and I see code like this:
EnterCriticalSection(&apple->cs); if (apple->x || SleepConditionVariableCS(&apple->cv, &apple->cs, INFINITE) != ERROR_TIMEOUT) --apple->x; LeaveCriticalSection(&apple->cs);
Which is wrong for multiple reasons, but the most important one is that it doesn't call SleepConditionVariableCS() in a loop. Which means that a stolen or spurious wakeup will result in incorrect behaviour.
(Granted, it's possible that I'm wrong, and that the relevant functions are being called in a loop somehow, or that "x" is used in a way that is ultimately safe against spurious wakeups. I don't think this is likely, especially given the other errors made around condition variables [e.g. in one place they call SleepConditionVariableCS() without holding the CS], but it's always possible.)
I suspect that it works consistently on Windows because the Windows implementation of condition variables does not cause spurious wakeups (at least, not if there's only one waiting thread). This fact is hard to corroborate. I ended up writing a test that I think roughly checks whether spurious wakeups matter in the way that Detroit cares about, which I'll attach. I get spurious wakeups every 10 seconds or so on Windows. On the other hand, they show up pretty constantly on Linux, with the tid alert patches.
The reason for the behaviour we've seen, then, is as follows:
(1) With current upstream wine, we use the condition variable word as a futex directly. As far as I can tell, the Linux implementation of futexes doesn't actually cause any spurious wakeups under "normal" circumstances. I'm not sure of this, but I can't find any reason in the code that it does.
(2) With current wine-staging, i.e. with all of the tid alert patches applied, we implement condvars on top of win32 futexes on top of tid alerts (on top of linux futexes, but that ends up not mattering). There are basically three ways we can get a wakeup without going all the way into the scheduler:
(a) Return early from RtlSleepConditionVariable*() because the condition didn't match. This isn't a problem, because we're not queued, and thus a subsequent wake will wake nothing.
(b) Return early from RtlWaitOnAddress() because the address didn't match. This can be a problem, because we *are* queued, and if a subsequent wake happens before we unqueue ourselves, we get wake type (c), which is...
(c) A "buffered" wake from a previous RtlWaitOnAddress() call. In the above scenario, the waking thread calls NtAlertThreadByThreadId() after the waiter is woken up (either from a type (b) wake or a timeout), but before it unqueues itself. TID alerts are basically just fancy events, so the wake is buffered until the next call to NtWaitForAlertByThreadId(). In theory Windows suffers from this as well. In practice my attempts to reproduce it have been met with failure, and I'm not sure why, although I have at least one guess.
(3) With patch 0010 applied by itself, we don't get the problem from (2). What happens instead is that we get spurious wakes due to hash collision. In order to verify this, I tried increasing the hash table size, to avoid collision, hence the patch from comment 21. And this indeed made the spurious wakes go away, and let the game start working.
Because we currently use spinlocks in the PE side anyway, the easiest way to avoid buffered wakes is actually to do the comparison inside the spinlock (much like we currently do upstream on the Unix side when futexes aren't available). This is what was committed in wine-staging as 45230b51db703933dc6108c526a9eb872416981a, and it seems to consistently fix this bug.
https://bugs.winehq.org/show_bug.cgi?id=50680
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #25 from Zebediah Figura z.figura12@gmail.com --- Closing bugs fixed in wine-staging 6.22.