https://bugs.winehq.org/show_bug.cgi?id=53314
Bug ID: 53314
Summary: ptrace attach timeout freezes processes to stopped state
Product: Wine
Version: 7.12
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: wineserver
Assignee: wine-bugs@winehq.org
Reporter: superemppu+winehq@live.fi
Distribution: ---
Created attachment 72691 --> https://bugs.winehq.org/attachment.cgi?id=72691 patch remove timeout
When wineserver wants to attach to a process with ptrace, it first calls ptrace() with PTRACE_ATTACH and then waits in waitpid() until the process has received a SIGSTOP. This is correct behavior, as described in the ptrace manpage: https://linux.die.net/man/2/ptrace
However, there is a timeout: if the SIGSTOP hasn't reached the process within 3 seconds, wineserver gives up on the PTRACE_ATTACH and immediately does a PTRACE_DETACH. According to the manpage, PTRACE_DETACH only takes effect when the process is already stopped, so this PTRACE_DETACH does nothing. When the SIGSTOP eventually reaches the process, the process stays stopped forever, because wineserver thinks the ptrace session is already over and never resumes it.
This bug only occurs when delivering the SIGSTOP to a process takes over 3 seconds. On most setups that never happens, but on slow FUSE filesystems the process may sit in uninterruptible disk sleep for over 3 seconds. In our setup this happens consistently when the process runs on fuse-overlayfs on top of a slow NFS, and it probably happens on sshfs too. As a result, games such as Subnautica and Arma Cold War Assault freeze in the stopped state.
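The sequence described above can be sketched roughly as follows. This is a minimal illustration and not wineserver's actual code: the helper name attach_with_timeout, the polling loop and the 10 ms poll interval are assumptions made for the example, and it only handles a tracee that is a child process of the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper (not wineserver code): attach to pid and wait up to
 * timeout_ms for the attach stop.
 * Returns 1 if the tracee stopped, 0 on timeout, -1 on error. */
static int attach_with_timeout(pid_t pid, int timeout_ms)
{
    int status, waited_ms = 0;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) return -1;

    for (;;) {
        pid_t res = waitpid(pid, &status, WNOHANG);
        if (res == pid && WIFSTOPPED(status)) return 1;
        if (res == -1 && errno != EINTR) return -1;
        if (waited_ms >= timeout_ms) break;

        /* Assumed poll interval of 10 ms. */
        struct timespec ts = { 0, 10 * 1000 * 1000 };
        nanosleep(&ts, NULL);
        waited_ms += 10;
    }

    /* Timed out. This is the problematic step: the tracee is not stopped
     * yet, so this detach has no effect, and the SIGSTOP queued by
     * PTRACE_ATTACH is still delivered later. */
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return 0;
}
```

With a caller that treats a return value of 0 as "session over", nothing will ever resume the process once the late SIGSTOP arrives.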
How can we fix this?
Method 1: Just remove the timeout entirely (patch attached). The only cases where that timeout matters are when the kernel is doing something important, in which case we should probably wait, or when a kernel thread is stuck, in which case we have bigger problems. The fact that this bug has been in Wine for 20 years without being reported suggests that the timeout is rarely triggered.
Disadvantages: If a kernel thread of one Wine process is stuck, wineserver will also be stuck for other Wine processes even if their kernel threads are fine.
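For illustration, Method 1 amounts to something like the sketch below. It is not the attached patch: attach_and_wait is a hypothetical name, and the code assumes the tracee is a child process of the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not the attached patch): attach to pid and block
 * until the attach stop is reported, however long that takes.
 * Returns 0 once the tracee is stopped, -1 on error. */
static int attach_and_wait(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) return -1;

    for (;;) {
        pid_t res = waitpid(pid, &status, 0);
        if (res == -1) {
            if (errno == EINTR) continue;  /* interrupted, keep waiting */
            return -1;
        }
        if (WIFSTOPPED(status)) return 0;

        /* The process exited or was killed before it could stop. */
        errno = ESRCH;
        return -1;
    }
}
```

The blocking waitpid() is also where the disadvantage shows up: while this call waits on one stuck process, the caller cannot serve anything else.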
Method 2: If we want to keep the timeout, we should not do PTRACE_DETACH immediately, but rather set a flag that PTRACE_ATTACH failed. When the SIGSTOP eventually comes, a SIGCHLD will be sent to wineserver and we can do the PTRACE_DETACH in the SIGCHLD handler if the flag was set.
Disadvantages: We still need an arbitrary timeout, and 3 seconds is not enough for slow FUSE filesystems. If the timeout is triggered, processes may still crash because they could not use whatever Windows API wineserver was emulating with ptrace.
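The bookkeeping for Method 2 could look roughly like this. It is only a sketch: struct tracee, on_attach_timeout, on_tracee_stopped and record_detach are hypothetical names, not existing wineserver symbols, and the detach call is passed in as a function pointer so the logic can be followed without a real tracee. In real use that callback would wrap ptrace(PTRACE_DETACH, pid, 0, 0).

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical per-process bookkeeping; names are illustrative only. */
struct tracee {
    pid_t pid;
    bool  attach_timed_out;  /* set when the wait for SIGSTOP gave up */
};

/* Detach callback; a real one would call ptrace(PTRACE_DETACH, ...). */
typedef int (*detach_fn)(pid_t pid);

/* Called when the wait after PTRACE_ATTACH times out.
 * Only remember the failure; do NOT detach yet. */
static void on_attach_timeout(struct tracee *t)
{
    t->attach_timed_out = true;
}

/* Called while handling SIGCHLD, once waitpid() reports the tracee as
 * stopped. Returns true if a stale attach was cleaned up. */
static bool on_tracee_stopped(struct tracee *t, detach_fn detach)
{
    if (!t->attach_timed_out) return false;  /* a stop someone still wants */

    t->attach_timed_out = false;
    detach(t->pid);  /* the tracee is stopped now, so the detach works */
    return true;
}

/* Stand-in detach for demonstration: just records the pid it was given. */
static pid_t last_detached;
static int record_detach(pid_t pid)
{
    last_detached = pid;
    return 0;
}
```

For example, after on_attach_timeout(&t), the next on_tracee_stopped(&t, record_detach) call detaches t.pid and clears the flag, while a stop without a preceding timeout is left alone.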