http://bugs.winehq.org/show_bug.cgi?id=59333

            Bug ID: 59333
           Summary: VNyan leaks memory (race in __wine_syscall_dispatcher
                    $rsp switching breaks Unity GC)
           Product: Wine
           Version: 11.0
          Hardware: x86-64
                OS: Linux
            Status: UNCONFIRMED
          Severity: major
          Priority: P2
         Component: ntdll
          Assignee: wine-bugs@list.winehq.org
          Reporter: lina@lina.yt
      Distribution: ---

Created attachment 80270
  --> http://bugs.winehq.org/attachment.cgi?id=80270
Debug & gdb logs

VNyan (https://suvidriel.itch.io/vnyan) under recent versions of Wine starts leaking memory at a random time (from minutes to hours after startup). It took a few days of debugging and reverse engineering, but I tracked this down to an issue with Unity's GC observing a weird stack switch in Wine.

For reference, Proton 8-5 seems to be fine and GE Proton 10-15 triggers the problem. The problem reproduces with plain `wine-11.0 (Staging)` (wine-11.0-2.fc43.x86_64 from Fedora 43), which is what I used for the traces in this bug.

This is the Win32 threads code in the Unity fork of bdwgc:

https://github.com/Unity-Technologies/bdwgc/blob/unity-master/win32_threads....

Normally, bdwgc will use SuspendThread() on all application threads (see GC_suspend()) and then call GetThreadContext() to retrieve the thread context. It then uses the sp value to determine which stack segment to mark as roots for GC. The logic is complex and there is a safety check in case sp is out of bounds (then it collects the whole stack).

However, on line 1695 above, sp is blindly subtracted from thread->stack_base and used to compute the stack usage in bytes, without any bounds checks. This means that when sp ends up above the stack, the subtraction overflows and returns a bogus huge stack size. This ends up throwing off the GC collection threshold, and the GC never runs again.
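To illustrate the failure mode, here is a minimal C sketch of the arithmetic. It is not the bdwgc code: the helper names `stack_usage_unchecked` and `stack_usage_checked` are hypothetical, and it assumes the subtraction is performed on unsigned 64-bit words. The stack base and limit used in the usage note are made-up values; only the two sp values come from the logs.

```c
#include <stdint.h>

/* Hypothetical helper, not bdwgc code: stack usage computed without
 * bounds checks. Stacks grow down, so usage is base minus current sp. */
static uint64_t stack_usage_unchecked(uint64_t stack_base, uint64_t sp)
{
    /* If sp > stack_base, this wraps around to a huge value. */
    return stack_base - sp;
}

/* Hypothetical guarded variant: reports failure (returns 0) when sp is
 * outside [stack_limit, stack_base] instead of producing a wrapped size. */
static int stack_usage_checked(uint64_t stack_base, uint64_t stack_limit,
                               uint64_t sp, uint64_t *usage)
{
    if (sp < stack_limit || sp > stack_base)
        return 0;
    *usage = stack_base - sp;
    return 1;
}
```

With an assumed stack base of 0x7a8c0000, the pre-bug sp 0x7a8bf3e8 yields a usage of 0xc18 bytes, while the in-bug sp 0x12938eab0 makes the unchecked helper return a value close to 2^64. The checked variant rejects the second case.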

During a normal stop-the-world GC ([1] in vnyan_debug.txt), all threads are stopped with $rip in Windows code (0x00006fffffxxxxxx) (other than the main thread which is in 0x000000018xxxxxx because I patched the Mono DLL to not relocate, for consistent debugging). Most threads are in NtWaitForSingleObject or NtWaitForAlertByThreadId, a couple in NtDelayExecution/NtNotifyChangeKey/NtUserMsgWaitForMultipleObjectsEx/NtWaitForMultipleObjects, and two threads in a mmdevapi wine_unix_call().

When the bug occurs, one thread is caught in the middle of UNIX code ([2] in vnyan_debug.txt). The stack pointer changes from this (pre-bug):

    Thread 82 (Thread 32.0x2c0):
    $201 = 0x7a8bf3e8

To this (during bug):

    Thread 82 (Thread 32.0x2c0):
    $295 = 0x12938eab0

The GC logs this warning (needs patches to enable debug logs):

    --> Marking for collection #2607 after 3617216 allocated bytes
    Marked from 150 dirty pages
    GC Warning: Thread stack pointer 000000012938EAB0 out of range, pushing everything
    Pushed 34 thread stacks

And then the GC stops working.

In the particular repro logged, rip is pointing here in __wine_syscall_dispatcher (it's usually within a few instructions of this area):

https://github.com/wine-mirror/wine/blob/db11d0fe6a169c457e23d007e20404643d0...

This means that is_inside_syscall() returned false and allowed the thread state to be captured directly from native thread state. This is defined as:

    static inline BOOL is_inside_syscall( ULONG_PTR sp )
    {
        return ((char *)sp >= (char *)ntdll_get_thread_data()->kernel_stack &&
                (char *)sp <= (char *)get_syscall_frame());
    }

Just a few lines before the instruction the thread was stopped in:

    "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */

Indeed, %rsp as seen by the user app is 0x70 into syscall_frame (which is 0x12938ea40).
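The range check can be replayed with the values from the trace. The sketch below is a simplified stand-in for is_inside_syscall(), not Wine's code: the function name `is_inside_syscall_model` is hypothetical, and it takes the kernel stack base and syscall frame address as explicit parameters instead of reading them from thread data. The kernel stack base used in the usage note is an assumed value; the frame address and offset come from the report.

```c
#include <stdint.h>

/* Simplified model of the check: sp counts as "inside the syscall" only
 * while it lies between the kernel stack base and the syscall frame. */
static int is_inside_syscall_model(uint64_t sp, uint64_t kernel_stack,
                                   uint64_t syscall_frame)
{
    return sp >= kernel_stack && sp <= syscall_frame;
}

/* %rsp after "leaq 0x70(%rcx),%rsp", with %rcx pointing at the frame. */
static uint64_t rsp_after_leaq(uint64_t syscall_frame)
{
    return syscall_frame + 0x70;
}
```

For a frame at 0x12938ea40, `rsp_after_leaq` gives 0x12938eab0, which matches the stack pointer the GC complained about. Passing that value to `is_inside_syscall_model` with an assumed kernel stack base of 0x129380000 returns 0, so a thread suspended at that point would have its native register state reported as-is.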

Later in the function the stack is switched to the proper user stack:

    /* switch to user stack */
    "movq 0x88(%rcx),%rsp\n\t"

I believe that when a thread is stopped anywhere in the range between those two instructions, user code can observe %rsp set to a bogus value that should not be possible.

This code was moved around by commit 245e8cedf059 and previously introduced by 0a5f7a71036. I'm not sure why %rsp is being set to point into &frame->rip instead of simply restoring it to the user stack pointer earlier.