[PATCH v7 0/3] MR10232: ntdll: Avoid exposing non-user rsp in normal syscall return path
When rsp is outside the kernel stack, the SIGUSR1 handler considers the context "out of syscall" and reports the real register values to wineserver. rsp pointing into the syscall frame is used for iret returns. Commit 0a5f7a710365 moved this earlier in the return path, creating a window in the regular direct return path where rsp points neither to the kernel stack, nor to the expected user stack. When a thread is interrupted by SIGUSR1 within this window, this exposes a bogus stack pointer to GetThreadContext(), breaking the garbage collector in Unity applications. Fix the regression by first switching to the real user stack (needed so we don't break things by touching other registers), then, only if an iret path is coming up, switching to the syscall frame pointer. This fixes the regression introduced by 0a5f7a710365, since GetThreadContext() will now observe the proper stack pointer during regular syscall returns. Further work would be needed to solve other corner cases and ensure that the register context returned by GetThreadContext() behaves exactly as on Windows (with no kernel-mode context info ever returned), but that is a larger/more invasive change. Fixes: 0a5f7a710365 ("ntdll: Switch to the user stack before restoring the %fs register.") Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=59333 Signed-off-by: Hoshino Lina lina@lina.yt -- v7: gitlab: Run ntdll tests with win64 architecture. ntdll/tests: Add a test for bogus SP values from GetThreadContext(). https://gitlab.winehq.org/wine/wine/-/merge_requests/10232
From: Hoshino Lina <lina@lina.yt> When rsp is outside the kernel stack, the SIGUSR1 handler considers the context "out of syscall" and reports the real register values to wineserver. rsp pointing into the syscall frame is used for iret returns. Commit 0a5f7a710365 moved this earlier in the return path, creating a window in the regular direct return path where rsp points neither to the kernel stack, nor to the expected user stack. When a thread is interrupted by SIGUSR1 within this window, this exposes a bogus stack pointer to GetThreadContext(), breaking the garbage collector in Unity applications. Fix the regression by first switching to the real user stack (needed so we don't break things by touching other registers), then, only if an iret path is coming up, switching to the syscall frame pointer. This fixes the regression introduced by 0a5f7a710365, since GetThreadContext() will now observe the proper stack pointer during regular syscall returns. Further work would be needed to solve other corner cases and ensure that the register context returned by GetThreadContext() behaves exactly as on Windows (with no kernel-mode context info ever returned), but that is a larger/more invasive change. Fixes: 0a5f7a710365 ("ntdll: Switch to the user stack before restoring the %fs register.") Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=59333 Signed-off-by: Hoshino Lina <lina@lina.yt> --- dlls/ntdll/unix/signal_x86_64.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 0c62e8a1ad4..db0a3972ef9 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -3172,7 +3172,18 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, /* push rbp-based kernel stack cfi */ __ASM_CFI(".cfi_remember_state\n\t") __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */ - "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ + + /* switch to user stack */ + /* %rsp outside kernel stack means no longer inside syscall */ + "movq 0x88(%rcx),%rsp\n\t" + + "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ + "testl $0x1,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ + "jz 1f\n\t" + /* Set up rsp for iretq return paths below */ + "leaq 0x70(%rcx),%rsp\n\t" + "1:\n\t" + #ifdef __linux__ "movw 0x338(%r13),%dx\n" /* amd64_thread_data()->fs */ "testw %dx,%dx\n\t" -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/10232
From: Hoshino Lina <lina@lina.yt> --- dlls/ntdll/tests/Makefile.in | 1 + dlls/ntdll/tests/threadctx.c | 148 +++++++++++++++++++++++++++++++++++ 2 files changed, 149 insertions(+) create mode 100644 dlls/ntdll/tests/threadctx.c diff --git a/dlls/ntdll/tests/Makefile.in b/dlls/ntdll/tests/Makefile.in index 42d663777ea..177b9c7e35c 100644 --- a/dlls/ntdll/tests/Makefile.in +++ b/dlls/ntdll/tests/Makefile.in @@ -25,6 +25,7 @@ SOURCES = \ testdll.c \ testdll.spec \ thread.c \ + threadctx.c \ threadpool.c \ time.c \ unwind.c \ diff --git a/dlls/ntdll/tests/threadctx.c b/dlls/ntdll/tests/threadctx.c new file mode 100644 index 00000000000..75c13e683e4 --- /dev/null +++ b/dlls/ntdll/tests/threadctx.c @@ -0,0 +1,148 @@ +/* + * Unit test suite for ntdll thread context behavior + * + * Copyright 2026 Hoshino Lina + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + * + */ + +#include <stdarg.h> +#include <stdbool.h> + +#include "ntstatus.h" +#define WIN32_NO_STATUS +#include "windef.h" +#include "winbase.h" +#include "winternl.h" +#include "wine/test.h" + +// 110 second max runtime, to avoid winetest timeouts +#define MAX_RUNTIME 110 +#define THREADS 50 +#define STACK_SIZE 0x10000 +#define LOOPS 2000 + +#ifdef __x86_64__ +#define CTX_SP Rsp +#define CTX_IP Rip +#endif +#ifdef __i386__ +#define CTX_SP Esp +#define CTX_IP Eip +#endif +#ifdef __arm__ +#define CTX_SP Sp +#define CTX_IP Pc +#endif +#ifdef __aarch64__ +#define CTX_SP Sp +#define CTX_IP Pc +#endif + +static volatile bool looping = true; + +struct thread { + DWORD tid; + bool stopped; + HANDLE hnd; + CONTEXT ctx; + DWORD64 stack_lo; + DWORD64 stack_hi; +}; + +struct thread t[THREADS]; + +DWORD WINAPI thread_func(LPVOID lpParameter) +{ + struct thread *self = lpParameter; + + ULONG_PTR lo, hi; + GetCurrentThreadStackLimits(&lo, &hi); + + self->stack_lo = (DWORD64)lo; + self->stack_hi = (DWORD64)hi; + + while (looping) + Sleep(0); + + return 0; +} + +static void test_context_sp(void) +{ + int loop; + bool failed = false; + DWORD timeout = GetTickCount() + MAX_RUNTIME * 1000; + + for (int i = 0; i < THREADS; i++) { + t[i].hnd = CreateThread(0, STACK_SIZE, thread_func, (LPVOID)&t[i], 0, + &t[i].tid); + ok(!!t[i].hnd, "Failed to create thread"); + if (!t[i].hnd) { + looping = false; + failed = true; + break; + } + } + + Sleep(100); + + trace("Starting %d loops of thread context fetching", LOOPS); + + for (loop = 0; !failed && loop < LOOPS && GetTickCount() < timeout; loop++) { + for (int i = 0; i < THREADS; i++) { + if (SuspendThread(t[i].hnd) != (DWORD)-1) { + t[i].ctx.ContextFlags = CONTEXT_INTEGER | CONTEXT_CONTROL; + + if (GetThreadContext(t[i].hnd, &t[i].ctx)) { + bool in_range = t[i].ctx.CTX_SP > t[i].stack_lo && + t[i].ctx.CTX_SP <= t[i].stack_hi; + ok(in_range, + "[%d/%d] SP=0x%llx [%llx..%llx] " + "RIP=0x%llx", + loop, LOOPS, (long long)t[i].ctx.CTX_SP, + (long long)t[i].stack_lo, (long long)t[i].stack_hi, + (long long)t[i].ctx.CTX_IP); + + if (!in_range) + failed = true; + + t[i].stopped = true; + } else { + ResumeThread(t[i].hnd); + } + } + } + + for (int i = 0; i < THREADS; i++) { + if (t[i].stopped) + ResumeThread(t[i].hnd); + + t[i].stopped = false; + } + } + + trace("Completed %d/%d loops", loop, LOOPS); + + looping = false; + + for (int i = 0; i < THREADS; i++) + WaitForSingleObject(t[i].hnd, INFINITE); +} + +START_TEST(threadctx) { + test_context_sp(); +} -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/10232
From: Hoshino Lina <lina@lina.yt> --- tools/gitlab/test.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/gitlab/test.yml b/tools/gitlab/test.yml index db77ad5111d..70d5e0afbbb 100644 --- a/tools/gitlab/test.yml +++ b/tools/gitlab/test.yml @@ -62,7 +62,7 @@ test-linux-64: extends: .wine-test variables: - INCLUDE_TESTS: "dinput ntoskrnl.exe" + INCLUDE_TESTS: "dinput ntdll ntoskrnl.exe" rules: - if: $CI_PIPELINE_SOURCE == 'merge_request_event' needs: -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/10232
Added a test. The change affects x86_64, but since the gitlab test suite largely runs on i386, it's not very useful. I added at least `ntdll` to the 64bit test runner, investigating the `ntdll:exception` failure now. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131059
Etaash Mathamsetty (@etaash.mathamsetty) commented about dlls/ntdll/unix/signal_x86_64.c:
/* push rbp-based kernel stack cfi */ __ASM_CFI(".cfi_remember_state\n\t") __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */ - "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ + + /* switch to user stack */ + /* %rsp outside kernel stack means no longer inside syscall */ + "movq 0x88(%rcx),%rsp\n\t" + + "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ + "testl $0x1,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */
`CONTEXT_CONTROL | CONTEXT_INTEGER` is 0x3 not 0x1 -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131349
Marking as draft since I still have work to do here, and other folks have identified rarer cases where the bogus SP can still be observed. I appreciate any comments on the issue/approach though! -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131531
Here was my approach at trying to solve this, I'm not sure if it works or not but it should keep the syscall handler working exactly the same except for doing the mov first and then doing the lea for when it needs to use the out of frame rsp in those iretq paths ``` diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 5693d6c29dc..09685c4b71d 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -3420,7 +3420,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x28(%rcx),%rdi\n\t" "movq 0x20(%rcx),%rsi\n\t" "movq 0x08(%rcx),%rbx\n\t" - "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ + /* switch to user stack */ + "movq 0x88(%rcx),%rsp\n\t" #ifdef __linux__ "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ "jz 1f\n\t" @@ -3433,8 +3434,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "3:\ttestl $0x3,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ "jnz 1f\n\t" - /* switch to user stack */ - "movq 0x88(%rcx),%rsp\n\t" /* push rcx-based kernel stack cfi */ __ASM_CFI(".cfi_remember_state\n\t") __ASM_CFI(".cfi_def_cfa %rsp, 0\n\t") @@ -3459,7 +3458,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, /* pop rcx-based kernel stack cfi */ __ASM_CFI(".cfi_restore_state\n") - "1:\ttestl $0x2,%edx\n\t" /* CONTEXT_INTEGER */ + "1:\tleaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ + "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */ "jnz 1f\n\t" /* CONTEXT_CONTROL */ "movq (%rsp),%rcx\n\t" /* frame->rip */ @@ -3487,6 +3487,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq (%r10),%r10\n\t" "test %r10,%r10\n\t" "jz 3b\n\t" + "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */ "jnz 1b\n\t" "xchgq %r10,(%rsp)\n\t" ``` -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131564
Probably pure rsp-based detection can't be fully correct for any place of interruption in syscall dispatcher. Note that, to be fully correct, we also should not leak Rip inside syscall dispatcher and should only ever report the one at syscall return. Maybe this needs a more sophisticated 'is_inside_syscall()' check which will also mind 'Rip' position inside syscall / unixcall dispatchers. But that is probably not entirely trivial to do right and nice (not like I have a ready suggestion for details)> -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131631
Maybe this needs a more sophisticated 'is_inside_syscall()' check which will also mind 'Rip' position inside syscall / unixcall dispatchers. But that is probably not entirely trivial to do right and nice (not like I have a ready suggestion for details)\>
This isn't really enough, because we don't know whether the %rsp is correct yet. The problem gets worse if you care about the other registers. I think the only correct solution here is to effectively mask off signals until we're ready to process them. By this I don't mean literally masking the signals (this would be too expensive), but rather having the signal handler defer responding. This ends up being related to some other bugs, namely 54807, and I think Billy Laws also came up with a similar bug that was never formally filed. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131632
This isn't really enough, because we don't know whether the %rsp is correct yet. The problem gets worse if you care about the other registers.
Technically, if you know the Rip and syscall dispatcher code, you can know where Rsp is correct or how to get it correct. But that needs some thought on how to do that nicely without deep and unobvious cross-dependence between syscall dispatcher code and the place in signal handler analyzing that. Yes, the correct solution should care about all of the registers (but probably that would happen automatically once you properly detect that you haven't left syscall dispatcher yet and correctly establish syscall frame). -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131633
This isn't really enough, because we don't know whether the %rsp is correct yet. The problem gets worse if you care about the other registers.
Technically, if you know the Rip and syscall dispatcher code, you can know where Rsp is correct or how to get it correct. But that needs some thought on how to do that nicely without deep and unobvious cross-dependence between syscall dispatcher code and the place in signal handler analyzing that. Yes, the correct solution should care about all of the registers (but probably that would happen automatically once you properly detect that you haven't left syscall dispatcher yet and correctly establish syscall frame).
Right. But there's enough shuffling done in the syscall frame that this kind of solution feels really ugly, plus I think my proposed alternative would address both this and other problems while being cleaner anyway. I don't really know though. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131634
That maybe depends on how that delayed signal handling would look like, but it seems like that is likely to hit the same problem to some extent and is still to be addressed. Delayed signal handling should happen somewhere before syscall exit (apparently, yet during inside the syscall by all means, with syscall stack, fs, whatever, so it can be used for all the potentially delayed tasks which need to happen on syscall exit). And then, it will be going again through the same syscall dispatcher exit and the signal may happen at that time and will need to deal with the current state and position in syscall exit. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131651
Unless maybe that on-exit handler will perform syscall exit itself without ever returning to syscall dispatcher, but there are various quirks to take care of. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10232#note_131652
participants (5)
-
Elizabeth Figura (@zfigura) -
Etaash Mathamsetty (@etaash.mathamsetty) -
Hoshino Lina -
Hoshino Lina (@hoshinolina) -
Paul Gofman (@gofman)