x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStoragePointer`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). Some changes to the syscall dispatcher were needed to ensure that the TEB is not accessed through `%gs` while on the kernel stack (since a SIGUSR1 while on the kernel stack will result in GSBASE being set to the TSD).
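As a rough illustration of the signal-handler logic described above (not the actual Wine code), the following C sketch models GSBASE as a plain variable. All names (`thread_model`, `model_*`) are hypothetical, and the lower bound used in `model_inside_syscall()` is an assumption about how `is_inside_syscall()` delimits the kernel stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread model; none of these names exist in Wine. */
struct thread_model
{
    uintptr_t teb;           /* Windows TEB address */
    uintptr_t tsd;           /* macOS TSD address (pthread_teb in the patches) */
    uintptr_t gsbase;        /* simulated GSBASE register */
    uintptr_t kernel_stack;  /* assumed lowest address of the Wine kernel stack */
    uintptr_t syscall_frame; /* address of the syscall frame above that stack */
};

/* Models the idea behind is_inside_syscall(): the interrupted %rsp lies
 * between the kernel stack base and the syscall frame (assumed bounds). */
static bool model_inside_syscall(const struct thread_model *t, uintptr_t rsp)
{
    assert(t->kernel_stack <= t->syscall_frame);
    return rsp >= t->kernel_stack && rsp <= t->syscall_frame;
}

/* On handler entry, Unix code is about to run, so point GSBASE at the TSD. */
static void model_init_handler(struct thread_model *t)
{
    t->gsbase = t->tsd;
}

/* On handler exit, restore the TEB only when returning to PE code, i.e. the
 * interrupted context was neither on the signal stack nor inside a syscall. */
static void model_leave_handler(struct thread_model *t, uintptr_t rsp,
                                bool on_signal_stack)
{
    if (!on_signal_stack && !model_inside_syscall(t, rsp))
        t->gsbase = t->teb;
}
```

In this model, a signal that interrupts a thread whose `%rsp` is on the kernel stack leaves GSBASE at the TSD after the handler returns, which is why the dispatcher must not rely on `%gs` for TEB access there.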
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) and macOS 10.13 with several apps and games, including the `cefclient.exe` CEF sample.
Encouragingly, in some simple tests I didn't see a noticeable performance regression from this MR.
There are drawbacks though:
- Libraries which jump directly from PE code into Unix code (expecting that `%gs` always points to the macOS TSD) will crash. Notable examples are D3DMetal and DXMT. These will need to be changed to use Unix calls.
- If Windows code uses the `syscall` instruction directly, the stack pointer likely needs to be valid (which is probably not required on real Windows). This is due to the syscall dispatcher saving registers onto the user stack and having to call `_thread_set_tsd_base`. I can't say I've ever seen direct syscalls done with an invalid `%rsp`, but it seems like something anticheat code might do.
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi`, `%eax`, and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b.... The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
---
I also tested an alternate implementation strategy for this which took advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`.
This strategy was simpler, but I'm not using it for 2 reasons:
- The "full" thread state is only available starting with macOS 10.15, and we still support 10.13.
- More crucially, Rosetta 2 doesn't seem to correctly implement the GS.base field of the full thread state. It's set to 0 on entry, and isn't read on exit.
--
v6: ntdll: Remove x86_64 Mac-specific TEB access workarounds that are no longer needed.
    ntdll: On macOS x86_64, swap GSBASE between the TEB and macOS TSD when entering/leaving PE code.
    ntdll: Set %rsp to be inside syscall_frame before accessing %gs in x86_64 syscall dispatcher.
From: Brendan Shanks bshanks@codeweavers.com
84760a8fb2cf9ed577c63957c5bdfc621d748a7f started using the documented Mach API method.
---
 dlls/ntdll/unix/signal_x86_64.c | 38 ---------------------------------
 1 file changed, 38 deletions(-)

diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 26b540bd629..8bddc073f74 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -2440,50 +2440,12 @@ static void *mac_thread_gsbase(void)
 {
     struct thread_identifier_info tiinfo;
     unsigned int info_count = THREAD_IDENTIFIER_INFO_COUNT;
-    static int gsbase_offset = -1;
 
     mach_port_t self = mach_thread_self();
     kern_return_t kr = thread_info(self, THREAD_IDENTIFIER_INFO, (thread_info_t) &tiinfo, &info_count);
     mach_port_deallocate(mach_task_self(), self);
 
     if (kr == KERN_SUCCESS) return (void*)tiinfo.thread_handle;
-
-    if (gsbase_offset < 0)
-    {
-        /* Search for the array of TLS slots within the pthread data structure.
-           That's what the macOS pthread implementation uses for gsbase. */
-        const void* const sentinel1 = (const void*)0x2bffb6b4f11228ae;
-        const void* const sentinel2 = (const void*)0x0845a7ff6ab76707;
-        int rc;
-        pthread_key_t key;
-        const void** p = (const void**)pthread_self();
-        int i;
-
-        gsbase_offset = 0;
-        if ((rc = pthread_key_create(&key, NULL))) return NULL;
-
-        pthread_setspecific(key, sentinel1);
-
-        for (i = key + 1; i < 2000; i++) /* arbitrary limit */
-        {
-            if (p[i] == sentinel1)
-            {
-                pthread_setspecific(key, sentinel2);
-
-                if (p[i] == sentinel2)
-                {
-                    gsbase_offset = (i - key) * sizeof(*p);
-                    break;
-                }
-
-                pthread_setspecific(key, sentinel1);
-            }
-        }
-
-        pthread_key_delete(key);
-    }
-
-    if (gsbase_offset) return (char*)pthread_self() + gsbase_offset;
     return NULL;
 }
 #endif
From: Brendan Shanks bshanks@codeweavers.com
---
 dlls/ntdll/unix/signal_x86_64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 8bddc073f74..046e96be99b 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -1974,9 +1974,9 @@ static BOOL handle_syscall_trap( ucontext_t *sigcontext, siginfo_t *siginfo )
  */
 static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
+    ucontext_t *ucontext = init_handler( sigcontext );
     EXCEPTION_RECORD rec = { 0 };
     struct xcontext context;
-    ucontext_t *ucontext = init_handler( sigcontext );
 
     rec.ExceptionAddress = (void *)RIP_sig(ucontext);
     save_context( &context, ucontext );
@@ -2059,9 +2059,9 @@ static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext )
  */
 static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
+    ucontext_t *ucontext = init_handler( sigcontext );
     EXCEPTION_RECORD rec = { 0 };
     struct xcontext context;
-    ucontext_t *ucontext = init_handler( sigcontext );
 
     if (handle_syscall_trap( ucontext, siginfo )) return;
 
@@ -2093,8 +2093,8 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext )
  */
 static void fpe_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
-    EXCEPTION_RECORD rec = { 0 };
     ucontext_t *ucontext = init_handler( sigcontext );
+    EXCEPTION_RECORD rec = { 0 };
 
     switch (siginfo->si_code)
     {
From: Brendan Shanks bshanks@codeweavers.com
In preparation for switching GSBASE on macOS.
---
 dlls/ntdll/unix/signal_x86_64.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 046e96be99b..aae578dac05 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -423,7 +423,7 @@ struct syscall_frame void *syscall_cfa; /* 00a8 */ DWORD syscall_flags; /* 00b0 */ DWORD restore_flags; /* 00b4 */ - DWORD align[2]; /* 00b8 */ + ULONG64 teb; /* 00b8 */ XMM_SAVE_AREA32 xsave; /* 00c0 */ DECLSPEC_ALIGN(64) XSAVE_AREA_HEADER xstate; /* 02c0 */ }; @@ -2634,6 +2634,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB frame->restore_flags |= CONTEXT_INTEGER; frame->syscall_flags = syscall_flags; frame->syscall_cfa = syscall_cfa; + frame->teb = (ULONG64)teb; if ((callback = instrumentation_callback)) { frame->r10 = frame->rip; @@ -2721,6 +2722,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movw %ss,0x90(%rcx)\n\t" "movq %rbp,0x98(%rcx)\n\t" __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01) + "movq %gs:0x30,%r14\n\t" + "movq %r14,0xb8(%rcx)\n\t" /* frame->teb */ /* Legends of Runeterra hooks the first system call return instruction, and * depends on us returning to it. Adjust the return address accordingly. */ "subq $0xb,0x70(%rcx)\n\t" @@ -2780,10 +2783,14 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, __ASM_CFI(".cfi_offset %r15,-0x38\n\t") __ASM_CFI(".cfi_undefined %rdi\n\t") __ASM_CFI(".cfi_undefined %rsi\n\t") + /* When on the kernel stack, use frame->teb instead of %gs to access the TEB. + * (on macOS, signal handlers set gsbase to pthread_teb when on the kernel stack). 
+ */ #ifdef __linux__ "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ "jz 2f\n\t" - "movq %gs:0x320,%rsi\n\t" /* amd64_thread_data()->pthread_teb */ + "movq 0xb8(%rcx),%rsi\n\t" /* frame->teb */ + "movq 0x320(%rsi),%rsi\n\t" /* amd64_thread_data()->pthread_teb */ "testl $8,%r14d\n\t" /* SYSCALL_HAVE_WRFSGSBASE */ "jz 1f\n\t" "wrfsbase %rsi\n\t" @@ -2799,12 +2806,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movl %eax,%ebx\n\t" "shrl $8,%ebx\n\t" "andl $0x30,%ebx\n\t" /* syscall table number */ -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x330(%rcx),%rcx\n\t" -#else - "movq %gs:0x330,%rcx\n\t" /* amd64_thread_data()->syscall_table */ -#endif + "movq 0xb8(%rcx),%rcx\n\t" /* frame->teb */ + "movq 0x330(%rcx),%rcx\n\t" /* amd64_thread_data()->syscall_table */ "leaq (%rcx,%rbx,2),%rbx\n\t" "andl $0xfff,%eax\n\t" /* syscall number */ "cmpq 16(%rbx),%rax\n\t" /* table->ServiceLimit */
From: Brendan Shanks bshanks@codeweavers.com
If a signal occurs, is_inside_syscall() needs to return FALSE so GSBASE is reset to the TEB.

In preparation for switching GSBASE on macOS.
---
 dlls/ntdll/unix/signal_x86_64.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index aae578dac05..935a385be33 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2833,6 +2833,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "callq *(%r10,%rax,8)\n\t" "leaq -0x98(%rbp),%rcx\n\t" __ASM_LOCAL_LABEL("__wine_syscall_dispatcher_return") ":\n\t" + /* push rbp-based kernel stack cfi */ + __ASM_CFI(".cfi_remember_state\n\t") + __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */ + "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ +#ifdef __linux__ + "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ + "jz 1f\n\t" + "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ + "1:\n\t" +#endif "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */ "jnz 2f\n\t" @@ -2864,9 +2874,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "jmp 4f\n" "3:\tfxrstor64 0xc0(%rcx)\n" "4:\tmovq 0x98(%rcx),%rbp\n\t" - /* push rbp-based kernel stack cfi */ - __ASM_CFI(".cfi_remember_state\n\t") - __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */ "movq 0x68(%rcx),%r15\n\t" "movq 0x58(%rcx),%r13\n\t" "movq 0x50(%rcx),%r12\n\t" @@ -2874,13 +2881,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x28(%rcx),%rdi\n\t" "movq 0x20(%rcx),%rsi\n\t" "movq 0x08(%rcx),%rbx\n\t" - "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ -#ifdef __linux__ - "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ - "jz 1f\n\t" - "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ - "1:\n\t" -#endif "testl $0x10000,%edx\n\t" /* RESTORE_FLAGS_INSTRUMENTATION */ "movq 0x60(%rcx),%r14\n\t" "jnz 2f\n\t"
From: Brendan Shanks bshanks@codeweavers.com
---
 dlls/ntdll/loader.c             | 11 -----
 dlls/ntdll/unix/signal_x86_64.c | 79 ++++++++++++++++++++++++++++++---
 2 files changed, 72 insertions(+), 18 deletions(-)
diff --git a/dlls/ntdll/loader.c b/dlls/ntdll/loader.c index 74eb1b7f500..e75225be1f7 100644 --- a/dlls/ntdll/loader.c +++ b/dlls/ntdll/loader.c @@ -1374,9 +1374,6 @@ static BOOL alloc_tls_slot( LDR_DATA_TABLE_ENTRY *mod ) if (!new) return FALSE; if (old) memcpy( new, old, old_module_count * sizeof(*new) ); teb->ThreadLocalStoragePointer = new; -#ifdef __x86_64__ /* macOS-specific hack */ - if (teb->Instrumentation[0]) ((TEB *)teb->Instrumentation[0])->ThreadLocalStoragePointer = new; -#endif TRACE( "thread %04lx tls block %p -> %p\n", HandleToULong(teb->ClientId.UniqueThread), old, new ); /* FIXME: can't free old block here, should be freed at thread exit */ } @@ -1622,10 +1619,6 @@ static NTSTATUS alloc_thread_tls(void) TRACE( "slot %u: %u/%lu bytes at %p\n", i, size, dir->SizeOfZeroFill, pointers[i] ); } NtCurrentTeb()->ThreadLocalStoragePointer = pointers; -#ifdef __x86_64__ /* macOS-specific hack */ - if (NtCurrentTeb()->Instrumentation[0]) - ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = pointers; -#endif return STATUS_SUCCESS; }
@@ -3925,10 +3918,6 @@ void WINAPI LdrShutdownThread(void) if ((pointers = NtCurrentTeb()->ThreadLocalStoragePointer)) { NtCurrentTeb()->ThreadLocalStoragePointer = NULL; -#ifdef __x86_64__ /* macOS-specific hack */ - if (NtCurrentTeb()->Instrumentation[0]) - ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = NULL; -#endif for (i = 0; i < tls_module_count; i++) RtlFreeHeap( GetProcessHeap(), 0, pointers[i] ); RtlFreeHeap( GetProcessHeap(), 0, pointers ); } diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 935a385be33..2a7b2e4d353 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -63,6 +63,11 @@ #endif #ifdef __APPLE__ # include <mach/mach.h> +/* _thread_set_tsd_base is private API for setting GSBASE, added in macOS 10.12. + * It's a small thunk that sets %esi, %eax, and does the syscall (which clobbers + * %rcx and %r11). + */ +extern void _thread_set_tsd_base(uint64_t); #endif
#include "ntstatus.h" @@ -462,7 +467,7 @@ static inline struct amd64_thread_data *amd64_thread_data(void) return (struct amd64_thread_data *)ntdll_get_thread_data()->cpu_data; }
-#ifdef __linux__ +#if defined(__linux__) || defined(__APPLE__) static inline TEB *get_current_teb(void) { unsigned long rsp; @@ -846,6 +851,10 @@ static inline ucontext_t *init_handler( void *sigcontext ) struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch; arch_prctl( ARCH_SET_FS, ((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb ); } +#endif +#ifdef __APPLE__ + struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch; + _thread_set_tsd_base( (uint64_t)((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb ); #endif return sigcontext; } @@ -860,6 +869,10 @@ static inline void leave_handler( ucontext_t *sigcontext ) if (fs32_sel && !is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext)) __asm__ volatile( "movw %0,%%fs" :: "r" (fs32_sel) ); #endif +#ifdef __APPLE__ + if (!is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext)) + _thread_set_tsd_base( (uint64_t)NtCurrentTeb() ); +#endif #ifdef DS_sig DS_sig(sigcontext) = ds64_sel; #else @@ -1644,6 +1657,12 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback, "test %r10,%r10\n\t" "jz 1f\n\t" "xchgq %rcx,%r10\n\t" +#ifdef __APPLE__ + "1\t:pushq %rcx\n\t" + "movq %r8,%rdi\n\t" + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rcx\n\t" +#endif "1\t:jmpq *%rcx" ) /* func */
@@ -1653,6 +1672,16 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback, extern void DECLSPEC_NORETURN user_mode_callback_return( void *ret_ptr, ULONG ret_len, NTSTATUS status, TEB *teb ); __ASM_GLOBAL_FUNC( user_mode_callback_return, +#ifdef __APPLE__ + "pushq %rcx\n\t" + "pushq %rdi\n\t" + "pushq %rsi\n\t" + "movq 0x320(%rcx),%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdi\n\t" + "popq %rcx\n\t" +#endif "movq 0x328(%rcx),%r10\n\t" /* amd64_thread_data()->syscall_frame */ "movq 0xa0(%r10),%r11\n\t" /* frame->prev_frame */ "movq %r11,0x328(%rcx)\n\t" /* amd64_thread_data()->syscall_frame = prev_frame */ @@ -2571,13 +2600,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB #elif defined(__NetBSD__) sysarch( X86_64_SET_GSBASE, &teb ); #elif defined (__APPLE__) - __asm__ volatile ("movq %0,%%gs:%c1" :: "r" (teb->Tib.Self), "n" (FIELD_OFFSET(TEB, Tib.Self))); - __asm__ volatile ("movq %0,%%gs:%c1" :: "r" (teb->ThreadLocalStoragePointer), "n" (FIELD_OFFSET(TEB, ThreadLocalStoragePointer))); thread_data->pthread_teb = mac_thread_gsbase(); - /* alloc_tls_slot() needs to poke a value to an address relative to each - thread's gsbase. Have each thread record its gsbase pointer into its - TEB so alloc_tls_slot() can find it. 
*/ - teb->Instrumentation[0] = thread_data->pthread_teb; #else # error Please define setting %gs for your architecture #endif @@ -2800,6 +2823,12 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "syscall\n\t" "leaq -0x98(%rbp),%rcx\n" "2:\n\t" +#endif +#ifdef __APPLE__ + "movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */ + "movq 0x320(%rdi),%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "leaq -0x98(%rbp),%rcx\n" #endif "movq 0x00(%rcx),%rax\n\t" "movq 0x18(%rcx),%r11\n\t" /* 2nd argument */ @@ -2842,6 +2871,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "jz 1f\n\t" "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ "1:\n\t" +#endif +#ifdef __APPLE__ + "movq 0x88(%rcx),%rsp\n\t" /* use the user stack for this call */ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" + "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ #endif "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */ @@ -3013,6 +3052,10 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, __ASM_CFI_CFA_IS_AT2(rcx, 0x88, 0x01) "movq %rbp,0x98(%rcx)\n\t" __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01) +#ifdef __APPLE__ + "movq %gs:0x30,%r14\n\t" + "movq %r14,0xb8(%rcx)\n\t" /* frame->teb */ +#endif "movdqa %xmm6,0x1c0(%rcx)\n\t" "movdqa %xmm7,0x1d0(%rcx)\n\t" "movdqa %xmm8,0x1e0(%rcx)\n\t" @@ -3050,6 +3093,18 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "mov $158,%eax\n\t" /* SYS_arch_prctl */ "syscall\n\t" "2:\n\t" +#endif +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdx\n\t" + "pushq %rsi\n\t" + "movq %gs:0x320,%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdx\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" #endif "movq %r8,%rdi\n\t" /* args */ 
"callq *(%r10,%rdx,8)\n\t" @@ -3074,6 +3129,16 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "jz 1f\n\t" "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ "1:\n\t" +#endif +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdx\n\t" + "movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rdx\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" #endif "movq 0x60(%rcx),%r14\n\t" "movq 0x28(%rcx),%rdi\n\t"
From: Brendan Shanks bshanks@codeweavers.com
---
 dlls/ntdll/unix/signal_x86_64.c | 31 -------------------------------
 1 file changed, 31 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 2a7b2e4d353..51a8b317b7a 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2708,12 +2708,7 @@ __ASM_GLOBAL_FUNC( signal_start_thread, * __wine_syscall_dispatcher */ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00) @@ -2753,12 +2748,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movl 0xb0(%rcx),%r14d\n\t" /* frame->syscall_flags */ "testl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */ "jz 2f\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rdx\n\t" - "movl 0x340(%rdx),%eax\n\t" -#else "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */ -#endif "xorl %edx,%edx\n\t" "andl $7,%eax\n\t" "xorq %rbp,%rbp\n\t" @@ -2899,14 +2889,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "2:\ttestl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */ "jz 3f\n\t" "movq %rax,%r11\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rdx\n\t" - "movl 0x340(%rdx),%eax\n\t" - "movl 0x344(%rdx),%edx\n\t" -#else "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */ "movl %gs:0x344,%edx\n\t" /* amd64_thread_data()->xstate_features_mask high dword */ -#endif "xrstor64 0xc0(%rcx)\n\t" "movq %r11,%rax\n\t" "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ @@ -2970,12 +2954,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x10(%rcx),%rcx\n\t" "iretq\n" /* RESTORE_FLAGS_INSTRUMENTATION */ -#ifdef __APPLE__ - "2:\tmovq %gs:0x30,%r10\n\t" - "movq 0x348(%r10),%r10\n\t" -#else "2:\tmovq %gs:0x348,%r10\n\t" /* amd64_thread_data()->instrumentation_callback */ -#endif "movq (%r10),%r10\n\t" "test %r10,%r10\n\t" "jz 3b\n\t" @@ -3001,12 +2980,7 @@ 
__ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_return,
__ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation, -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00) @@ -3023,12 +2997,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation, */ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "movq %rcx,%r10\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00)
On Wed Mar 19 05:43:39 2025 +0000, Paul Gofman wrote:
> Looks like patch subject is a bit misleading? "leaq 0x70(%rcx),%rsp\n\t" doesn't leave kernel stack, maybe "ntdll: Move stack to machine frame before accessing %gs in x86_64 syscall dispatcher."?
Good catch, I think I put that as a placeholder but forgot to update it later.
Alexandre Julliard (@julliard) commented about dlls/ntdll/unix/signal_x86_64.c:
"leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */
+#ifdef __linux__
"testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */
"jz 1f\n\t"
"movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */
"1:\n\t"
+#endif
+#ifdef __APPLE__
"movq 0x88(%rcx),%rsp\n\t" /* use the user stack for this call */
"pushq %rax\n\t"
"pushq %rcx\n\t"
"movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */
"call " __ASM_NAME("_thread_set_tsd_base") "\n\t"
"popq %rcx\n\t"
"popq %rax\n\t"
"leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */
I don't think we should be touching the user stack at that point.
On Wed Mar 19 17:19:13 2025 +0000, Alexandre Julliard wrote:
> I don't think we should be touching the user stack at that point.
Any recommendation on what I should use as a stack there? Maybe move the `syscall_frame` down in memory slightly (say, 128 bytes) and use the free area as a stack? Or exempt some space below the `syscall_frame` from `is_inside_syscall()` and use that?
On Wed Mar 19 20:18:00 2025 +0000, Brendan Shanks wrote:
> Any recommendation on what I should use as a stack there? Maybe move the `syscall_frame` down in memory slightly (say, 128 bytes) and use the free area as a stack? Or exempt some space below the `syscall_frame` from `is_inside_syscall()` and use that?
Any chance we could use a syscall instead of having to call a library function?
On Wed Mar 19 20:47:22 2025 +0000, Alexandre Julliard wrote:
> Any chance we could use a syscall instead of having to call a library function?
We could, but there's no guarantee of stability for the syscall number. The syscall number hasn't changed since introduction (in 2016's 10.12) though, so it's probably safe to depend on.
On Wed Mar 19 22:07:13 2025 +0000, Brendan Shanks wrote:
> We could, but there's no guarantee of stability for the syscall number. The syscall number hasn't changed since introduction (in 2016's 10.12) though, so it's probably safe to depend on.
We can probably check that the `_thread_set_tsd_base` machine code matches the expected byte sequence. We can either dynamically extract the syscall number, or hard-code it and check against the system value.
On Mon Mar 24 15:18:34 2025 +0000, Jinoh Kang wrote:
> We can probably check that the `_thread_set_tsd_base` machine code matches the expected byte sequence. We can either dynamically extract the syscall number, or hard-code it and check against the system value.
Regarding XNU syscalls: AFAICT they have historically been fairly stable. Even when a new syscall is added that supersedes a previous one (like `mach_msg_trap` vs `mach_msg2_trap`), the previous one either stops existing or returns an error value. The numbers are not recycled, and the old calls are sometimes even kept forward-compatible for a while.
Probably hard-coding and checking at compile time against `sys/syscall.h` will be sufficient here I would guess.
> checking at compile time against `sys/syscall.h` will be sufficient here I would guess.
It's not in `sys/syscall.h`. thread_set_tsd_base() is a machdep syscall.
On Tue Mar 25 11:21:48 2025 +0000, Jinoh Kang wrote:
> > checking at compile time against `sys/syscall.h` will be sufficient here I would guess.
>
> It's not in `sys/syscall.h`. thread_set_tsd_base() is a machdep syscall.
Ah, interesting that it is a machdep syscall and not a Unix syscall, but that does make sense. In that case, checking against the machine code is probably the best option, as you suggested.
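The byte-sequence check suggested above could look roughly like the following. It is only a sketch: the expected layout (`movl $0,%esi`, then `movl $imm32,%eax`, then `syscall`) is an assumption derived from the description of the thunk earlier in this MR and has not been verified against any particular macOS release. `extract_tsd_syscall_number()` and the pattern tables are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED thunk layout (hypothetical, must be checked against the real code):
 *   be 00 00 00 00    movl $0,%esi
 *   b8 xx xx xx xx    movl $<syscall number>,%eax
 *   0f 05             syscall
 * A mask byte of 0x00 marks a wildcard (the immediate we want to extract). */
static const uint8_t thunk_pattern[] =
    { 0xbe, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x05 };
static const uint8_t thunk_mask[] =
    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff };

/* Returns true and stores the 32-bit immediate loaded into %eax if the code
 * matches the assumed pattern; returns false otherwise. */
static bool extract_tsd_syscall_number(const uint8_t *code, size_t len,
                                       uint32_t *number)
{
    size_t i;

    assert(code && number);
    if (len < sizeof(thunk_pattern)) return false;

    for (i = 0; i < sizeof(thunk_pattern); i++)
        if ((code[i] & thunk_mask[i]) != (thunk_pattern[i] & thunk_mask[i]))
            return false;

    /* x86 immediates are stored little-endian, starting after the b8 opcode. */
    *number = (uint32_t)code[6] | ((uint32_t)code[7] << 8) |
              ((uint32_t)code[8] << 16) | ((uint32_t)code[9] << 24);
    return true;
}
```

On macOS the input would presumably be the bytes at the address of `_thread_set_tsd_base`. If the check fails, falling back to calling the library function (or refusing to enable the direct-syscall path) seems like the safer choice.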