x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStorage`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
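For reference, the offsets mentioned above come from the start of the x86_64 TEB: the embedded `NT_TIB`, followed by the first TEB fields. The sketch below is a hypothetical, trimmed-down mirror of that layout, not Wine's definition. `sketch_teb` and `old_scheme_covers()` are illustrative names. It checks those offsets and models which `%gs`-relative reads the old TSD-mirroring approach could satisfy in upstream Wine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of the x86_64 TEB. Every field is
 * 8 bytes wide; only the fields discussed in the text matter here. */
struct sketch_teb
{
    uint64_t ExceptionList;             /* 0x00 */
    uint64_t StackBase;                 /* 0x08: read directly by some CEF builds */
    uint64_t StackLimit;                /* 0x10 */
    uint64_t SubSystemTib;              /* 0x18 */
    uint64_t FiberData;                 /* 0x20 */
    uint64_t ArbitraryUserPointer;      /* 0x28 */
    uint64_t Self;                      /* 0x30: mirrored into the macOS TSD */
    uint64_t EnvironmentPointer;        /* 0x38 */
    uint64_t ClientId[2];               /* 0x40: UniqueProcess, UniqueThread */
    uint64_t ActiveRpcHandle;           /* 0x50 */
    uint64_t ThreadLocalStoragePointer; /* 0x58: mirrored into the macOS TSD */
    uint64_t ProcessEnvironmentBlock;   /* 0x60: only handled by CrossOver's extra hack */
};

static_assert(offsetof(struct sketch_teb, StackBase) == 0x08, "StackBase offset");
static_assert(offsetof(struct sketch_teb, Self) == 0x30, "Self offset");
static_assert(offsetof(struct sketch_teb, ThreadLocalStoragePointer) == 0x58, "TLS offset");
static_assert(offsetof(struct sketch_teb, ProcessEnvironmentBlock) == 0x60, "PEB offset");

/* With %gs pointing at the macOS TSD, a %gs-relative read only returned the
 * expected TEB value for the slots upstream Wine mirrored into the TSD. */
static bool old_scheme_covers(size_t gs_offset)
{
    return gs_offset == offsetof(struct sketch_teb, Self) ||
           gs_offset == offsetof(struct sketch_teb, ThreadLocalStoragePointer);
}
```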
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, the unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). Some changes to the syscall dispatcher were needed to ensure that the TEB is not accessed through `%gs` while on the kernel stack, since a SIGUSR1 received while on the kernel stack will leave GSBASE set to the TSD.
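The swap rules in the previous paragraph can be summarized as a small state machine. This is a toy model only, not Wine code. A `gsbase` variable stands in for the per-thread GSBASE register, and `on_kernel_stack` stands in for `is_inside_syscall()`. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-thread state: which base %gs would resolve against. */
struct toy_thread
{
    uintptr_t gsbase;       /* stands in for the GSBASE register */
    uintptr_t teb;          /* Windows TEB address */
    uintptr_t tsd;          /* macOS TSD (pthread_teb) address */
    bool on_kernel_stack;   /* stands in for is_inside_syscall() */
};

/* Syscall / unix-call dispatcher entry: leave PE code, enter Unix code. */
static void toy_dispatcher_enter(struct toy_thread *t)
{
    t->on_kernel_stack = true;
    t->gsbase = t->tsd;
}

/* Dispatcher return: first move %rsp above the frame (no longer "inside
 * syscall"), then point GSBASE back at the TEB, in the same order as the patch. */
static void toy_dispatcher_leave(struct toy_thread *t)
{
    t->on_kernel_stack = false;
    t->gsbase = t->teb;
}

/* init_handler(): signal handlers always run Unix code, so use the TSD. */
static void toy_init_handler(struct toy_thread *t)
{
    t->gsbase = t->tsd;
}

/* leave_handler(): only restore the TEB if the interrupted code was PE code,
 * i.e. it was neither inside a syscall nor on the signal stack. */
static void toy_leave_handler(struct toy_thread *t, bool interrupted_on_signal_stack)
{
    if (!interrupted_on_signal_stack && !t->on_kernel_stack)
        t->gsbase = t->teb;
}
```

A signal that interrupts Unix code leaves GSBASE on the TSD. A signal that interrupts PE code hands GSBASE back to the TEB. User-mode callbacks follow the same pattern in the opposite direction.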
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) and macOS 10.13 with several apps and games, including the `cefclient.exe` CEF sample.
Encouragingly, in some simple tests I didn't see a noticeable performance regression from this MR.
There are drawbacks, though:
- Libraries which jump directly from PE code into Unix code (expecting that `%gs` always points to the macOS TSD) will crash. Notable examples are D3DMetal and DXMT. These will need to be changed to use Unix calls.
- If Windows code uses the `syscall` instruction directly, the stack pointer likely needs to be valid (which is probably not required on Windows). This is due to the syscall dispatcher saving registers onto the user stack and having to call `_thread_set_tsd_base`. I can't say I've ever seen direct syscalls done with an invalid `%rsp`, but it seems like something anticheat code might do.
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi`, `%eax`, and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b.... The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
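The thunk's work boils down to a single machine-dependent syscall. The number it loads into `%eax` is `0x3000003`, the same constant used in the dispatcher code in the patches. That number decodes as a syscall class in the top byte plus call number 3 (`thread_fast_set_cthread_self64`).

The sketch below is hedged in three ways:
- It assumes XNU's encoding: the class is shifted left by 24 bits, and the machine-dependent class is 3.
- The `SKETCH_*` macro names are local to this example, not XNU's own headers.
- The inline-asm helper is an illustration that is only compiled on x86_64 macOS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding (mirrors XNU's syscall class scheme): the class sits in
 * bits 24..31 and class 3 selects machine-dependent ("machdep") calls. */
#define SKETCH_SYSCALL_CLASS_SHIFT 24
#define SKETCH_SYSCALL_CLASS_MDEP 3u
#define SKETCH_MDEP_SET_CTHREAD_SELF64 3u /* thread_fast_set_cthread_self64 */

static inline uint32_t sketch_mdep_syscall(uint32_t call)
{
    return (SKETCH_SYSCALL_CLASS_MDEP << SKETCH_SYSCALL_CLASS_SHIFT) | (call & 0xffffffu);
}

#if defined(__APPLE__) && defined(__x86_64__)
/* Same register setup as the dispatchers: %rdi = new GSBASE, %esi = 0,
 * %eax = syscall number. The syscall instruction clobbers %rcx and %r11. */
static inline void sketch_set_gsbase(uint64_t base)
{
    uint64_t rax = sketch_mdep_syscall(SKETCH_MDEP_SET_CTHREAD_SELF64);
    __asm__ volatile ("syscall"
                      : "+a" (rax)
                      : "D" (base), "S" (0)
                      : "rcx", "r11", "cc", "memory");
}
#endif
```

Under these assumptions, `sketch_set_gsbase()` would have the same effect as calling `_thread_set_tsd_base()`, without the call instruction pushing a return address onto the stack.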
---
I also tested an alternate implementation strategy for this which took advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`.
This strategy was simpler, but I'm not using it for two reasons:
- The "full" thread state is only available starting with macOS 10.15, and we still support 10.13.
- More crucially, Rosetta 2 doesn't seem to implement the GS.base field of the full thread state correctly: it's set to 0 on entry, and isn't read on exit.
-- 
v7: ntdll: Remove x86_64 Mac-specific TEB access workarounds that are no longer needed.
ntdll: On macOS x86_64, swap GSBASE between the TEB and macOS TSD when entering/leaving PE code.
ntdll: Set %rsp to be inside syscall_frame before accessing %gs in x86_64 syscall dispatcher.
ntdll: Don't access the TEB through %gs when using the kernel stack in x86_64 syscall dispatcher.
ntdll: Ensure init_handler runs in signal handlers before any compiler-generated memset calls.
ntdll: Remove ugly fallback method for getting a thread's GSBASE on macOS.
From: Brendan Shanks bshanks@codeweavers.com
84760a8fb2cf9ed577c63957c5bdfc621d748a7f started using the documented Mach API method.
---
 dlls/ntdll/unix/signal_x86_64.c | 38 ---------------------------------
 1 file changed, 38 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 26b540bd629..8bddc073f74 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -2440,50 +2440,12 @@ static void *mac_thread_gsbase(void)
 {
     struct thread_identifier_info tiinfo;
     unsigned int info_count = THREAD_IDENTIFIER_INFO_COUNT;
-    static int gsbase_offset = -1;

     mach_port_t self = mach_thread_self();
     kern_return_t kr = thread_info(self, THREAD_IDENTIFIER_INFO, (thread_info_t) &tiinfo, &info_count);
     mach_port_deallocate(mach_task_self(), self);

     if (kr == KERN_SUCCESS) return (void*)tiinfo.thread_handle;
-
-    if (gsbase_offset < 0)
-    {
-        /* Search for the array of TLS slots within the pthread data structure.
-           That's what the macOS pthread implementation uses for gsbase. */
-        const void* const sentinel1 = (const void*)0x2bffb6b4f11228ae;
-        const void* const sentinel2 = (const void*)0x0845a7ff6ab76707;
-        int rc;
-        pthread_key_t key;
-        const void** p = (const void**)pthread_self();
-        int i;
-
-        gsbase_offset = 0;
-        if ((rc = pthread_key_create(&key, NULL))) return NULL;
-
-        pthread_setspecific(key, sentinel1);
-
-        for (i = key + 1; i < 2000; i++) /* arbitrary limit */
-        {
-            if (p[i] == sentinel1)
-            {
-                pthread_setspecific(key, sentinel2);
-
-                if (p[i] == sentinel2)
-                {
-                    gsbase_offset = (i - key) * sizeof(*p);
-                    break;
-                }
-
-                pthread_setspecific(key, sentinel1);
-            }
-        }
-
-        pthread_key_delete(key);
-    }
-
-    if (gsbase_offset) return (char*)pthread_self() + gsbase_offset;
     return NULL;
 }
 #endif
From: Brendan Shanks bshanks@codeweavers.com
---
 dlls/ntdll/unix/signal_x86_64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 8bddc073f74..046e96be99b 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -1974,9 +1974,9 @@ static BOOL handle_syscall_trap( ucontext_t *sigcontext, siginfo_t *siginfo )
  */
 static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
+    ucontext_t *ucontext = init_handler( sigcontext );
     EXCEPTION_RECORD rec = { 0 };
     struct xcontext context;
-    ucontext_t *ucontext = init_handler( sigcontext );

     rec.ExceptionAddress = (void *)RIP_sig(ucontext);
     save_context( &context, ucontext );
@@ -2059,9 +2059,9 @@ static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext )
  */
 static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
+    ucontext_t *ucontext = init_handler( sigcontext );
     EXCEPTION_RECORD rec = { 0 };
     struct xcontext context;
-    ucontext_t *ucontext = init_handler( sigcontext );

     if (handle_syscall_trap( ucontext, siginfo )) return;

@@ -2093,8 +2093,8 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext )
  */
 static void fpe_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
-    EXCEPTION_RECORD rec = { 0 };
     ucontext_t *ucontext = init_handler( sigcontext );
+    EXCEPTION_RECORD rec = { 0 };

     switch (siginfo->si_code)
     {
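A toy model shows why the order matters. A flag stands in for "GSBASE currently points at the macOS TSD". `model_memset()` stands in for the `memset()` call a compiler may emit for `EXCEPTION_RECORD rec = { 0 };`. That call runs system-library code, so it presumably must not run while GSBASE still points at the Windows TEB. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool gsbase_is_tsd;          /* toy stand-in for the GSBASE state */
static int unix_calls_with_teb_gs;  /* counts library calls made too early */

/* Stand-in for a compiler-generated memset() call into the system library. */
static void *model_memset(void *dst, int c, size_t n)
{
    if (!gsbase_is_tsd) unix_calls_with_teb_gs++;
    return memset(dst, c, n);
}

/* Stand-in for init_handler(): switch GSBASE to the TSD. */
static void model_init_handler(void)
{
    gsbase_is_tsd = true;
}

/* Same size as an x86_64 EXCEPTION_RECORD (152 bytes). */
struct model_record { unsigned char bytes[152]; };

/* Old order: zero-initialize locals first, then call init_handler(). */
static void model_handler_old_order(void)
{
    struct model_record rec;
    model_memset(&rec, 0, sizeof(rec));
    model_init_handler();
}

/* Patched order: init_handler() runs before any zero-initialization. */
static void model_handler_new_order(void)
{
    model_init_handler();
    struct model_record rec;
    model_memset(&rec, 0, sizeof(rec));
}
```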
From: Brendan Shanks bshanks@codeweavers.com
In preparation for switching GSBASE on macOS.
---
 dlls/ntdll/unix/signal_x86_64.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 046e96be99b..aae578dac05 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -423,7 +423,7 @@ struct syscall_frame
     void *syscall_cfa; /* 00a8 */
     DWORD syscall_flags; /* 00b0 */
     DWORD restore_flags; /* 00b4 */
-    DWORD align[2]; /* 00b8 */
+    ULONG64 teb; /* 00b8 */
     XMM_SAVE_AREA32 xsave; /* 00c0 */
     DECLSPEC_ALIGN(64) XSAVE_AREA_HEADER xstate; /* 02c0 */
 };
@@ -2634,6 +2634,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB
     frame->restore_flags |= CONTEXT_INTEGER;
     frame->syscall_flags = syscall_flags;
     frame->syscall_cfa = syscall_cfa;
+    frame->teb = (ULONG64)teb;
     if ((callback = instrumentation_callback))
     {
         frame->r10 = frame->rip;
@@ -2721,6 +2722,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "movw %ss,0x90(%rcx)\n\t"
     "movq %rbp,0x98(%rcx)\n\t"
     __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01)
+    "movq %gs:0x30,%r14\n\t"
+    "movq %r14,0xb8(%rcx)\n\t" /* frame->teb */
     /* Legends of Runeterra hooks the first system call return instruction, and
      * depends on us returning to it. Adjust the return address accordingly. */
     "subq $0xb,0x70(%rcx)\n\t"
@@ -2780,10 +2783,14 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     __ASM_CFI(".cfi_offset %r15,-0x38\n\t")
     __ASM_CFI(".cfi_undefined %rdi\n\t")
     __ASM_CFI(".cfi_undefined %rsi\n\t")
+    /* When on the kernel stack, use frame->teb instead of %gs to access the TEB.
+     * (on macOS, signal handlers set gsbase to pthread_teb when on the kernel stack).
+     */
 #ifdef __linux__
     "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */
     "jz 2f\n\t"
-    "movq %gs:0x320,%rsi\n\t" /* amd64_thread_data()->pthread_teb */
+    "movq 0xb8(%rcx),%rsi\n\t" /* frame->teb */
+    "movq 0x320(%rsi),%rsi\n\t" /* amd64_thread_data()->pthread_teb */
     "testl $8,%r14d\n\t" /* SYSCALL_HAVE_WRFSGSBASE */
     "jz 1f\n\t"
     "wrfsbase %rsi\n\t"
@@ -2799,12 +2806,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "movl %eax,%ebx\n\t"
     "shrl $8,%ebx\n\t"
     "andl $0x30,%ebx\n\t" /* syscall table number */
-#ifdef __APPLE__
-    "movq %gs:0x30,%rcx\n\t"
-    "movq 0x330(%rcx),%rcx\n\t"
-#else
-    "movq %gs:0x330,%rcx\n\t" /* amd64_thread_data()->syscall_table */
-#endif
+    "movq 0xb8(%rcx),%rcx\n\t" /* frame->teb */
+    "movq 0x330(%rcx),%rcx\n\t" /* amd64_thread_data()->syscall_table */
     "leaq (%rcx,%rbx,2),%rbx\n\t"
     "andl $0xfff,%eax\n\t" /* syscall number */
     "cmpq 16(%rbx),%rax\n\t" /* table->ServiceLimit */
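The new `teb` field reuses the 8 bytes previously taken by `align[2]`, so the offsets of `xsave` and everything after it are unchanged. The sketch below checks this. It assumes the register save area before `syscall_cfa` can be collapsed into an opaque `0xa8`-byte array. It also models the `movq 0xb8(%rcx),...` / `movq 0x330(...),...` sequence as a plain C read through the frame. `sketch_syscall_frame` and `teb_field_via_frame()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD;
typedef uint64_t ULONG64;

/* Trimmed sketch of struct syscall_frame after the patch. The saved
 * registers (rax ... prev_frame) are collapsed into an opaque array. */
struct sketch_syscall_frame
{
    unsigned char regs[0xa8];   /* 0000 */
    ULONG64 syscall_cfa;        /* 00a8 (void * in Wine) */
    DWORD syscall_flags;        /* 00b0 */
    DWORD restore_flags;        /* 00b4 */
    ULONG64 teb;                /* 00b8: was DWORD align[2] */
    unsigned char xsave[16];    /* 00c0: start of XMM_SAVE_AREA32 (truncated) */
};

static_assert(sizeof(ULONG64) == sizeof(DWORD[2]), "teb reuses the padding");
static_assert(offsetof(struct sketch_syscall_frame, teb) == 0xb8, "teb offset");
static_assert(offsetof(struct sketch_syscall_frame, xsave) == 0xc0, "xsave offset");

/* Read an 8-byte TEB field through frame->teb instead of %gs, as the
 * dispatcher now does for the syscall table at TEB offset 0x330. */
static uint64_t teb_field_via_frame(const struct sketch_syscall_frame *frame, size_t offset)
{
    uint64_t value;
    memcpy(&value, (const unsigned char *)(uintptr_t)frame->teb + offset, sizeof(value));
    return value;
}
```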
From: Brendan Shanks bshanks@codeweavers.com
If a signal occurs, is_inside_syscall() needs to return FALSE so GSBASE is reset to the TEB.

In preparation for switching GSBASE on macOS.
---
 dlls/ntdll/unix/signal_x86_64.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index aae578dac05..935a385be33 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -2833,6 +2833,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "callq *(%r10,%rax,8)\n\t"
     "leaq -0x98(%rbp),%rcx\n\t"
     __ASM_LOCAL_LABEL("__wine_syscall_dispatcher_return") ":\n\t"
+    /* push rbp-based kernel stack cfi */
+    __ASM_CFI(".cfi_remember_state\n\t")
+    __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */
+    "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */
+#ifdef __linux__
+    "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */
+    "jz 1f\n\t"
+    "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */
+    "1:\n\t"
+#endif
     "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */
     "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */
     "jnz 2f\n\t"
@@ -2864,9 +2874,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "jmp 4f\n"
     "3:\tfxrstor64 0xc0(%rcx)\n"
     "4:\tmovq 0x98(%rcx),%rbp\n\t"
-    /* push rbp-based kernel stack cfi */
-    __ASM_CFI(".cfi_remember_state\n\t")
-    __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */
     "movq 0x68(%rcx),%r15\n\t"
     "movq 0x58(%rcx),%r13\n\t"
     "movq 0x50(%rcx),%r12\n\t"
@@ -2874,13 +2881,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "movq 0x28(%rcx),%rdi\n\t"
     "movq 0x20(%rcx),%rsi\n\t"
     "movq 0x08(%rcx),%rbx\n\t"
-    "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */
-#ifdef __linux__
-    "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */
-    "jz 1f\n\t"
-    "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */
-    "1:\n\t"
-#endif
     "testl $0x10000,%edx\n\t" /* RESTORE_FLAGS_INSTRUMENTATION */
     "movq 0x60(%rcx),%r14\n\t"
     "jnz 2f\n\t"
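This reordering relies on the stack-range test behind `is_inside_syscall()`. The sketch below assumes that a context counts as "inside a syscall" when the interrupted `%rsp` lies between the bottom of the kernel stack and the syscall frame. Wine's exact implementation may differ. Under that assumption, moving `leaq 0x70(%rcx),%rsp` earlier makes the test fail before GSBASE is switched back. `model_is_inside_syscall()` and `rsp_after_restore()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of the check: inside a syscall while kernel_stack <= rsp <= frame. */
static bool model_is_inside_syscall(uintptr_t rsp, uintptr_t kernel_stack, uintptr_t frame)
{
    return rsp >= kernel_stack && rsp <= frame;
}

/* %rsp after "leaq 0x70(%rcx),%rsp", with %rcx pointing at the syscall frame. */
static uintptr_t rsp_after_restore(uintptr_t frame)
{
    return frame + 0x70;
}
```

A signal arriving after the `leaq` is therefore treated as interrupting non-syscall code, so `leave_handler()` resets GSBASE to the TEB. That matches what the return path is about to do anyway.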
From: Brendan Shanks bshanks@codeweavers.com
---
 dlls/ntdll/loader.c             | 11 -----
 dlls/ntdll/unix/signal_x86_64.c | 82 ++++++++++++++++++++++++++++++---
 2 files changed, 75 insertions(+), 18 deletions(-)
diff --git a/dlls/ntdll/loader.c b/dlls/ntdll/loader.c
index 1cce2ab3466..67c4aa546bc 100644
--- a/dlls/ntdll/loader.c
+++ b/dlls/ntdll/loader.c
@@ -1385,9 +1385,6 @@ static BOOL alloc_tls_slot( LDR_DATA_TABLE_ENTRY *mod )
         if (!new) return FALSE;
         if (old) memcpy( new, old, old_module_count * sizeof(*new) );
         teb->ThreadLocalStoragePointer = new;
-#ifdef __x86_64__ /* macOS-specific hack */
-        if (teb->Instrumentation[0]) ((TEB *)teb->Instrumentation[0])->ThreadLocalStoragePointer = new;
-#endif
         TRACE( "thread %04lx tls block %p -> %p\n", HandleToULong(teb->ClientId.UniqueThread), old, new );
         /* FIXME: can't free old block here, should be freed at thread exit */
     }
@@ -1633,10 +1630,6 @@ static NTSTATUS alloc_thread_tls(void)
         TRACE( "slot %u: %u/%lu bytes at %p\n", i, size, dir->SizeOfZeroFill, pointers[i] );
     }
     NtCurrentTeb()->ThreadLocalStoragePointer = pointers;
-#ifdef __x86_64__ /* macOS-specific hack */
-    if (NtCurrentTeb()->Instrumentation[0])
-        ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = pointers;
-#endif
     return STATUS_SUCCESS;
 }

@@ -3941,10 +3934,6 @@ void WINAPI LdrShutdownThread(void)
     if ((pointers = NtCurrentTeb()->ThreadLocalStoragePointer))
     {
         NtCurrentTeb()->ThreadLocalStoragePointer = NULL;
-#ifdef __x86_64__ /* macOS-specific hack */
-        if (NtCurrentTeb()->Instrumentation[0])
-            ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = NULL;
-#endif
         for (i = 0; i < tls_module_count; i++) RtlFreeHeap( GetProcessHeap(), 0, pointers[i] );
         RtlFreeHeap( GetProcessHeap(), 0, pointers );
     }
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 935a385be33..70b33ceb75c 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -63,6 +63,14 @@
 #endif
 #ifdef __APPLE__
 # include <mach/mach.h>
+/* _thread_set_tsd_base is private API for setting GSBASE, added in macOS 10.12.
+ * It's a small thunk that sets %eax, zeroes %esi, and does the syscall (which clobbers
+ * %rcx and %r11).
+ * See https://github.com/apple-oss-distributions/xnu/blob/main/libsyscall/custom/c...
+ * or libsystem_kernel.dylib.
+ * Note that the dispatchers do the syscall directly to avoid using the stack.
+ */
+extern void _thread_set_tsd_base(uint64_t);
 #endif

 #include "ntstatus.h"
@@ -462,7 +470,7 @@ static inline struct amd64_thread_data *amd64_thread_data(void)
     return (struct amd64_thread_data *)ntdll_get_thread_data()->cpu_data;
 }

-#ifdef __linux__
+#if defined(__linux__) || defined(__APPLE__)
 static inline TEB *get_current_teb(void)
 {
     unsigned long rsp;
@@ -846,6 +854,10 @@ static inline ucontext_t *init_handler( void *sigcontext )
         struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch;
         arch_prctl( ARCH_SET_FS, ((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb );
     }
+#endif
+#ifdef __APPLE__
+    struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch;
+    _thread_set_tsd_base( (uint64_t)((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb );
 #endif
     return sigcontext;
 }
@@ -860,6 +872,10 @@ static inline void leave_handler( ucontext_t *sigcontext )
     if (fs32_sel && !is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext))
         __asm__ volatile( "movw %0,%%fs" :: "r" (fs32_sel) );
 #endif
+#ifdef __APPLE__
+    if (!is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext))
+        _thread_set_tsd_base( (uint64_t)NtCurrentTeb() );
+#endif
 #ifdef DS_sig
     DS_sig(sigcontext) = ds64_sel;
 #else
@@ -1638,6 +1654,14 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback,
     "jz 1f\n\t"
     "movw 0x338(%r8),%fs\n" /* amd64_thread_data()->fs */
     "1:\n\t"
+#endif
+#ifdef __APPLE__
+    "movq %rcx,%r10\n\t"
+    "movq %r8,%rdi\n\t"
+    "xorl %esi,%esi\n\t"
+    "movl $0x3000003,%eax\n\t" /* _thread_set_tsd_base */
+    "syscall\n\t"
+    "movq %r10,%rcx\n\t"
 #endif
     "movq 0x348(%r8),%r10\n\t" /* amd64_thread_data()->instrumentation_callback */
     "movq (%r10),%r10\n\t"
@@ -1653,6 +1677,18 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback,
 extern void DECLSPEC_NORETURN user_mode_callback_return( void *ret_ptr, ULONG ret_len,
                                                          NTSTATUS status, TEB *teb );
 __ASM_GLOBAL_FUNC( user_mode_callback_return,
+#ifdef __APPLE__
+    "movq %rcx,%r8\n\t"
+    "movq %rdi,%r9\n\t"
+    "movq %rsi,%r10\n\t"
+    "movq 0x320(%rcx),%rdi\n\t" /* amd64_thread_data()->pthread_teb */
+    "xorl %esi,%esi\n\t"
+    "movl $0x3000003,%eax\n\t" /* _thread_set_tsd_base */
+    "syscall\n\t"
+    "movq %r10,%rsi\n\t"
+    "movq %r9,%rdi\n\t"
+    "movq %r8,%rcx\n\t"
+#endif
     "movq 0x328(%rcx),%r10\n\t" /* amd64_thread_data()->syscall_frame */
     "movq 0xa0(%r10),%r11\n\t" /* frame->prev_frame */
     "movq %r11,0x328(%rcx)\n\t" /* amd64_thread_data()->syscall_frame = prev_frame */
@@ -2571,13 +2607,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB
 #elif defined(__NetBSD__)
     sysarch( X86_64_SET_GSBASE, &teb );
 #elif defined (__APPLE__)
-    __asm__ volatile ("movq %0,%%gs:%c1" :: "r" (teb->Tib.Self), "n" (FIELD_OFFSET(TEB, Tib.Self)));
-    __asm__ volatile ("movq %0,%%gs:%c1" :: "r" (teb->ThreadLocalStoragePointer), "n" (FIELD_OFFSET(TEB, ThreadLocalStoragePointer)));
     thread_data->pthread_teb = mac_thread_gsbase();
-    /* alloc_tls_slot() needs to poke a value to an address relative to each
-       thread's gsbase. Have each thread record its gsbase pointer into its
-       TEB so alloc_tls_slot() can find it. */
-    teb->Instrumentation[0] = thread_data->pthread_teb;
 #else
 # error Please define setting %gs for your architecture
 #endif
@@ -2800,6 +2830,14 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "syscall\n\t"
     "leaq -0x98(%rbp),%rcx\n"
     "2:\n\t"
+#endif
+#ifdef __APPLE__
+    "movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */
+    "movq 0x320(%rdi),%rdi\n\t" /* amd64_thread_data()->pthread_teb */
+    "xorl %esi,%esi\n\t"
+    "movl $0x3000003,%eax\n\t" /* _thread_set_tsd_base */
+    "syscall\n\t"
+    "leaq -0x98(%rbp),%rcx\n"
 #endif
     "movq 0x00(%rcx),%rax\n\t"
     "movq 0x18(%rcx),%r11\n\t" /* 2nd argument */
@@ -2842,6 +2880,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "jz 1f\n\t"
     "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */
     "1:\n\t"
+#endif
+#ifdef __APPLE__
+    "movq %rax,%r8\n\t"
+    "movq %rcx,%rdx\n\t"
+    "movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */
+    "xorl %esi,%esi\n\t"
+    "movl $0x3000003,%eax\n\t" /* _thread_set_tsd_base */
+    "syscall\n\t"
+    "movq %rdx,%rcx\n\t"
+    "movq %r8,%rax\n\t"
 #endif
     "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */
     "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */
@@ -3013,6 +3061,10 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher,
     __ASM_CFI_CFA_IS_AT2(rcx, 0x88, 0x01)
     "movq %rbp,0x98(%rcx)\n\t"
     __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01)
+#ifdef __APPLE__
+    "movq %gs:0x30,%r14\n\t"
+    "movq %r14,0xb8(%rcx)\n\t" /* frame->teb */
+#endif
     "movdqa %xmm6,0x1c0(%rcx)\n\t"
     "movdqa %xmm7,0x1d0(%rcx)\n\t"
     "movdqa %xmm8,0x1e0(%rcx)\n\t"
@@ -3050,6 +3102,12 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher,
     "mov $158,%eax\n\t" /* SYS_arch_prctl */
     "syscall\n\t"
     "2:\n\t"
+#endif
+#ifdef __APPLE__
+    "movq %gs:0x320,%rdi\n\t" /* amd64_thread_data()->pthread_teb */
+    "xorl %esi,%esi\n\t"
+    "movl $0x3000003,%eax\n\t" /* _thread_set_tsd_base */
+    "syscall\n\t"
 #endif
     "movq %r8,%rdi\n\t" /* args */
     "callq *(%r10,%rdx,8)\n\t"
@@ -3074,6 +3132,16 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher,
     "jz 1f\n\t"
     "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */
     "1:\n\t"
+#endif
+#ifdef __APPLE__
+    "movq %rax,%rdx\n\t"
+    "movq %rcx,%r14\n\t"
+    "movq 0xb8(%rcx),%rdi\n\t" /* frame->teb */
+    "xorl %esi,%esi\n\t"
+    "movl $0x3000003,%eax\n\t" /* _thread_set_tsd_base */
+    "syscall\n\t"
+    "movq %r14,%rcx\n\t"
+    "movq %rdx,%rax\n\t"
 #endif
     "movq 0x60(%rcx),%r14\n\t"
     "movq 0x28(%rcx),%rdi\n\t"
From: Brendan Shanks bshanks@codeweavers.com
---
 dlls/ntdll/unix/signal_x86_64.c | 31 -------------------------------
 1 file changed, 31 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 70b33ceb75c..1de512ec26e 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -2715,12 +2715,7 @@ __ASM_GLOBAL_FUNC( signal_start_thread,
  * __wine_syscall_dispatcher
  */
 __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
-#ifdef __APPLE__
-    "movq %gs:0x30,%rcx\n\t"
-    "movq 0x328(%rcx),%rcx\n\t"
-#else
     "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */
-#endif
     "popq 0x70(%rcx)\n\t" /* frame->rip */
     __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t")
     __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00)
@@ -2760,12 +2755,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "movl 0xb0(%rcx),%r14d\n\t" /* frame->syscall_flags */
     "testl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */
     "jz 2f\n\t"
-#ifdef __APPLE__
-    "movq %gs:0x30,%rdx\n\t"
-    "movl 0x340(%rdx),%eax\n\t"
-#else
     "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */
-#endif
     "xorl %edx,%edx\n\t"
     "andl $7,%eax\n\t"
     "xorq %rbp,%rbp\n\t"
@@ -2908,14 +2898,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "2:\ttestl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */
     "jz 3f\n\t"
     "movq %rax,%r11\n\t"
-#ifdef __APPLE__
-    "movq %gs:0x30,%rdx\n\t"
-    "movl 0x340(%rdx),%eax\n\t"
-    "movl 0x344(%rdx),%edx\n\t"
-#else
     "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */
     "movl %gs:0x344,%edx\n\t" /* amd64_thread_data()->xstate_features_mask high dword */
-#endif
     "xrstor64 0xc0(%rcx)\n\t"
     "movq %r11,%rax\n\t"
     "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */
@@ -2979,12 +2963,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher,
     "movq 0x10(%rcx),%rcx\n\t"
     "iretq\n"
     /* RESTORE_FLAGS_INSTRUMENTATION */
-#ifdef __APPLE__
-    "2:\tmovq %gs:0x30,%r10\n\t"
-    "movq 0x348(%r10),%r10\n\t"
-#else
     "2:\tmovq %gs:0x348,%r10\n\t" /* amd64_thread_data()->instrumentation_callback */
-#endif
     "movq (%r10),%r10\n\t"
     "test %r10,%r10\n\t"
     "jz 3b\n\t"
@@ -3010,12 +2989,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_return,

 __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation,
-#ifdef __APPLE__
-    "movq %gs:0x30,%rcx\n\t"
-    "movq 0x328(%rcx),%rcx\n\t"
-#else
     "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */
-#endif
     "popq 0x70(%rcx)\n\t" /* frame->rip */
     __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t")
     __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00)
@@ -3032,12 +3006,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation,
  */
 __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher,
     "movq %rcx,%r10\n\t"
-#ifdef __APPLE__
-    "movq %gs:0x30,%rcx\n\t"
-    "movq 0x328(%rcx),%rcx\n\t"
-#else
     "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */
-#endif
     "popq 0x70(%rcx)\n\t" /* frame->rip */
     __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t")
     __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00)
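The removed `#ifdef __APPLE__` branches existed because `%gs` used to point at the macOS TSD, whose `0x30` slot mirrored `TEB->Tib.Self`. Every other TEB field therefore needed a second load through that pointer. GSBASE now points at the TEB whenever the dispatchers are entered from PE code, so a single `%gs`-relative load suffices. The toy model below treats a `%gs`-relative load as a read at `base + offset`. `seg_read()`, `old_mac_read()` and `new_read()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy %gs-relative load: read 8 bytes at base + offset. */
static uint64_t seg_read(const unsigned char *base, size_t offset)
{
    uint64_t value;
    memcpy(&value, base + offset, sizeof(value));
    return value;
}

/* Old macOS path: movq %gs:0x30,%rcx; movq OFFSET(%rcx),%rcx
 * (%gs = TSD, whose slot 0x30 holds the TEB's Self pointer). */
static uint64_t old_mac_read(const unsigned char *tsd, size_t offset)
{
    const unsigned char *teb = (const unsigned char *)(uintptr_t)seg_read(tsd, 0x30);
    return seg_read(teb, offset);
}

/* New path: movq %gs:OFFSET,%rcx (%gs = TEB). */
static uint64_t new_read(const unsigned char *teb, size_t offset)
{
    return seg_read(teb, offset);
}
```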
On Wed Apr 2 05:46:00 2025 +0000, Brendan Shanks wrote:
changed this line in [version 7 of the diff](/wine/wine/-/merge_requests/6866/diffs?diff_id=168023&start_sha=d2ebef1897b4089462d63bfdb20bae06058bc4bb#253a5ed0bf568582a6c96df257e6e01fe7220931_2873_2878)
I looked into it more and the syscall (`0x3000003`/`thread_fast_set_cthread_self64`) has actually existed with the same number for the entire life of x86_64 macOS, even if the higher-level thunks have changed over time. Searching for the number on GitHub even turns up a few obscure uses (including a [Wine fork](https://github.com/crioux/wine64-darwin) doing GS thunking 11 years ago!). The machdep syscalls have changed very little over time: only 4 added since 2007, and `thread_fast_set_cthread_self64` always kept the same number.
In all, it seems unlikely that the syscall number will change at this point (especially given that Intel macOS is essentially in maintenance mode). I don't really want to go down the path of trying to extract the number or depending on the machine code of `_thread_set_tsd_base`.