x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStorage`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). There is a special-case needed in `usr1_handler`: when inside a syscall (on the kernel stack), GSBASE may need to be reset to either the TEB or the TSD. The only way to tell is to determine what GSBASE was set to on entry to the signal handler.
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi`, `%eax`, and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b.... The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) with several apps and games, including the `cefclient.exe` CEF sample. I still need to test this patch on macOS 10.13, and I'd also like to do some performance testing.
---
I also tested an alternate implementation strategy for this which took advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`, and avoid the special-case in `usr1_handler()`.
This strategy was simpler, but I'm not using it for 2 reasons: - the "full" thread state is only available starting with macOS 10.15, and we still support 10.13. - more crucially, Rosetta 2 doesn't seem to correctly implement the GS.base field of the full thread state. It's set to 0 on entry, and isn't read on exit.
-- v3: ntdll: Remove x86_64 Mac-specific TEB access workarounds that are no longer needed. ntdll: On macOS x86_64, swap GSBASE between the TEB and macOS TSD when entering/leaving PE code. ntdll: Ensure init_handler runs in signal handlers before any compiler-generated memset calls. ntdll: Remove ugly fallback method for getting a thread's GSBASE on macOS. ntdll: Leave kernel stack before accessing %gs in x86_64 syscall dispatcher. ntdll: Do %gs accesses before switching to kernel stack in x86_64 syscall dispatcher.
From: Brendan Shanks bshanks@codeweavers.com
In preparation for switching GSBASE on macOS. --- dlls/ntdll/unix/signal_x86_64.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index caa85249896..6784615eb07 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2793,6 +2793,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x28(%rsp),%r12\n\t" /* 5th argument */ "movq 0x30(%rsp),%r13\n\t" /* 6th argument */ "leaq 0x38(%rsp),%r15\n\t" /* 7th argument */ + /* %gs accesses must happen before switching to the kernel stack */ +#ifdef __linux__ + "movq %gs:0x320,%rsi\n\t" /* amd64_thread_data()->pthread_teb */ +#endif +#ifdef __APPLE__ + "movq %gs:0x30,%r11\n\t" + "movq 0x330(%r11),%r11\n\t" +#else + "movq %gs:0x330,%r11\n\t" /* amd64_thread_data()->syscall_table */ +#endif /* switch to kernel stack */ "movq %rcx,%rsp\n\t" /* we're now on the kernel stack, stitch unwind info with previous frame */ @@ -2809,7 +2819,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, #ifdef __linux__ "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ "jz 2f\n\t" - "movq %gs:0x320,%rsi\n\t" /* amd64_thread_data()->pthread_teb */ "testl $8,%r14d\n\t" /* SYSCALL_HAVE_WRFSGSBASE */ "jz 1f\n\t" "wrfsbase %rsi\n\t" @@ -2821,17 +2830,10 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "2:\n\t" #endif "movq 0x00(%rcx),%rax\n\t" - "movq 0x18(%rcx),%r11\n\t" /* 2nd argument */ "movl %eax,%ebx\n\t" "shrl $8,%ebx\n\t" "andl $0x30,%ebx\n\t" /* syscall table number */ -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x330(%rcx),%rcx\n\t" -#else - "movq %gs:0x330,%rcx\n\t" /* amd64_thread_data()->syscall_table */ -#endif - "leaq (%rcx,%rbx,2),%rbx\n\t" + "leaq (%r11,%rbx,2),%rbx\n\t" "andl $0xfff,%eax\n\t" /* syscall number */ "cmpq 16(%rbx),%rax\n\t" /* table->ServiceLimit */ "jae 5f\n\t" @@ -2846,8 +2848,9 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq %r15,%rsi\n\t" "cld\n\t" "rep; movsq\n" - "1:\tmovq %r10,%rdi\n\t" /* 1st argument */ - "movq %r11,%rsi\n\t" /* 2nd argument */ + "1:\tleaq -0x98(%rbp),%rcx\n\t" + "movq %r10,%rdi\n\t" /* 1st argument */ + "movq 0x18(%rcx),%rsi\n\t" /* 2nd argument */ "movq %r8,%rdx\n\t" /* 3rd argument */ "movq %r9,%rcx\n\t" /* 4th argument */ "movq %r12,%r8\n\t" /* 5th argument */
From: Brendan Shanks bshanks@codeweavers.com
In preparation for switching GSBASE on macOS. --- dlls/ntdll/unix/signal_x86_64.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 6784615eb07..eacb78bb854 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2859,6 +2859,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "callq *(%r10,%rax,8)\n\t" "leaq -0x98(%rbp),%rcx\n\t" __ASM_LOCAL_LABEL("__wine_syscall_dispatcher_return") ":\n\t" + /* push rbp-based kernel stack cfi */ + __ASM_CFI(".cfi_remember_state\n\t") + __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */ + "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ +#ifdef __linux__ + "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ + "jz 1f\n\t" + "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ + "1:\n\t" +#endif "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */ "jnz 2f\n\t" @@ -2890,9 +2900,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "jmp 4f\n" "3:\tfxrstor64 0xc0(%rcx)\n" "4:\tmovq 0x98(%rcx),%rbp\n\t" - /* push rbp-based kernel stack cfi */ - __ASM_CFI(".cfi_remember_state\n\t") - __ASM_CFI_CFA_IS_AT2(rcx, 0xa8, 0x01) /* frame->syscall_cfa */ "movq 0x68(%rcx),%r15\n\t" "movq 0x58(%rcx),%r13\n\t" "movq 0x50(%rcx),%r12\n\t" @@ -2900,13 +2907,6 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x28(%rcx),%rdi\n\t" "movq 0x20(%rcx),%rsi\n\t" "movq 0x08(%rcx),%rbx\n\t" - "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ -#ifdef __linux__ - "testl $12,%r14d\n\t" /* SYSCALL_HAVE_PTHREAD_TEB | SYSCALL_HAVE_WRFSGSBASE */ - "jz 1f\n\t" - "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ - "1:\n\t" -#endif "testl $0x10000,%edx\n\t" /* RESTORE_FLAGS_INSTRUMENTATION */ "movq 0x60(%rcx),%r14\n\t" "jnz 2f\n\t"
From: Brendan Shanks bshanks@codeweavers.com
84760a8fb2cf9ed577c63957c5bdfc621d748a7f started using the documented Mach API method. --- dlls/ntdll/unix/signal_x86_64.c | 38 --------------------------------- 1 file changed, 38 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index eacb78bb854..d0f793be6c8 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2428,50 +2428,12 @@ static void *mac_thread_gsbase(void) { struct thread_identifier_info tiinfo; unsigned int info_count = THREAD_IDENTIFIER_INFO_COUNT; - static int gsbase_offset = -1;
mach_port_t self = mach_thread_self(); kern_return_t kr = thread_info(self, THREAD_IDENTIFIER_INFO, (thread_info_t) &tiinfo, &info_count); mach_port_deallocate(mach_task_self(), self);
if (kr == KERN_SUCCESS) return (void*)tiinfo.thread_handle; - - if (gsbase_offset < 0) - { - /* Search for the array of TLS slots within the pthread data structure. - That's what the macOS pthread implementation uses for gsbase. */ - const void* const sentinel1 = (const void*)0x2bffb6b4f11228ae; - const void* const sentinel2 = (const void*)0x0845a7ff6ab76707; - int rc; - pthread_key_t key; - const void** p = (const void**)pthread_self(); - int i; - - gsbase_offset = 0; - if ((rc = pthread_key_create(&key, NULL))) return NULL; - - pthread_setspecific(key, sentinel1); - - for (i = key + 1; i < 2000; i++) /* arbitrary limit */ - { - if (p[i] == sentinel1) - { - pthread_setspecific(key, sentinel2); - - if (p[i] == sentinel2) - { - gsbase_offset = (i - key) * sizeof(*p); - break; - } - - pthread_setspecific(key, sentinel1); - } - } - - pthread_key_delete(key); - } - - if (gsbase_offset) return (char*)pthread_self() + gsbase_offset; return NULL; } #endif
From: Brendan Shanks bshanks@codeweavers.com
--- dlls/ntdll/unix/signal_x86_64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index d0f793be6c8..5288013f2a6 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -1974,9 +1974,9 @@ static BOOL handle_syscall_trap( ucontext_t *sigcontext, siginfo_t *siginfo ) */ static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { + ucontext_t *ucontext = init_handler( sigcontext ); EXCEPTION_RECORD rec = { 0 }; struct xcontext context; - ucontext_t *ucontext = init_handler( sigcontext );
rec.ExceptionAddress = (void *)RIP_sig(ucontext); save_context( &context, ucontext ); @@ -2059,9 +2059,9 @@ static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext ) */ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { + ucontext_t *ucontext = init_handler( sigcontext ); EXCEPTION_RECORD rec = { 0 }; struct xcontext context; - ucontext_t *ucontext = init_handler( sigcontext );
if (handle_syscall_trap( ucontext, siginfo )) return;
@@ -2093,8 +2093,8 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext ) */ static void fpe_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { - EXCEPTION_RECORD rec = { 0 }; ucontext_t *ucontext = init_handler( sigcontext ); + EXCEPTION_RECORD rec = { 0 };
switch (siginfo->si_code) {
From: Brendan Shanks bshanks@codeweavers.com
--- dlls/ntdll/loader.c | 11 ----- dlls/ntdll/unix/signal_x86_64.c | 85 +++++++++++++++++++++++++++++---- 2 files changed, 75 insertions(+), 21 deletions(-)
diff --git a/dlls/ntdll/loader.c b/dlls/ntdll/loader.c index 38516115b38..5267dec6647 100644 --- a/dlls/ntdll/loader.c +++ b/dlls/ntdll/loader.c @@ -1403,9 +1403,6 @@ static BOOL alloc_tls_slot( LDR_DATA_TABLE_ENTRY *mod ) if (!new) return FALSE; if (old) memcpy( new, old, old_module_count * sizeof(*new) ); teb->ThreadLocalStoragePointer = new; -#ifdef __x86_64__ /* macOS-specific hack */ - if (teb->Instrumentation[0]) ((TEB *)teb->Instrumentation[0])->ThreadLocalStoragePointer = new; -#endif TRACE( "thread %04lx tls block %p -> %p\n", HandleToULong(teb->ClientId.UniqueThread), old, new ); /* FIXME: can't free old block here, should be freed at thread exit */ } @@ -1657,10 +1654,6 @@ static NTSTATUS alloc_thread_tls(void) TRACE( "slot %u: %u/%lu bytes at %p\n", i, size, dir->SizeOfZeroFill, pointers[i] ); } NtCurrentTeb()->ThreadLocalStoragePointer = pointers; -#ifdef __x86_64__ /* macOS-specific hack */ - if (NtCurrentTeb()->Instrumentation[0]) - ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = pointers; -#endif return STATUS_SUCCESS; }
@@ -3947,10 +3940,6 @@ void WINAPI LdrShutdownThread(void) if ((pointers = NtCurrentTeb()->ThreadLocalStoragePointer)) { NtCurrentTeb()->ThreadLocalStoragePointer = NULL; -#ifdef __x86_64__ /* macOS-specific hack */ - if (NtCurrentTeb()->Instrumentation[0]) - ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = NULL; -#endif for (i = 0; i < tls_module_count; i++) RtlFreeHeap( GetProcessHeap(), 0, pointers[i] ); RtlFreeHeap( GetProcessHeap(), 0, pointers ); } diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 5288013f2a6..2b0209f5f9d 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -63,6 +63,7 @@ #endif #ifdef __APPLE__ # include <mach/mach.h> +extern void _thread_set_tsd_base(uint64_t); #endif
#include "ntstatus.h" @@ -423,7 +424,7 @@ struct syscall_frame void *syscall_cfa; /* 00a8 */ DWORD syscall_flags; /* 00b0 */ DWORD restore_flags; /* 00b4 */ - DWORD align[2]; /* 00b8 */ + ULONG64 gsbase; /* 00b8 */ XMM_SAVE_AREA32 xsave; /* 00c0 */ DECLSPEC_ALIGN(64) XSAVE_AREA_HEADER xstate; /* 02c0 */ }; @@ -462,7 +463,7 @@ static inline struct amd64_thread_data *amd64_thread_data(void) return (struct amd64_thread_data *)ntdll_get_thread_data()->cpu_data; }
-#ifdef __linux__ +#if defined(__linux__) || defined(__APPLE__) static inline TEB *get_current_teb(void) { unsigned long rsp; @@ -846,6 +847,10 @@ static inline ucontext_t *init_handler( void *sigcontext ) struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch; arch_prctl( ARCH_SET_FS, ((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb ); } +#endif +#ifdef __APPLE__ + struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch; + _thread_set_tsd_base( (uint64_t)((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb ); #endif return sigcontext; } @@ -860,6 +865,10 @@ static inline void leave_handler( ucontext_t *sigcontext ) if (fs32_sel && !is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext)) __asm__ volatile( "movw %0,%%fs" :: "r" (fs32_sel) ); #endif +#ifdef __APPLE__ + if (!is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext)) + _thread_set_tsd_base( (uint64_t)NtCurrentTeb() ); +#endif #ifdef DS_sig DS_sig(sigcontext) = ds64_sel; #else @@ -1644,6 +1653,12 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback, "test %r10,%r10\n\t" "jz 1f\n\t" "xchgq %rcx,%r10\n\t" +#ifdef __APPLE__ + "1\t:pushq %rcx\n\t" + "movq %r8,%rdi\n\t" + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rcx\n\t" +#endif "1\t:jmpq *%rcx" ) /* func */
@@ -1653,6 +1668,16 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback, extern void DECLSPEC_NORETURN user_mode_callback_return( void *ret_ptr, ULONG ret_len, NTSTATUS status, TEB *teb ); __ASM_GLOBAL_FUNC( user_mode_callback_return, +#ifdef __APPLE__ + "pushq %rcx\n\t" + "pushq %rdi\n\t" + "pushq %rsi\n\t" + "movq 0x320(%rcx),%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdi\n\t" + "popq %rcx\n\t" +#endif "movq 0x328(%rcx),%r10\n\t" /* amd64_thread_data()->syscall_frame */ "movq 0xa0(%r10),%r11\n\t" /* frame->prev_frame */ "movq %r11,0x328(%rcx)\n\t" /* amd64_thread_data()->syscall_frame = prev_frame */ @@ -2559,13 +2584,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB #elif defined(__NetBSD__) sysarch( X86_64_SET_GSBASE, &teb ); #elif defined (__APPLE__) - __asm__ volatile ("movq %0,%%gs:%c1" :: "r" (teb->Tib.Self), "n" (FIELD_OFFSET(TEB, Tib.Self))); - __asm__ volatile ("movq %0,%%gs:%c1" :: "r" (teb->ThreadLocalStoragePointer), "n" (FIELD_OFFSET(TEB, ThreadLocalStoragePointer))); thread_data->pthread_teb = mac_thread_gsbase(); - /* alloc_tls_slot() needs to poke a value to an address relative to each - thread's gsbase. Have each thread record its gsbase pointer into its - TEB so alloc_tls_slot() can find it. */ - teb->Instrumentation[0] = thread_data->pthread_teb; #else # error Please define setting %gs for your architecture #endif @@ -2622,6 +2641,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB frame->restore_flags |= CONTEXT_INTEGER; frame->syscall_flags = syscall_flags; frame->syscall_cfa = syscall_cfa; + frame->gsbase = (ULONG64)teb; if ((callback = instrumentation_callback)) { frame->r10 = frame->rip; @@ -2709,6 +2729,10 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movw %ss,0x90(%rcx)\n\t" "movq %rbp,0x98(%rcx)\n\t" __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01) +#ifdef __APPLE__ + "movq %gs:0x30,%r14\n\t" + "movq %r14,0xb8(%rcx)\n\t" /* frame->gsbase */ +#endif /* Legends of Runeterra hooks the first system call return instruction, and * depends on us returning to it. Adjust the return address accordingly. */ "subq $0xb,0x70(%rcx)\n\t" @@ -2756,9 +2780,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x30(%rsp),%r13\n\t" /* 6th argument */ "leaq 0x38(%rsp),%r15\n\t" /* 7th argument */ /* %gs accesses must happen before switching to the kernel stack */ -#ifdef __linux__ "movq %gs:0x320,%rsi\n\t" /* amd64_thread_data()->pthread_teb */ -#endif #ifdef __APPLE__ "movq %gs:0x30,%r11\n\t" "movq 0x330(%r11),%r11\n\t" @@ -2790,6 +2812,13 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "syscall\n\t" "leaq -0x98(%rbp),%rcx\n" "2:\n\t" +#endif +#ifdef __APPLE__ + "pushq %r11\n\t" + "movq %rsi,%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %r11\n\t" + "leaq -0x98(%rbp),%rcx\n" #endif "movq 0x00(%rcx),%rax\n\t" "movl %eax,%ebx\n\t" @@ -2830,6 +2859,16 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "jz 1f\n\t" "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ "1:\n\t" +#endif +#ifdef __APPLE__ + "movq 0x88(%rcx),%rsp\n\t" /* use the user stack for this call */ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "movq 0xb8(%rcx),%rdi\n\t" /* frame->gsbase */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" + "leaq 0x70(%rcx),%rsp\n\t" /* %rsp > frame means no longer inside syscall */ #endif "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */ @@ -2995,6 +3034,10 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, __ASM_CFI_CFA_IS_AT2(rcx, 0x88, 0x01) "movq %rbp,0x98(%rcx)\n\t" __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01) +#ifdef __APPLE__ + "movq %gs:0x30,%r14\n\t" + "movq %r14,0xb8(%rcx)\n\t" /* frame->gsbase */ +#endif "movdqa %xmm6,0x1c0(%rcx)\n\t" "movdqa %xmm7,0x1d0(%rcx)\n\t" "movdqa %xmm8,0x1e0(%rcx)\n\t" @@ -3032,6 +3075,18 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "mov $158,%eax\n\t" /* SYS_arch_prctl */ "syscall\n\t" "2:\n\t" +#endif +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdx\n\t" + "pushq %rsi\n\t" + "movq %gs:0x320,%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdx\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" #endif "movq %r8,%rdi\n\t" /* args */ "callq *(%r10,%rdx,8)\n\t" @@ -3056,6 +3111,16 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "jz 1f\n\t" "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ "1:\n\t" +#endif +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdx\n\t" + "movq 0xb8(%rcx),%rdi\n\t" /* frame->gsbase */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rdx\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" #endif "movq 0x60(%rcx),%r14\n\t" "movq 0x28(%rcx),%rdi\n\t"
From: Brendan Shanks bshanks@codeweavers.com
--- dlls/ntdll/unix/signal_x86_64.c | 36 --------------------------------- 1 file changed, 36 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 2b0209f5f9d..09afc967b93 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2692,12 +2692,7 @@ __ASM_GLOBAL_FUNC( signal_start_thread, * __wine_syscall_dispatcher */ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00) @@ -2739,12 +2734,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movl 0xb0(%rcx),%r14d\n\t" /* frame->syscall_flags */ "testl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */ "jz 2f\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rdx\n\t" - "movl 0x340(%rdx),%eax\n\t" -#else "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */ -#endif "xorl %edx,%edx\n\t" "andl $7,%eax\n\t" "xorq %rbp,%rbp\n\t" @@ -2781,12 +2771,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "leaq 0x38(%rsp),%r15\n\t" /* 7th argument */ /* %gs accesses must happen before switching to the kernel stack */ "movq %gs:0x320,%rsi\n\t" /* amd64_thread_data()->pthread_teb */ -#ifdef __APPLE__ - "movq %gs:0x30,%r11\n\t" - "movq 0x330(%r11),%r11\n\t" -#else "movq %gs:0x330,%r11\n\t" /* amd64_thread_data()->syscall_table */ -#endif /* switch to kernel stack */ "movq %rcx,%rsp\n\t" /* we're now on the kernel stack, stitch unwind info with previous frame */ @@ -2887,14 +2872,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "2:\ttestl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */ "jz 3f\n\t" "movq %rax,%r11\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rdx\n\t" - "movl 0x340(%rdx),%eax\n\t" - "movl 0x344(%rdx),%edx\n\t" -#else "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */ "movl %gs:0x344,%edx\n\t" /* amd64_thread_data()->xstate_features_mask high dword */ -#endif "xrstor64 0xc0(%rcx)\n\t" "movq %r11,%rax\n\t" "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ @@ -2955,12 +2934,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x10(%rcx),%rcx\n\t" "iretq\n" /* RESTORE_FLAGS_INSTRUMENTATION */ -#ifdef __APPLE__ - "2:\tmovq %gs:0x30,%r10\n\t" - "movq 0x348(%r10),%r10\n\t" -#else "2:\tmovq %gs:0x348,%r10\n\t" /* amd64_thread_data()->instrumentation_callback */ -#endif "movq (%r10),%r10\n\t" "test %r10,%r10\n\t" "jz 3b\n\t" @@ -2983,12 +2957,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_return,
__ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation, -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00) @@ -3005,12 +2974,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation, */ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "movq %rcx,%r10\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00)
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=150248
Your paranoid android.
=== debian11b (64 bit WoW report) ===
user32: win.c:4070: Test failed: Expected active window 0000000005FD015E, got 0000000000000000. win.c:4071: Test failed: Expected focus window 0000000005FD015E, got 0000000000000000.
Report validation errors: mshtml:script crashed (c0000005)
It feels like we're shuffling around gs accesses a bit too much, spilling registers into stack in the process.
Instead, I think we can just use gs once for `movq %gs:0x328,%rcx`, and use `struct syscall_frame.gsbase (0xb8)` for subsequent TEB accesses[^1]. Let's rename the `gsbase` field to `teb` to better reflect its nature.
[^1]: My empirical experience tells me that stack memory access is much faster than TLS access (e.g., %gs:OFFSET) anyway. Correct me if I'm wrong, though.