x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStorage`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). There is a special-case needed in `usr1_handler`: when inside a syscall (on the kernel stack), GSBASE may need to be reset to either the TEB or the TSD. The only way to tell is to determine what GSBASE was set to on entry to the signal handler.
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi`, `%eax`, and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b.... The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) with several apps and games, including the `cefclient.exe` CEF sample. I still need to test this patch on macOS 10.13, and I'd also like to do some performance testing.
---
I also tested an alternate implementation strategy for this which took advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`, and avoid the special-case in `usr1_handler()`.
This strategy was simpler, but I'm not using it for 2 reasons: - the "full" thread state is only available starting with macOS 10.15, and we still support 10.13. - more crucially, Rosetta 2 doesn't seem to correctly implement the GS.base field of the full thread state. It's set to 0 on entry, and isn't read on exit.
From: Brendan Shanks bshanks@codeweavers.com
84760a8fb2cf9ed577c63957c5bdfc621d748a7f started using the documented Mach API method. --- dlls/ntdll/unix/signal_x86_64.c | 38 --------------------------------- 1 file changed, 38 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 537e4e1f60e..2b4abe555f7 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2428,50 +2428,12 @@ static void *mac_thread_gsbase(void) { struct thread_identifier_info tiinfo; unsigned int info_count = THREAD_IDENTIFIER_INFO_COUNT; - static int gsbase_offset = -1;
mach_port_t self = mach_thread_self(); kern_return_t kr = thread_info(self, THREAD_IDENTIFIER_INFO, (thread_info_t) &tiinfo, &info_count); mach_port_deallocate(mach_task_self(), self);
if (kr == KERN_SUCCESS) return (void*)tiinfo.thread_handle; - - if (gsbase_offset < 0) - { - /* Search for the array of TLS slots within the pthread data structure. - That's what the macOS pthread implementation uses for gsbase. */ - const void* const sentinel1 = (const void*)0x2bffb6b4f11228ae; - const void* const sentinel2 = (const void*)0x0845a7ff6ab76707; - int rc; - pthread_key_t key; - const void** p = (const void**)pthread_self(); - int i; - - gsbase_offset = 0; - if ((rc = pthread_key_create(&key, NULL))) return NULL; - - pthread_setspecific(key, sentinel1); - - for (i = key + 1; i < 2000; i++) /* arbitrary limit */ - { - if (p[i] == sentinel1) - { - pthread_setspecific(key, sentinel2); - - if (p[i] == sentinel2) - { - gsbase_offset = (i - key) * sizeof(*p); - break; - } - - pthread_setspecific(key, sentinel1); - } - } - - pthread_key_delete(key); - } - - if (gsbase_offset) return (char*)pthread_self() + gsbase_offset; return NULL; } #endif
From: Brendan Shanks bshanks@codeweavers.com
--- dlls/ntdll/unix/signal_x86_64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 2b4abe555f7..744273a2b56 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -1974,9 +1974,9 @@ static BOOL handle_syscall_trap( ucontext_t *sigcontext, siginfo_t *siginfo ) */ static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { + ucontext_t *ucontext = init_handler( sigcontext ); EXCEPTION_RECORD rec = { 0 }; struct xcontext context; - ucontext_t *ucontext = init_handler( sigcontext );
rec.ExceptionAddress = (void *)RIP_sig(ucontext); save_context( &context, ucontext ); @@ -2059,9 +2059,9 @@ static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext ) */ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { + ucontext_t *ucontext = init_handler( sigcontext ); EXCEPTION_RECORD rec = { 0 }; struct xcontext context; - ucontext_t *ucontext = init_handler( sigcontext );
if (handle_syscall_trap( ucontext, siginfo )) return;
@@ -2093,8 +2093,8 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext ) */ static void fpe_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { - EXCEPTION_RECORD rec = { 0 }; ucontext_t *ucontext = init_handler( sigcontext ); + EXCEPTION_RECORD rec = { 0 };
switch (siginfo->si_code) {
From: Brendan Shanks bshanks@codeweavers.com
--- dlls/ntdll/loader.c | 11 --- dlls/ntdll/unix/signal_x86_64.c | 115 +++++++++++++++++++++++++++++--- 2 files changed, 107 insertions(+), 19 deletions(-)
diff --git a/dlls/ntdll/loader.c b/dlls/ntdll/loader.c index 38516115b38..5267dec6647 100644 --- a/dlls/ntdll/loader.c +++ b/dlls/ntdll/loader.c @@ -1403,9 +1403,6 @@ static BOOL alloc_tls_slot( LDR_DATA_TABLE_ENTRY *mod ) if (!new) return FALSE; if (old) memcpy( new, old, old_module_count * sizeof(*new) ); teb->ThreadLocalStoragePointer = new; -#ifdef __x86_64__ /* macOS-specific hack */ - if (teb->Instrumentation[0]) ((TEB *)teb->Instrumentation[0])->ThreadLocalStoragePointer = new; -#endif TRACE( "thread %04lx tls block %p -> %p\n", HandleToULong(teb->ClientId.UniqueThread), old, new ); /* FIXME: can't free old block here, should be freed at thread exit */ } @@ -1657,10 +1654,6 @@ static NTSTATUS alloc_thread_tls(void) TRACE( "slot %u: %u/%lu bytes at %p\n", i, size, dir->SizeOfZeroFill, pointers[i] ); } NtCurrentTeb()->ThreadLocalStoragePointer = pointers; -#ifdef __x86_64__ /* macOS-specific hack */ - if (NtCurrentTeb()->Instrumentation[0]) - ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = pointers; -#endif return STATUS_SUCCESS; }
@@ -3947,10 +3940,6 @@ void WINAPI LdrShutdownThread(void) if ((pointers = NtCurrentTeb()->ThreadLocalStoragePointer)) { NtCurrentTeb()->ThreadLocalStoragePointer = NULL; -#ifdef __x86_64__ /* macOS-specific hack */ - if (NtCurrentTeb()->Instrumentation[0]) - ((TEB *)NtCurrentTeb()->Instrumentation[0])->ThreadLocalStoragePointer = NULL; -#endif for (i = 0; i < tls_module_count; i++) RtlFreeHeap( GetProcessHeap(), 0, pointers[i] ); RtlFreeHeap( GetProcessHeap(), 0, pointers ); } diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 744273a2b56..e90e2042275 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -63,6 +63,7 @@ #endif #ifdef __APPLE__ # include <mach/mach.h> +extern void _thread_set_tsd_base(uint64_t); #endif
#include "ntstatus.h" @@ -423,7 +424,7 @@ struct syscall_frame void *syscall_cfa; /* 00a8 */ DWORD syscall_flags; /* 00b0 */ DWORD restore_flags; /* 00b4 */ - DWORD align[2]; /* 00b8 */ + ULONG64 gsbase; /* 00b8 */ XMM_SAVE_AREA32 xsave; /* 00c0 */ DECLSPEC_ALIGN(64) XSAVE_AREA_HEADER xstate; /* 02c0 */ }; @@ -462,7 +463,7 @@ static inline struct amd64_thread_data *amd64_thread_data(void) return (struct amd64_thread_data *)ntdll_get_thread_data()->cpu_data; }
-#ifdef __linux__ +#if defined(__linux__) || defined(__APPLE__) static inline TEB *get_current_teb(void) { unsigned long rsp; @@ -477,6 +478,15 @@ static BOOL is_inside_syscall( const ucontext_t *sigcontext ) (char *)RSP_sig(sigcontext) <= (char *)amd64_thread_data()->syscall_frame); }
+#ifdef __APPLE__ +static inline UINT64 get_gs30(void) +{ + UINT64 gs30; + __asm__("movq %%gs:0x30,%0" : "=r" (gs30)); + return gs30; +} +#endif +
extern void __wine_syscall_dispatcher_instrumentation(void); static void *instrumentation_callback; @@ -846,6 +856,10 @@ static inline ucontext_t *init_handler( void *sigcontext ) struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch; arch_prctl( ARCH_SET_FS, ((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb ); } +#endif +#ifdef __APPLE__ + struct ntdll_thread_data *thread_data = (struct ntdll_thread_data *)&get_current_teb()->GdiTebBatch; + _thread_set_tsd_base( (uint64_t)((struct amd64_thread_data *)thread_data->cpu_data)->pthread_teb ); #endif return sigcontext; } @@ -860,6 +874,9 @@ static inline void leave_handler( ucontext_t *sigcontext ) if (fs32_sel && !is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext)) __asm__ volatile( "movw %0,%%fs" :: "r" (fs32_sel) ); #endif +#ifdef __APPLE__ + _thread_set_tsd_base( (uint64_t)NtCurrentTeb() ); +#endif #ifdef DS_sig DS_sig(sigcontext) = ds64_sel; #else @@ -1644,6 +1661,12 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback, "test %r10,%r10\n\t" "jz 1f\n\t" "xchgq %rcx,%r10\n\t" +#ifdef __APPLE__ + "1\t:pushq %rcx\n\t" + "movq %r8,%rdi\n\t" + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rcx\n\t" +#endif "1\t:jmpq *%rcx" ) /* func */
@@ -1653,6 +1676,16 @@ __ASM_GLOBAL_FUNC( call_user_mode_callback, extern void DECLSPEC_NORETURN user_mode_callback_return( void *ret_ptr, ULONG ret_len, NTSTATUS status, TEB *teb ); __ASM_GLOBAL_FUNC( user_mode_callback_return, +#ifdef __APPLE__ + "pushq %rcx\n\t" + "pushq %rdi\n\t" + "pushq %rsi\n\t" + "movq 0x320(%rcx),%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdi\n\t" + "popq %rcx\n\t" +#endif "movq 0x328(%rcx),%r10\n\t" /* amd64_thread_data()->syscall_frame */ "movq 0xa0(%r10),%r11\n\t" /* frame->prev_frame */ "movq %r11,0x328(%rcx)\n\t" /* amd64_thread_data()->syscall_frame = prev_frame */ @@ -2194,6 +2227,9 @@ static void quit_handler( int signal, siginfo_t *siginfo, void *sigcontext ) */ static void usr1_handler( int signal, siginfo_t *siginfo, void *sigcontext ) { +#ifdef __APPLE__ + UINT64 gs30 = get_gs30(); +#endif ucontext_t *ucontext = init_handler( sigcontext );
if (is_inside_syscall( ucontext )) @@ -2225,6 +2261,14 @@ static void usr1_handler( int signal, siginfo_t *siginfo, void *sigcontext ) frame->restore_flags |= 0x40; } NtSetContextThread( GetCurrentThread(), &context->c ); +#ifdef __APPLE__ + /* GSBASE needs to be reset to its value at the entry of this function, + * which could be either the TEB or Mac TSD. + * Check %gs:0x30, which is Self in the TEB and reserved (0) in the TSD. + */ + if (gs30) + _thread_set_tsd_base( (uint64_t)NtCurrentTeb() ); +#endif } else { @@ -2559,13 +2603,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB #elif defined(__NetBSD__) sysarch( X86_64_SET_GSBASE, &teb ); #elif defined (__APPLE__) - __asm__ volatile (".byte 0x65\n\tmovq %0,%c1" :: "r" (teb->Tib.Self), "n" (FIELD_OFFSET(TEB, Tib.Self))); - __asm__ volatile (".byte 0x65\n\tmovq %0,%c1" :: "r" (teb->ThreadLocalStoragePointer), "n" (FIELD_OFFSET(TEB, ThreadLocalStoragePointer))); thread_data->pthread_teb = mac_thread_gsbase(); - /* alloc_tls_slot() needs to poke a value to an address relative to each - thread's gsbase. Have each thread record its gsbase pointer into its - TEB so alloc_tls_slot() can find it. */ - teb->Instrumentation[0] = thread_data->pthread_teb; #else # error Please define setting %gs for your architecture #endif @@ -2622,6 +2660,7 @@ void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB frame->restore_flags |= CONTEXT_INTEGER; frame->syscall_flags = syscall_flags; frame->syscall_cfa = syscall_cfa; + frame->gsbase = (ULONG64)teb; if ((callback = instrumentation_callback)) { frame->r10 = frame->rip; @@ -2709,6 +2748,10 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movw %ss,0x90(%rcx)\n\t" "movq %rbp,0x98(%rcx)\n\t" __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01) +#ifdef __APPLE__ + "movq %gs:0x30,%r14\n\t" + "movq %r14,0xb8(%rcx)\n\t" /* frame->gsbase */ +#endif /* Legends of Runeterra hooks the first system call return instruction, and * depends on us returning to it. Adjust the return address accordingly. */ "subq $0xb,0x70(%rcx)\n\t" @@ -2797,6 +2840,22 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "andl $0xfff,%eax\n\t" /* syscall number */ "cmpq 16(%rbx),%rax\n\t" /* table->ServiceLimit */ "jae 5f\n\t" +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdi\n\t" + "pushq %rdx\n\t" + "pushq %rsi\n\t" + "pushq %r11\n\t" + "movq %gs:0x320,%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %r11\n\t" + "popq %rsi\n\t" + "popq %rdx\n\t" + "popq %rdi\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" +#endif "movq 24(%rbx),%rcx\n\t" /* table->ArgumentTable */ "movzbl (%rcx,%rax),%ecx\n\t" "subq $0x30,%rcx\n\t" @@ -2818,6 +2877,20 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "callq *(%r10,%rax,8)\n\t" "leaq -0x98(%rbp),%rcx\n\t" __ASM_LOCAL_LABEL("__wine_syscall_dispatcher_return") ":\n\t" +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdi\n\t" + "pushq %rdx\n\t" + "pushq %rsi\n\t" + "movq 0xb8(%rcx),%rdi\n\t" /* frame->gsbase */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdx\n\t" + "popq %rdi\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" +#endif "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ "testl $0x48,%edx\n\t" /* CONTEXT_FLOATING_POINT | CONTEXT_XSTATE */ "jnz 2f\n\t" @@ -2992,6 +3065,10 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, __ASM_CFI_CFA_IS_AT2(rcx, 0x88, 0x01) "movq %rbp,0x98(%rcx)\n\t" __ASM_CFI_REG_IS_AT2(rbp, rcx, 0x98, 0x01) +#ifdef __APPLE__ + "movq %gs:0x30,%r14\n\t" + "movq %r14,0xb8(%rcx)\n\t" /* frame->gsbase */ +#endif "movdqa %xmm6,0x1c0(%rcx)\n\t" "movdqa %xmm7,0x1d0(%rcx)\n\t" "movdqa %xmm8,0x1e0(%rcx)\n\t" @@ -3029,6 +3106,18 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "mov $158,%eax\n\t" /* SYS_arch_prctl */ "syscall\n\t" "2:\n\t" +#endif +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdx\n\t" + "pushq %rsi\n\t" + "movq %gs:0x320,%rdi\n\t" /* amd64_thread_data()->pthread_teb */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rsi\n\t" + "popq %rdx\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" #endif "movq %r8,%rdi\n\t" /* args */ "callq *(%r10,%rdx,8)\n\t" @@ -3053,6 +3142,16 @@ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "jz 1f\n\t" "movw %gs:0x338,%fs\n" /* amd64_thread_data()->fs */ "1:\n\t" +#endif +#ifdef __APPLE__ + "pushq %rax\n\t" + "pushq %rcx\n\t" + "pushq %rdx\n\t" + "movq 0xb8(%rcx),%rdi\n\t" /* frame->gsbase */ + "call " __ASM_NAME("_thread_set_tsd_base") "\n\t" + "popq %rdx\n\t" + "popq %rcx\n\t" + "popq %rax\n\t" #endif "movq 0x60(%rcx),%r14\n\t" "movq 0x28(%rcx),%rdi\n\t"
From: Brendan Shanks bshanks@codeweavers.com
--- dlls/ntdll/unix/signal_x86_64.c | 36 --------------------------------- 1 file changed, 36 deletions(-)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index e90e2042275..34d85a7d574 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2711,12 +2711,7 @@ __ASM_GLOBAL_FUNC( signal_start_thread, * __wine_syscall_dispatcher */ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00) @@ -2758,12 +2753,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movl 0xb0(%rcx),%r14d\n\t" /* frame->syscall_flags */ "testl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */ "jz 2f\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rdx\n\t" - "movl 0x340(%rdx),%eax\n\t" -#else "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */ -#endif "xorl %edx,%edx\n\t" "andl $7,%eax\n\t" "xorq %rbp,%rbp\n\t" @@ -2830,12 +2820,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movl %eax,%ebx\n\t" "shrl $8,%ebx\n\t" "andl $0x30,%ebx\n\t" /* syscall table number */ -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x330(%rcx),%rcx\n\t" -#else "movq %gs:0x330,%rcx\n\t" /* amd64_thread_data()->syscall_table */ -#endif "leaq (%rcx,%rbx,2),%rbx\n\t" "andl $0xfff,%eax\n\t" /* syscall number */ "cmpq 16(%rbx),%rax\n\t" /* table->ServiceLimit */ @@ -2908,14 +2893,8 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "2:\ttestl $3,%r14d\n\t" /* SYSCALL_HAVE_XSAVE | SYSCALL_HAVE_XSAVEC */ "jz 3f\n\t" "movq %rax,%r11\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rdx\n\t" - "movl 0x340(%rdx),%eax\n\t" - "movl 0x344(%rdx),%edx\n\t" -#else "movl %gs:0x340,%eax\n\t" /* amd64_thread_data()->xstate_features_mask */ "movl %gs:0x344,%edx\n\t" /* amd64_thread_data()->xstate_features_mask high dword */ -#endif "xrstor64 0xc0(%rcx)\n\t" "movq %r11,%rax\n\t" "movl 0xb4(%rcx),%edx\n\t" /* frame->restore_flags */ @@ -2986,12 +2965,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "movq 0x10(%rcx),%rcx\n\t" "iretq\n" /* RESTORE_FLAGS_INSTRUMENTATION */ -#ifdef __APPLE__ - "2:\tmovq %gs:0x30,%r10\n\t" - "movq 0x348(%r10),%r10\n\t" -#else "2:\tmovq %gs:0x348,%r10\n\t" /* amd64_thread_data()->instrumentation_callback */ -#endif "movq (%r10),%r10\n\t" "test %r10,%r10\n\t" "jz 3b\n\t" @@ -3014,12 +2988,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_return,
__ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation, -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00) @@ -3036,12 +3005,7 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher_instrumentation, */ __ASM_GLOBAL_FUNC( __wine_unix_call_dispatcher, "movq %rcx,%r10\n\t" -#ifdef __APPLE__ - "movq %gs:0x30,%rcx\n\t" - "movq 0x328(%rcx),%rcx\n\t" -#else "movq %gs:0x328,%rcx\n\t" /* amd64_thread_data()->syscall_frame */ -#endif "popq 0x70(%rcx)\n\t" /* frame->rip */ __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") __ASM_CFI_REG_IS_AT2(rip, rcx, 0xf0,0x00)
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=149820
Your paranoid android.
=== debian11b (64 bit WoW report) ===
user32: win.c:4070: Test failed: Expected active window 0000000002980178, got 0000000000000000. win.c:4071: Test failed: Expected focus window 0000000002980178, got 0000000000000000.
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). There is a special-case needed in `usr1_handler`: when inside a syscall (on the kernel stack), GSBASE may need to be reset to either the TEB or the TSD. The only way to tell is to determine what GSBASE was set to on entry to the signal handler.
Other signals can also happen during syscalls. I think you should use the same mechanism as Linux in `leave_handler()` instead of special-casing SIGUSR1.
On Fri Nov 22 00:43:11 2024 +0000, Alexandre Julliard wrote:
Now that the PE separation is complete, GSBASE can be swapped when
entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). There is a special-case needed in `usr1_handler`: when inside a syscall (on the kernel stack), GSBASE may need to be reset to either the TEB or the TSD. The only way to tell is to determine what GSBASE was set to on entry to the signal handler. Other signals can also happen during syscalls. I think you should use the same mechanism as Linux in `leave_handler()` instead of special-casing SIGUSR1.
Ah true, I can do that: ``` if (!is_inside_signal_stack( (void *)RSP_sig(sigcontext )) && !is_inside_syscall(sigcontext)) _thread_set_tsd_base( (uint64_t)NtCurrentTeb() ); ```
That doesn't cover the scenario I added the special-case for though (and which needs to be addressed for other signal handlers too):
If a signal is delivered in `__wine_syscall_dispatcher_return` after the `_thread_set_tsd_base` call (setting GSBASE back to the TEB) but before switching back to the user stack, GSBASE needs to be reset to the TEB even though `is_inside_syscall()` is still TRUE. Rather than the `gs30` trick, maybe this could be detected by putting a label at the stack switch and then checking whether `RIP` is between `__wine_syscall_dispatcher_return` or the stack switch?
Also I don't think the `inside_syscall` case of `usr1_handler()` even calls `leave_handler()`? But that could be added.
If a signal is delivered in `__wine_syscall_dispatcher_return` after the `_thread_set_tsd_base` call (setting GSBASE back to the TEB) but before switching back to the user stack, GSBASE needs to be reset to the TEB even though `is_inside_syscall()` is still TRUE. Rather than the `gs30` trick, maybe this could be detected by putting a label at the stack switch and then checking whether `RIP` is between `__wine_syscall_dispatcher_return` or the stack switch?
That's handled by making the stack pointer point inside the frame, which causes `is_inside_syscall` to return FALSE. As long as you do it at the same place as the Linux case it should work fine.
That's handled by making the stack pointer point inside the frame, which causes `is_inside_syscall` to return FALSE.
Making SP point inside the frame *before* calling `_thread_set_tsd_base` still leads to a time window where `is_inside_syscall()` is FALSE but GSBASE is still pointing to TSD.
On Fri Nov 22 15:22:32 2024 +0000, Jinoh Kang wrote:
That's handled by making the stack pointer point inside the frame,
which causes `is_inside_syscall` to return FALSE. Making SP point inside the frame *before* calling `_thread_set_tsd_base` still leads to a time window where `is_inside_syscall()` is FALSE but GSBASE is still pointing to TSD.
I'd say, don't use `is_inside_signal_stack()` at all but instead stick to `gs30` for entry/exit of all signal handlers.
On Fri Nov 22 15:24:07 2024 +0000, Jinoh Kang wrote:
I'd say, don't use `is_inside_signal_stack()` at all but instead stick to `gs30` for entry/exit of all signal handlers.
Obviously we need to make sure to not use the TSD after the stack change, but that shouldn't be an issue.
Obviously we need to make sure to not use the TSD after the stack change, but that shouldn't be an issue.
"We" include signals, which are what sparked this discussion in the first place. Unless I've missed something.
On Fri Nov 22 16:05:43 2024 +0000, Jinoh Kang wrote:
Obviously we need to make sure to not use the TSD after the stack
change, but that shouldn't be an issue. "We" include signals, which are what sparked this discussion in the first place. Unless I've missed something (do we block signals while switching stacks?)
We are already handling all of this correctly with the %fs register on Linux. I don't see why it would need to be different on macOS.
On Fri Nov 22 16:21:07 2024 +0000, Alexandre Julliard wrote:
We are already handling all of this correctly with the %fs register on Linux. I don't see why it would need to be different on macOS.
Oh, if we're *completely* getting rid of `gs30` hack and following the same pattern as wrfsbaae/movfs, then no handlers are confused and we're fine. Thanks for pointing that out. Sorry for confusion.
On Fri Nov 22 16:35:35 2024 +0000, Jinoh Kang wrote:
Oh, if we're *completely* getting rid of `gs30` hack and following the same pattern as wrfsbase/movfs, then no handlers are confused and we're fine. Thanks for pointing that out. Sorry for confusion.
Ah I see, the key is "As long as you do it at the same place as the Linux case it should work fine". Right now in `__wine_syscall_dispatcher_return` I'm resetting GSBASE much earlier than the Linux %fs is set, since the `SYSCALL_HAVE_XSAVE` case accesses %gs. Maybe the FP/xstate restore could be moved below the `%rsp > frame` line? Or `xstate_features_mask` could be stored in the syscall frame?
On Fri Nov 22 18:30:32 2024 +0000, Brendan Shanks wrote:
Ah I see, the key is "As long as you do it at the same place as the Linux case it should work fine". Right now in `__wine_syscall_dispatcher_return` I'm resetting GSBASE much earlier than the Linux %fs is set, since the `SYSCALL_HAVE_XSAVE` case accesses %gs. Maybe the FP/xstate restore could be moved below the `%rsp > frame` line? Or `xstate_features_mask` could be stored in the syscall frame?
I was able to re-order things so that `%rsp > frame` is the first instruction of `__wine_syscall_dispatcher_return`, followed by the Linux/Mac `fs`/`gs` restore.
I was still getting crashes though, from earlier in the syscall dispatcher after switching to the kernel stack but before GSBASE is set to `pthread_teb`. That accesses `%gs` to get to `pthread_teb`, and there's also an access for `syscall_table`. I think these `%gs` accesses need to be moved to before the kernel stack switch--once we switch to the kernel stack, GSBASE could be set to the TSD at any time by a signal.