Hi,
I've tried building and running Wine on macOS on arm64, and got it working, at least for small test executables.
Overall it seems to work, but getting there requires dealing with three main issues, some of which can make it hard to use this with unmodified Windows executables:
- The page size on Darwin on arm64 is 16 KB, not 4 KB. For some parts, e.g. ntdll/unix, this is possible to handle, but if one were to build ntdll as a PE, that PE would need to know that it's supposed to run in a different environment. This also makes it impossible to run normal executables, unless they've been linked with e.g. -Wl,--section-alignment,0x4000.
- Setting -pagezero_size to anything less than 4 GB seems to make macOS refuse to run the executable, which makes it impossible to map anything into the lower 4 GB of the address space. For now, I've worked around it by moving the address at which user_shared_data is allocated.
- Memory mappings can't be writable and executable at the same time. If one mmap()s a page and requests it to be both writable and executable, writing to it fails; the same happens when changing the protection with mprotect(). This requires changes in a few places, either to avoid needlessly(?) allocating things as executable, or to map buffers read-write first and change them to read-execute after filling them.
And a potential fourth one that I haven't really dealt with yet:
- Darwin also treats x18 as reserved, just like Windows, but IIRC the system can spuriously(?) overwrite the register with zero at times. I haven't run into this in the context of Wine on macOS yet though.
Regardless of these issues, I'll start off by sending the cleanest patches, which should at least fix compilation for this target.
Martin Storsjo (4):
  winebuild: Use the right arm64 page/pageoff relocation syntax for darwin
  ntdll: Trust libunwind's returned pc value on arm64
  ntdll: Fix the arm64 use of libunwind for macOS
  ntdll: Implement arm64 sigcontext access for macOS

 dlls/ntdll/unix/signal_arm64.c | 75 +++++++++++++++++++++++++++++++++-
 tools/winebuild/build.h        |  2 +
 tools/winebuild/import.c       | 34 +++++++--------
 tools/winebuild/spec32.c       |  4 +-
 tools/winebuild/utils.c        | 32 +++++++++++++++
 5 files changed, 128 insertions(+), 19 deletions(-)
Signed-off-by: Martin Storsjo <martin@martin.st>
---
 tools/winebuild/build.h  |  2 ++
 tools/winebuild/import.c | 34 ++++++++++++++++++----------------
 tools/winebuild/spec32.c |  4 ++--
 tools/winebuild/utils.c  | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 18 deletions(-)
diff --git a/tools/winebuild/build.h b/tools/winebuild/build.h
index e1d2e5edf8..03715af956 100644
--- a/tools/winebuild/build.h
+++ b/tools/winebuild/build.h
@@ -291,6 +291,8 @@ extern const char *get_asm_export_section(void);
 extern const char *get_asm_rodata_section(void);
 extern const char *get_asm_rsrc_section(void);
 extern const char *get_asm_string_section(void);
+extern const char *arm64_page( const char *sym );
+extern const char *arm64_pageoff( const char *sym );
 extern void output_function_size( const char *name );
 extern void output_gnu_stack_note(void);
diff --git a/tools/winebuild/import.c b/tools/winebuild/import.c index da9ad62022..751335f36f 100644 --- a/tools/winebuild/import.c +++ b/tools/winebuild/import.c @@ -761,8 +761,8 @@ static void output_import_thunk( const char *name, const char *table, int pos ) output( "1:\t.long %s+%u-(1b+4)\n", table, pos ); break; case CPU_ARM64: - output( "\tadrp x9, %s\n", table ); - output( "\tadd x9, x9, #:lo12:%s\n", table ); + output( "\tadrp x9, %s\n", arm64_page( table ) ); + output( "\tadd x9, x9, #%s\n", arm64_pageoff( table ) ); if (pos & 0xf000) output( "\tadd x9, x9, #%u\n", pos & 0xf000 ); if (pos & 0x0f00) output( "\tadd x9, x9, #%u\n", pos & 0x0f00 ); if (pos & 0x00f0) output( "\tadd x9, x9, #%u\n", pos & 0x00f0 ); @@ -1080,8 +1080,8 @@ static void output_delayed_import_thunks( const DLLSPEC *spec ) case CPU_ARM64: output( "\tstp x29, x30, [sp,#-16]!\n" ); output( "\tmov x29, sp\n" ); - output( "\tadrp x9, %s\n", asm_name("__wine_spec_delay_load") ); - output( "\tadd x9, x9, #:lo12:%s\n", asm_name("__wine_spec_delay_load") ); + output( "\tadrp x9, %s\n", arm64_page( asm_name("__wine_spec_delay_load") ) ); + output( "\tadd x9, x9, #%s\n", arm64_pageoff( asm_name("__wine_spec_delay_load") ) ); output( "\tblr x9\n" ); output( "\tmov x9, x0\n" ); output( "\tldp x29, x30, [sp],#16\n" ); @@ -1367,17 +1367,19 @@ void output_stubs( DLLSPEC *spec ) else output( "\t.long %u\n", odp->ordinal ); break; case CPU_ARM64: - output( "\tadrp x0, .L__wine_spec_file_name\n" ); - output( "\tadd x0, x0, #:lo12:.L__wine_spec_file_name\n" ); + output( "\tadrp x0, %s\n", arm64_page(".L__wine_spec_file_name") ); + output( "\tadd x0, x0, #%s\n", arm64_pageoff(".L__wine_spec_file_name") ); if (exp_name) { - output( "\tadrp x1, .L%s_string\n", name ); - output( "\tadd x1, x1, #:lo12:.L%s_string\n", name ); + char *sym = strmake( ".L%s_string", name ); + output( "\tadrp x1, %s\n", arm64_page( sym ) ); + output( "\tadd x1, x1, #%s\n", arm64_pageoff( sym ) ); + free( sym ); } else output( 
"\tmov x1, %u\n", odp->ordinal ); - output( "\tadrp x2, %s\n", asm_name("__wine_spec_unimplemented_stub") ); - output( "\tadd x2, x2, #:lo12:%s\n", asm_name("__wine_spec_unimplemented_stub") ); + output( "\tadrp x2, %s\n", arm64_page( asm_name("__wine_spec_unimplemented_stub") ) ); + output( "\tadd x2, x2, #%s\n", arm64_pageoff( asm_name("__wine_spec_unimplemented_stub") ) ); output( "\tblr x2\n" ); break; default: @@ -1627,8 +1629,8 @@ void output_syscalls( DLLSPEC *spec ) output( "\tldr x20, [x19]\n" ); /* prev frame */ output( "\tstr x20, [sp, #88]\n" ); output( "\tstr x29, [x19]\n" ); /* syscall frame */ - output( "\tadrp x16, .Lsyscall_args\n" ); - output( "\tadd x16, x16, #:lo12:.Lsyscall_args\n" ); + output( "\tadrp x16, %s\n", arm64_page(".Lsyscall_args") ); + output( "\tadd x16, x16, #%s\n", arm64_pageoff(".Lsyscall_args") ); output( "\tldrb w9, [x16, x8]\n" ); output( "\tsubs x9, x9, #64\n" ); output( "\tbls 2f\n" ); @@ -1640,8 +1642,8 @@ void output_syscalls( DLLSPEC *spec ) output( "\tldr x10, [x11, x9]\n" ); output( "\tstr x10, [sp, x9]\n" ); output( "\tcbnz x9, 1b\n" ); - output( "2:\tadrp x16, .Lsyscall_table\n" ); - output( "\tadd x16, x16, #:lo12:.Lsyscall_table\n" ); + output( "2:\tadrp x16, %s\n", arm64_page(".Lsyscall_table") ); + output( "\tadd x16, x16, #%s\n", arm64_pageoff(".Lsyscall_table") ); output( "\tldr x16, [x16, x8, lsl 3]\n" ); output( "\tblr x16\n" ); output( "\tmov sp, x29\n" ); @@ -1741,8 +1743,8 @@ void output_syscalls( DLLSPEC *spec ) case CPU_ARM64: output( "\tstp x29, x30, [sp,#-16]!\n" ); output( "\tmov x8, #%u\n", i ); - output( "\tadrp x16, %s\n", asm_name("__wine_syscall_dispatcher") ); - output( "\tldr x16, [x16, #:lo12:%s]\n", asm_name("__wine_syscall_dispatcher") ); + output( "\tadrp x16, %s\n", arm64_page( asm_name("__wine_syscall_dispatcher") ) ); + output( "\tldr x16, [x16, #%s]\n", arm64_pageoff( asm_name("__wine_syscall_dispatcher") ) ); output( "\tblr x16\n"); output( "\tldp x29, x30, [sp], #16\n" ); output( 
"\tret\n" ); diff --git a/tools/winebuild/spec32.c b/tools/winebuild/spec32.c index c85249b2a9..efb136a8e6 100644 --- a/tools/winebuild/spec32.c +++ b/tools/winebuild/spec32.c @@ -339,8 +339,8 @@ static void output_relay_debug( DLLSPEC *spec ) output( "\tstp x8, x9, [SP,#-16]!\n" ); output( "\tmov w1, #%u\n", odp->u.func.args_str_offset << 16 ); if (i - spec->base) output( "\tadd w1, w1, #%u\n", i - spec->base ); - output( "\tadrp x0, .L__wine_spec_relay_descr\n"); - output( "\tadd x0, x0, #:lo12:.L__wine_spec_relay_descr\n"); + output( "\tadrp x0, %s\n", arm64_page(".L__wine_spec_relay_descr") ); + output( "\tadd x0, x0, #%s\n", arm64_pageoff(".L__wine_spec_relay_descr") ); output( "\tldr x3, [x0, #8]\n"); output( "\tblr x3\n"); output( "\tadd SP, SP, #16\n" ); diff --git a/tools/winebuild/utils.c b/tools/winebuild/utils.c index 07ef2ed298..1af42f1000 100644 --- a/tools/winebuild/utils.c +++ b/tools/winebuild/utils.c @@ -1291,3 +1291,35 @@ const char *get_asm_string_section(void) default: return ".section .rodata"; } } + +const char *arm64_page( const char *sym ) +{ + static char *buffer; + + switch (target_platform) + { + case PLATFORM_APPLE: + free( buffer ); + buffer = strmake( "%s@PAGE", sym ); + return buffer; + default: + return sym; + } +} + +const char *arm64_pageoff( const char *sym ) +{ + static char *buffer; + + free( buffer ); + switch (target_platform) + { + case PLATFORM_APPLE: + buffer = strmake( "%s@PAGEOFF", sym ); + break; + default: + buffer = strmake( ":lo12:%s", sym ); + break; + } + return buffer; +}
Use the pc value returned by libunwind (UNW_REG_IP) instead of manually copying LR to PC. With GNU libunwind, both registers are equal after unw_step.
With the LLVM libunwind (which Apple uses), the return address isn't reflected at all in LR, only in UNW_REG_IP.
Signed-off-by: Martin Storsjo <martin@martin.st>
---
 dlls/ntdll/unix/signal_arm64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dlls/ntdll/unix/signal_arm64.c b/dlls/ntdll/unix/signal_arm64.c
index d5ced2172a..b4fe46f25a 100644
--- a/dlls/ntdll/unix/signal_arm64.c
+++ b/dlls/ntdll/unix/signal_arm64.c
@@ -230,7 +230,7 @@ NTSTATUS CDECL unwind_builtin_dll( ULONG type, DISPATCHER_CONTEXT *dispatch, CON
     unw_get_reg( &cursor, UNW_AARCH64_X29, (unw_word_t *)&context->u.s.Fp );
     unw_get_reg( &cursor, UNW_AARCH64_X30, (unw_word_t *)&context->u.s.Lr );
     unw_get_reg( &cursor, UNW_AARCH64_SP, (unw_word_t *)&context->Sp );
-    context->Pc = context->u.s.Lr;
+    unw_get_reg( &cursor, UNW_REG_IP, (unw_word_t *)&context->Pc );
     context->ContextFlags |= CONTEXT_UNWOUND_TO_CALL;

     TRACE( "next function pc=%016lx%s\n", context->Pc, rc ? "" : " (last frame)" );
macOS uses the LLVM libunwind, which doesn't expose quite as many internals of the cursor as GNU libunwind, and uses slightly different names for the arch-specific register enums (for getting/setting registers).
This matches similar Apple-specific ifdefs in unix/signal_x86_64.c.
Signed-off-by: Martin Storsjo <martin@martin.st>
---
 dlls/ntdll/unix/signal_arm64.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
diff --git a/dlls/ntdll/unix/signal_arm64.c b/dlls/ntdll/unix/signal_arm64.c index b4fe46f25a..cc34690f96 100644 --- a/dlls/ntdll/unix/signal_arm64.c +++ b/dlls/ntdll/unix/signal_arm64.c @@ -156,11 +156,27 @@ NTSTATUS CDECL unwind_builtin_dll( ULONG type, DISPATCHER_CONTEXT *dispatch, CON unw_proc_info_t info; int rc;
+#ifdef __APPLE__ + rc = unw_getcontext( &unw_context ); + if (rc == UNW_ESUCCESS) + rc = unw_init_local( &cursor, &unw_context ); + if (rc == UNW_ESUCCESS) + { + int i; + for (i = 0; i <= 28; i++) + unw_set_reg( &cursor, UNW_ARM64_X0 + i, context->u.X[i] ); + unw_set_reg( &cursor, UNW_ARM64_FP, context->u.s.Fp ); + unw_set_reg( &cursor, UNW_ARM64_LR, context->u.s.Lr ); + unw_set_reg( &cursor, UNW_ARM64_SP, context->Sp ); + unw_set_reg( &cursor, UNW_REG_IP, context->Pc ); + } +#else memcpy( unw_context.uc_mcontext.regs, context->u.X, sizeof(context->u.X) ); unw_context.uc_mcontext.sp = context->Sp; unw_context.uc_mcontext.pc = context->Pc;
rc = unw_init_local( &cursor, &unw_context ); +#endif if (rc != UNW_ESUCCESS) { WARN( "setup failed: %d\n", rc ); @@ -198,6 +214,16 @@ NTSTATUS CDECL unwind_builtin_dll( ULONG type, DISPATCHER_CONTEXT *dispatch, CON dispatch->LanguageHandler = (void *)info.handler; dispatch->HandlerData = (void *)info.lsda; dispatch->EstablisherFrame = context->Sp; +#ifdef __APPLE__ + { + int i; + for (i = 0; i <= 28; i++) + unw_get_reg( &cursor, UNW_ARM64_X0 + i, (unw_word_t *)&context->u.X[i] ); + } + unw_get_reg( &cursor, UNW_ARM64_FP, (unw_word_t *)&context->u.s.Fp ); + unw_get_reg( &cursor, UNW_ARM64_X30, (unw_word_t *)&context->u.s.Lr ); + unw_get_reg( &cursor, UNW_ARM64_SP, (unw_word_t *)&context->Sp ); +#elif defined(linux) unw_get_reg( &cursor, UNW_AARCH64_X0, (unw_word_t *)&context->u.s.X0 ); unw_get_reg( &cursor, UNW_AARCH64_X1, (unw_word_t *)&context->u.s.X1 ); unw_get_reg( &cursor, UNW_AARCH64_X2, (unw_word_t *)&context->u.s.X2 ); @@ -230,6 +256,7 @@ NTSTATUS CDECL unwind_builtin_dll( ULONG type, DISPATCHER_CONTEXT *dispatch, CON unw_get_reg( &cursor, UNW_AARCH64_X29, (unw_word_t *)&context->u.s.Fp ); unw_get_reg( &cursor, UNW_AARCH64_X30, (unw_word_t *)&context->u.s.Lr ); unw_get_reg( &cursor, UNW_AARCH64_SP, (unw_word_t *)&context->Sp ); +#endif unw_get_reg( &cursor, UNW_REG_IP, (unw_word_t *)&context->Pc ); context->ContextFlags |= CONTEXT_UNWOUND_TO_CALL;
Signed-off-by: Martin Storsjo <martin@martin.st>
---
 dlls/ntdll/unix/signal_arm64.c | 46 ++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
diff --git a/dlls/ntdll/unix/signal_arm64.c b/dlls/ntdll/unix/signal_arm64.c index cc34690f96..f1c6cdc5fa 100644 --- a/dlls/ntdll/unix/signal_arm64.c +++ b/dlls/ntdll/unix/signal_arm64.c @@ -110,6 +110,20 @@ static DWORD64 get_fault_esr( ucontext_t *sigcontext ) return 0; }
+#elif defined(__APPLE__) + +/* Special Registers access */ +# define SP_sig(context) ((context)->uc_mcontext->__ss.__sp) /* Stack pointer */ +# define PC_sig(context) ((context)->uc_mcontext->__ss.__pc) /* Program counter */ +# define PSTATE_sig(context) ((context)->uc_mcontext->__ss.__cpsr) /* Current State Register */ +# define FP_sig(context) ((context)->uc_mcontext->__ss.__fp) /* Frame pointer */ +# define LR_sig(context) ((context)->uc_mcontext->__ss.__lr) /* Link Register */ + +static DWORD64 get_fault_esr( ucontext_t *sigcontext ) +{ + return sigcontext->uc_mcontext->__es.__esr; +} + #endif /* linux */
static pthread_key_t teb_key; @@ -299,7 +313,11 @@ static void save_context( CONTEXT *context, const ucontext_t *sigcontext ) context->Sp = SP_sig(sigcontext); /* Stack pointer */ context->Pc = PC_sig(sigcontext); /* Program Counter */ context->Cpsr = PSTATE_sig(sigcontext); /* Current State Register */ +#ifdef linux for (i = 0; i <= 28; i++) context->u.X[i] = REGn_sig( i, sigcontext ); +#elif defined(__APPLE__) + for (i = 0; i <= 28; i++) context->u.X[i] = sigcontext->uc_mcontext->__ss.__x[i]; +#endif }
@@ -317,7 +335,11 @@ static void restore_context( const CONTEXT *context, ucontext_t *sigcontext ) SP_sig(sigcontext) = context->Sp; /* Stack pointer */ PC_sig(sigcontext) = context->Pc; /* Program Counter */ PSTATE_sig(sigcontext) = context->Cpsr; /* Current State Register */ +#ifdef linux for (i = 0; i <= 28; i++) REGn_sig( i, sigcontext ) = context->u.X[i]; +#elif defined(__APPLE__) + for (i = 0; i <= 28; i++) sigcontext->uc_mcontext->__ss.__x[i] = context->u.X[i]; +#endif }
@@ -328,6 +350,7 @@ static void restore_context( const CONTEXT *context, ucontext_t *sigcontext ) */ static void save_fpu( CONTEXT *context, ucontext_t *sigcontext ) { +#ifdef linux struct fpsimd_context *fp = get_fpsimd_context( sigcontext );
if (!fp) return; @@ -335,6 +358,12 @@ static void save_fpu( CONTEXT *context, ucontext_t *sigcontext ) context->Fpcr = fp->fpcr; context->Fpsr = fp->fpsr; memcpy( context->V, fp->vregs, sizeof(context->V) ); +#elif defined(__APPLE__) + context->ContextFlags |= CONTEXT_FLOATING_POINT; + context->Fpcr = sigcontext->uc_mcontext->__ns.__fpcr; + context->Fpsr = sigcontext->uc_mcontext->__ns.__fpsr; + memcpy( context->V, sigcontext->uc_mcontext->__ns.__v, sizeof(context->V) ); +#endif }
@@ -345,12 +374,18 @@ static void save_fpu( CONTEXT *context, ucontext_t *sigcontext ) */ static void restore_fpu( CONTEXT *context, ucontext_t *sigcontext ) { +#ifdef linux struct fpsimd_context *fp = get_fpsimd_context( sigcontext );
if (!fp) return; fp->fpcr = context->Fpcr; fp->fpsr = context->Fpsr; memcpy( fp->vregs, context->V, sizeof(fp->vregs) ); +#elif defined(__APPLE__) + sigcontext->uc_mcontext->__ns.__fpcr = context->Fpcr; + sigcontext->uc_mcontext->__ns.__fpsr = context->Fpsr; + memcpy( sigcontext->uc_mcontext->__ns.__v, context->V, sizeof(context->V) ); +#endif }
@@ -594,6 +629,7 @@ static void setup_exception( ucontext_t *sigcontext, EXCEPTION_RECORD *rec ) stack->rec = *rec; stack->context = context;
+#ifdef linux REGn_sig(3, sigcontext) = SP_sig(sigcontext); /* original stack pointer, fourth arg for raise_func_trampoline */ SP_sig(sigcontext) = (ULONG_PTR)stack; LR_sig(sigcontext) = PC_sig(sigcontext); @@ -602,6 +638,16 @@ static void setup_exception( ucontext_t *sigcontext, EXCEPTION_RECORD *rec ) REGn_sig(1, sigcontext) = (ULONG_PTR)&stack->context; /* second arg for KiUserExceptionDispatcher */ REGn_sig(2, sigcontext) = (ULONG_PTR)pKiUserExceptionDispatcher; /* dispatcher arg for raise_func_trampoline */ REGn_sig(18, sigcontext) = (ULONG_PTR)NtCurrentTeb(); +#elif defined(__APPLE__) + sigcontext->uc_mcontext->__ss.__x[3] = sigcontext->uc_mcontext->__ss.__sp; /* original stack pointer, fourth arg for raise_func_trampoline */ + sigcontext->uc_mcontext->__ss.__sp = (ULONG_PTR)stack; + sigcontext->uc_mcontext->__ss.__lr = sigcontext->uc_mcontext->__ss.__pc; + sigcontext->uc_mcontext->__ss.__pc = (ULONG_PTR)raise_func_trampoline; + sigcontext->uc_mcontext->__ss.__x[0] = (ULONG_PTR)&stack->rec; /* first arg for KiUserExceptionDispatcher */ + sigcontext->uc_mcontext->__ss.__x[1] = (ULONG_PTR)&stack->context; /* second arg for KiUserExceptionDispatcher */ + sigcontext->uc_mcontext->__ss.__x[2] = (ULONG_PTR)pKiUserExceptionDispatcher; /* dispatcher arg for raise_func_trampoline */ + sigcontext->uc_mcontext->__ss.__x[18] = (ULONG_PTR)NtCurrentTeb(); +#endif }
August 14, 2020 6:55 AM, "Martin Storsjo" martin@martin.st wrote:
> diff --git a/dlls/ntdll/unix/signal_arm64.c b/dlls/ntdll/unix/signal_arm64.c
> index cc34690f96..f1c6cdc5fa 100644
> --- a/dlls/ntdll/unix/signal_arm64.c
> +++ b/dlls/ntdll/unix/signal_arm64.c
> @@ -299,7 +313,11 @@ static void save_context( CONTEXT *context, const ucontext_t *sigcontext )
>      context->Sp = SP_sig(sigcontext); /* Stack pointer */
>      context->Pc = PC_sig(sigcontext); /* Program Counter */
>      context->Cpsr = PSTATE_sig(sigcontext); /* Current State Register */
> +#ifdef linux
>      for (i = 0; i <= 28; i++) context->u.X[i] = REGn_sig( i, sigcontext );
> +#elif defined(__APPLE__)
> +    for (i = 0; i <= 28; i++) context->u.X[i] = sigcontext->uc_mcontext->__ss.__x[i];
Or, you could define REGn_sig() for macOS. Then you wouldn't need a lot of these #ifdefs.
Chip
On Fri, 14 Aug 2020, Chip Davis wrote:
> Or, you could define REGn_sig() for macOS. Then you wouldn't need a lot of these #ifdefs.
Thanks, yes, that does indeed help reduce this patch quite a bit.
// Martin
Am 14.08.20 um 13:54 schrieb Martin Storsjo:
> Signed-off-by: Martin Storsjo <martin@martin.st>
> ---
>  dlls/ntdll/unix/signal_arm64.c | 46 ++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/dlls/ntdll/unix/signal_arm64.c b/dlls/ntdll/unix/signal_arm64.c
> index cc34690f96..f1c6cdc5fa 100644
> --- a/dlls/ntdll/unix/signal_arm64.c
> +++ b/dlls/ntdll/unix/signal_arm64.c
> @@ -110,6 +110,20 @@ static DWORD64 get_fault_esr( ucontext_t *sigcontext )
>      return 0;
>  }
>
> +#elif defined(__APPLE__)
> +
> +/* Special Registers access */
> +# define SP_sig(context) ((context)->uc_mcontext->__ss.__sp) /* Stack pointer */
> +# define PC_sig(context) ((context)->uc_mcontext->__ss.__pc) /* Program counter */
> +# define PSTATE_sig(context) ((context)->uc_mcontext->__ss.__cpsr) /* Current State Register */
> +# define FP_sig(context) ((context)->uc_mcontext->__ss.__fp) /* Frame pointer */
> +# define LR_sig(context) ((context)->uc_mcontext->__ss.__lr) /* Link Register */
> +
> +static DWORD64 get_fault_esr( ucontext_t *sigcontext )
> +{
> +    return sigcontext->uc_mcontext->__es.__esr;
> +}
> +
>  #endif /* linux */
>
>  static pthread_key_t teb_key;
> @@ -299,7 +313,11 @@ static void save_context( CONTEXT *context, const ucontext_t *sigcontext )
>      context->Sp = SP_sig(sigcontext); /* Stack pointer */
>      context->Pc = PC_sig(sigcontext); /* Program Counter */
>      context->Cpsr = PSTATE_sig(sigcontext); /* Current State Register */
> +#ifdef linux
>      for (i = 0; i <= 28; i++) context->u.X[i] = REGn_sig( i, sigcontext );
> +#elif defined(__APPLE__)
> +    for (i = 0; i <= 28; i++) context->u.X[i] = sigcontext->uc_mcontext->__ss.__x[i];
> +#endif
>  }
Hi Martin!
maybe I miss something, but isn't it possible to write an Apple version of the REGn_sig() macro?
On Fri, 14 Aug 2020, André Hentschel wrote:
> Hi Martin!
>
> maybe I miss something, but isn't it possible to write an Apple version of the REGn_sig() macro?
Doh, yes, you're right, that does simplify the patch quite a bit. Thanks!
// Martin
> - Setting -pagezero_size to anything less than 4 GB seems to make macOS refuse to run the executable. So this makes it impossible to map anything into the lower 4 GB of the address space. For now, I've worked it around by moving the address at which user_shared_data is allocated.
Shouldn't you be able to forcefully remap parts in the zero page? To my knowledge the only thing you can't do is unmap pages that the dynamic loader thinks are allocated - if it gets them again when loading a different .dylib it will crash and burn. But you can map stuff there with MAP_FIXED yourself as long as you can be sure you don't overwrite anything important - which shouldn't be the case in the zero page.
> - Memory mappings can't be writable and executable at the same time. If one mmap()s a page and request it to be both writable and executable, writing to it fails, same if changing protection with mprotect().
In Catalina you can bypass that with the right protected runtime entitlements. That means you'll have to sign your executable with a manifest that enables the protected runtime and requests the entitlements. Do you know if this still works in Big Sur x86_64 and/or big sur ARM?
> - Darwin also treats x18 as reserved, just like windows, but IIRC the system can spuriously(?) overwrite the register to zero at some times. I haven't run into this in the context of wine on macOS yet though.
Yeah someone warned me that this happens on iOS. The Darwin kernel code should be accessible I think. Any way to find out under which conditions this happens?
Hi,
On Sat, 15 Aug 2020, Stefan Dösinger wrote:
>> - Setting -pagezero_size to anything less than 4 GB seems to make macOS refuse to run the executable. So this makes it impossible to map anything into the lower 4 GB of the address space. For now, I've worked it around by moving the address at which user_shared_data is allocated.
>
> Shouldn't you be able to forcefully remap parts in the zero page? To my knowledge the only thing you can't do is unmap pages that the dynamic loader thinks are allocated - if it gets them again when loading a different .dylib it will crash and burn. But you can map stuff there with MAP_FIXED yourself as long as you can be sure you don't overwrite anything important - which shouldn't be the case in the zero page.
Hmm, I'll have to experiment with this and see how it behaves.
>> - Memory mappings can't be writable and executable at the same time. If one mmap()s a page and request it to be both writable and executable, writing to it fails, same if changing protection with mprotect().
>
> In Catalina you can bypass that with the right protected runtime entitlements. That means you'll have to sign your executable with a manifest that enables the protected runtime and requests the entitlements. Do you know if this still works in Big Sur x86_64 and/or big sur ARM?
Oh, good point. I'll have to look into this as well.
For the cases I've worked around so far, the solutions don't look too horrible, but I'm not sure whether there are other cases that they break.
>> - Darwin also treats x18 as reserved, just like windows, but IIRC the system can spuriously(?) overwrite the register to zero at some times. I haven't run into this in the context of wine on macOS yet though.
>
> Yeah someone warned me that this happens on iOS. The Darwin kernel code should be accessible I think. Any way to find out under which conditions this happens?
Someone linked me this:
https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24a...
So apparently it would be forcibly overwritten on context switch, just to make sure it's not used.
I wrote a quick test program that sets x18 and reads it back after a few sleep(1) calls, and it seems this no longer happens on macOS 11.0, so maybe this one isn't an issue for now.
// Martin
On Sat, 15 Aug 2020, Stefan Dösinger wrote:
>> - Setting -pagezero_size to anything less than 4 GB seems to make macOS refuse to run the executable. So this makes it impossible to map anything into the lower 4 GB of the address space. For now, I've worked it around by moving the address at which user_shared_data is allocated.
>
> Shouldn't you be able to forcefully remap parts in the zero page? To my knowledge the only thing you can't do is unmap pages that the dynamic loader thinks are allocated - if it gets them again when loading a different .dylib it will crash and burn. But you can map stuff there with MAP_FIXED yourself as long as you can be sure you don't overwrite anything important - which shouldn't be the case in the zero page.
I tried this out with this small test snippet on Catalina:
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    /* Address inside the lower 4 GB, where user_shared_data normally lives */
    void *target = (void*)0x7ffe0000;
    /* Anonymous mappings have no backing file, so pass -1 as the fd */
    char *ptr = mmap(target, 4*4096, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mmap fixed at %p returned %p\n", target, ptr);
    return 0;
}
If linked without pagezero_size, I get "mmap: Cannot allocate memory", while it succeeds if linked with that option.
>> - Memory mappings can't be writable and executable at the same time. If one mmap()s a page and request it to be both writable and executable, writing to it fails, same if changing protection with mprotect().
>
> In Catalina you can bypass that with the right protected runtime entitlements. That means you'll have to sign your executable with a manifest that enables the protected runtime and requests the entitlements. Do you know if this still works in Big Sur x86_64 and/or big sur ARM?
When trying this out on Catalina on x86_64, such mappings work just fine normally. If I enable the hardened runtime by signing it with "codesign -o runtime", the mmap calls that request writable+executable memory fail with EPERM. If I add entitlements to the signing, either com.apple.security.cs.allow-unsigned-executable-memory or com.apple.security.cs.disable-executable-page-protection, it succeeds again.
On Big Sur on arm64, the mmap calls don't fail and do return a pointer to some memory, but writing to the supposedly writable+executable memory fails. The hardened runtime behaves the same way there: if I opt in to it, the mmap call fails, but if I add those entitlements, I get back to the original behaviour, where mmap succeeds but the memory isn't actually writable.
// Martin