[PATCH 0/3] MR11069: Fix VEH/misaligned stack related crashes with copy protection software.
These patches aim to fix crashes and hardlocks when running software that uses a exception handler as a control flow stub, which also happens to violate stack alignment. The software register a VEH which calls a procedure that destroys stack alignment and restores a pre-saved context before returning. This VEH also calls OpenThread with a misaligned stack, causing GCC/Clang's movaps to fault and recurse through the VEH until signal stack is exhausted. Additionally the software installs hw breakpoints on KSHARED_USER_DATA. Wine's signal handlers access the same addresses triggering recursive SIGTRAP dispatches which crashes via `virtual_setup_exception()`'s nested exception check. Although all 3 patches are required together for the software in question to function, the HWBKPT signal fix and the VEH clobber fix Wine issues that may affect other software. I am the least certain about the viability of `OpenThread`'s handrolled ASM version. If the assembly implementation is not acceptable applying `__attribute__((optimize("no-tree-vectorize")))` would also prevent movaps emission. OpenThread is a small function that does not meaningfully benefit from SIMD acceleration so the performance effect should be negligible. Discussed on #winehackers prior to submission. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069
From: Soham Nandy <soham.nandy2006@gmail.com> Treat nonvolatile general-purpose registers as clobbered after invoking a vectored exception handler. Some applications use VEH callbacks restore nonvolatile registers before returning. Clobbering the general purpose registers allows the compiler to preserves caller state across the call boundary. --- dlls/ntdll/exception.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/dlls/ntdll/exception.c b/dlls/ntdll/exception.c index a2b7513bfa3..edb67621f66 100644 --- a/dlls/ntdll/exception.c +++ b/dlls/ntdll/exception.c @@ -173,6 +173,13 @@ static LONG call_vectored_handlers( EXCEPTION_RECORD *rec, CONTEXT *context ) TRACE( "calling handler at %p code=%lx flags=%lx\n", func, rec->ExceptionCode, rec->ExceptionFlags ); ret = func( &except_ptrs ); + +#if defined(__WINE_PE_BUILD) && defined(__x86_64__) && !defined(__arm64ec__) + /* + * Some VEH callbacks restore nonvolatile registers before returning. + */ + __asm__ __volatile__( "" ::: "rbx", "rsi", "rdi", "rbp", "r12", "r13", "r14", "r15", "memory" ); +#endif TRACE( "handler at %p returned %lx\n", func, ret ); RtlEnterCriticalSection( &vectored_handlers_section ); -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/11069
From: Soham Nandy <soham.nandy2006@gmail.com> Some copy protection software call OpenThread with a misaligned stack. GCC emits movaps when it can prove 16-byte alignment, but given the broken stack setup rsp+0x60 is only 8-byte aligned on entry, causing a fault before NtOpenThread() is reached: movups XMMWORD PTR [rsp+0x48],xmm0 lea rcx,[rsp+0x28] add eax,eax movaps XMMWORD PTR [rsp+0x60],xmm0 <- fault Provide a handrolled asm implementation that forces alignment unconditionally via andq $~15,%rsp regardless of caller state. --- dlls/kernelbase/thread.c | 60 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/dlls/kernelbase/thread.c b/dlls/kernelbase/thread.c index 13040915817..cad7fb6a2b5 100644 --- a/dlls/kernelbase/thread.c +++ b/dlls/kernelbase/thread.c @@ -397,6 +397,65 @@ LANGID WINAPI DECLSPEC_HOTPATCH GetThreadUILanguage(void) /*********************************************************************** * OpenThread (kernelbase.@) */ +#if defined(__WINE_PE_BUILD) && defined(__x86_64__) && !defined(__arm64ec__) +/* + * Some copy protection software calls OpenThread() with a misaligned stack. + * + * GCC emits movaps when it can prove 16-byte alignment, but with the broken + * stack setup (rsp+0x60) is 8 byte aligned and causes a fault. MSVC never emits + * movaps regardless of provable alignment so this passes on Windows. + * + * Provide a handrolled asm implementation that forces alignment + * unconditionally. + */ +__ASM_GLOBAL_FUNC( OpenThread, + ".byte 0x48\n\t" /* hotpatch prolog */ + "pushq %rbp\n\t" + __ASM_SEH(".seh_pushreg %rbp\n\t") + __ASM_CFI(".cfi_adjust_cfa_offset 8\n\t") + __ASM_CFI(".cfi_rel_offset %rbp,0\n\t") + "movq %rsp,%rbp\n\t" + __ASM_SEH(".seh_setframe %rbp,0\n\t") + __ASM_CFI(".cfi_def_cfa_register %rbp\n\t") + __ASM_SEH(".seh_endprologue\n\t") + "subq $0x80,%rsp\n\t" + "andq $~15,%rsp\n\t" + "movl %ecx,%r10d\n\t" /* access */ + "movl %r8d,%eax\n\t" /* id */ + "movq $0,0x28(%rsp)\n\t" /* handle */ + "movq $0,0x30(%rsp)\n\t" /* cid.UniqueProcess */ + "movq %rax,0x38(%rsp)\n\t" /* cid.UniqueThread */ + "movl $0x30,0x40(%rsp)\n\t" /* attr.Length */ + "movq $0,0x48(%rsp)\n\t" /* attr.RootDirectory */ + "movq $0,0x50(%rsp)\n\t" /* attr.ObjectName */ + "testl %edx,%edx\n\t" + "setne %al\n\t" + "movzbl %al,%eax\n\t" + "addl %eax,%eax\n\t" + "movl %eax,0x58(%rsp)\n\t" /* attr.Attributes */ + "movq $0,0x60(%rsp)\n\t" /* attr.SecurityDescriptor */ + "movq $0,0x68(%rsp)\n\t" /* attr.SecurityQualityOfService */ + "leaq 0x28(%rsp),%rcx\n\t" + "movl %r10d,%edx\n\t" + "leaq 0x40(%rsp),%r8\n\t" + "leaq 0x30(%rsp),%r9\n\t" + "call *__imp_NtOpenThread(%rip)\n\t" + "testl %eax,%eax\n\t" + "jne 1f\n\t" + "movq 0x28(%rsp),%rax\n\t" + "jmp 2f\n\t" + "1:\tmovl %eax,%ecx\n\t" + "call *__imp_RtlNtStatusToDosError(%rip)\n\t" + "movl %eax,%edx\n\t" + "movq %gs:0x30,%rax\n\t" + "movl %edx,0x68(%rax)\n\t" + "xorl %eax,%eax\n\t" + "2:\tleaq 0(%rbp),%rsp\n\t" + __ASM_CFI(".cfi_def_cfa_register %rsp\n\t") + "popq %rbp\n\t" + __ASM_CFI(".cfi_adjust_cfa_offset -8\n\t") + "ret" ) +#else HANDLE WINAPI DECLSPEC_HOTPATCH OpenThread( DWORD access, BOOL inherit, DWORD id ) { HANDLE handle; @@ -410,6 +469,7 @@ HANDLE WINAPI DECLSPEC_HOTPATCH OpenThread( DWORD access, BOOL inherit, DWORD id if (!set_ntstatus( NtOpenThread( &handle, access, &attr, &cid ))) handle = 0; return handle; } +#endif /* callback for QueueUserAPC */ -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/11069
From: Soham Nandy <soham.nandy2006@gmail.com> Ignore hardware-breakpoint traps raised to avoid recursively dispatching a trap through the Unix signal handler when user code installs a hardware breakpoint that is also reached from Wine's signal-handling path --- dlls/ntdll/unix/signal_x86_64.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 41293fe5607..77c2cc9c6c8 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2149,9 +2149,11 @@ static BOOL handle_syscall_trap( struct thread_data *data, ucontext_t *sigcontex R10_sig( sigcontext ) = RCX_sig( sigcontext ); fixup_frame_fpu_state( frame, sigcontext ); } - else if (siginfo->si_code == 4 /* TRAP_HWBKPT */ && is_inside_syscall( data, RSP_sig(sigcontext) )) + else if (siginfo->si_code == 4 /* TRAP_HWBKPT */ && + (is_inside_syscall( data, RSP_sig(sigcontext) ) || + is_inside_signal_stack( data, (void *)RSP_sig(sigcontext) ))) { - TRACE_(seh)( "ignoring HWBKPT in syscall rip=%p\n", (void *)RIP_sig(sigcontext) ); + TRACE_(seh)( "ignoring HWBKPT in syscall/signal stack rip=%p\n", (void *)RIP_sig(sigcontext) ); return TRUE; } else return FALSE; @@ -2369,7 +2371,7 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *_sigcontext ) struct thread_data *data = init_handler( sigcontext ); struct xcontext context; EXCEPTION_RECORD rec = { .ExceptionAddress = (void *)RIP_sig(sigcontext) }; - + if (handle_syscall_trap( data, sigcontext, siginfo )) return; save_context( data, &context, sigcontext ); -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/11069
I guess it would be easier if these three unrelated fixes would be split in three MRs as it looks like their "suspiciousness" and chances to get merged in their current form differ greatly. "ntdll: Ignore hardware breakpoint traps inside the signal stack." looks most straightforward, I guess it could just use spuriously added newline removal and better formatting of added condition to match the overall style around. "kernelbase: Handle misaligned stack in OpenThread": if doing it this way I guess implementing the whole function in asm is not needed, a wrapper which would force align the stack would do. But I am not sure whether such a workaround is a go not. We do have few cases of functions coded in asm to fix special cases, but realigning stack in one random function, while not necessarily impossible looks a bit weird. I've encountered similar cases in the past, for one, similar scenario hitting OpenMutexW() (which is currently worked around in Proton in rather ad-hoc but much shorter way). I'd think it could be ideal if we had a simple way to force such sort of stack realignment if needed, maybe generic asm wrapper macro in include/wine/asm.h if there is no better way. An attribute would be better but not sure offhand if good attribute for that exists. AFAIK there is no such to force the actual stack realignment, not sure about \__attribute_\_((optimize("no-tree-vectorize"))), don't seem like it is meant exactly for that? Even if affects movaps generation its meaning is not to prevent sse instruction but disable automatic loop vectorization? "ntdll: clobber nonvolatile-registers across VEH callbacks.": a more idiomatic way to do this would probably be making an asm wrapper (like we already have for call_seh_handler, for different reasons). -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142231
Thanks for the input, I agree that seperating these would probably make it easier.
more idiomatic way to do this would probably be making an asm wrapper (like we already have for call_seh_handler, for different reasons
I did initially start off like this in my testing but isn't is cleaner and more readable to just a single line instruction for the compiler?
Even if affects movaps generation its meaning is not to prevent sse instruction but disable automatic loop vectorization?
Yes, but it does also affect movaps generation for some reason, and there's no loops in OpenThread which is why I suggested it.
looks most straightforward, I guess it could just use spuriously added newline removal and better formatting of added condition to match the overall style around.
Copy that I will split up the 2 ntdll changes into seperate branches and make MRs for each individually. I'll check and reconsider the OpenThread change for now. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142232
I have split up and created a MR for the breakpoint at !11071 for now. I will create MRs for the other ones committing/trying changes as asked by @gofman -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142234
This merge request was closed by Soham Nandy. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069
I did initially start off like this in my testing but isn't is cleaner and more readable to just a single line instruction for the compiler?
In the first place, I am not sure that dropping dangling clobber like that is correct, or it is not very obvious. Not sure we can assume such an implicit dependency between C statements and inlined asm code clobbers, I think clobbers specification is meant for hinting about registers clobbered in the same asm block.
Yes, but it does also affect movaps generation for some reason, and there's no loops in OpenThread which is why I suggested it.
Maybe, for some reason, but it may be by chance and not be guaranteed to stay the same across different compilers (for one, we support clang also). Maybe it is possible to design some universal macro which will realign stack and call to specified function. Or even support that as an .spec file attribute to generate such a thunk in winegcc. My main point is that such thing pops up from time to time and adding special asm wrapper for a random function which DRM happened to call this way while it has movaps by chance might be not particularly appealing, while if that is just a matter of putting and attribute on function, entry in spec file or a one line macro call we may easily add those as needed. Perhaps we'd need some input from Alexandre on the preferred direction before going far with it. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142237
Appreciate the feedback again. for the clobber statement I shall do what you say and create an ASM wrapper... As for the OpenThread one im honestly not entirely sure what the best fix for this would even be. As it is pretty much any routine called with a misaligned stack has a chance of breaking due to GCC/Clang movaps / other instructions that MSVC does not generate at all. Even if we did have a wrapper for stack realignment it would need to be applied to every offending routine we came across by testing. From my brief discussion with a programmer way more experienced in this stuff a suggestion to patch GCC with a flag that disables `movaps` was brought up. So yeah I'm not entirely sure on what the correct approach to this would be either. Hoping for input from far more experienced people :D -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142239
As for the OpenThread one im honestly not entirely sure what the best fix for this would even be. As it is pretty much any routine called with a misaligned stack has a chance of breaking due to GCC/Clang movaps / other instructions that MSVC does not generate at all. Even if we did have a wrapper for stack realignment it would need to be applied to every offending routine we came across by testing. From my brief discussion with a programmer way more experienced in this stuff a suggestion to patch GCC with a flag that disables `movaps` was brought up.
In such cases when things can't be sanely fixed universally we are used to add similar sort of workarounds in specific places as needed. I think it is in part a matter of how much of weird code needs to be added in a specific case. I am not talking about generating stack realignment for each function in existence, it is not great and likely not what Windows does. If there is some easy way we can add that for some few specific functions once some app depends on that. Not exactly related, but are you sure MSVC doesn't ever generate movaps? Or maybe it is just less likely to use SSE at all on any occasion initializing / copying local variables? -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142247
I checked it. MSVC seems rather reluctant to use movaps unless floating-point data is involved. It will, however, happily emit movdqa, which has the same alignment requirement. https://godbolt.org/z/sG63fefcY -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142251
Yeah, I think more likely DRMs go away with that by chance on Windows, just the specific functions they called this way don't hit that (and once it is not the case it crashes on Windows and gets noticed and fixed). -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11069#note_142257
participants (4)
-
Alfred Agrell (@Alcaro) -
Paul Gofman (@gofman) -
Soham Nandy -
Soham Nandy (@natimerry)