https://bugs.winehq.org/show_bug.cgi?id=53682
Bug ID: 53682
Summary: wineboot shows "user_check_not_lock BUG: holding USER lock" on aarch64 since wine-7.14
Product: Wine
Version: 7.14
Hardware: aarch64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@winehq.org
Reporter: PuetzKevinA@JohnDeere.com
Distribution: ---
I recently updated our builds past wine-7.0 and began encountering a wineboot failure on aarch64:
0040:err:system:user_check_not_lock BUG: holding USER lock(64)
s\system32\rundll32.exe: /home/yukondev/workspace/wine/dlls/win32u/sysparams.c:400: user_check_not_lock: Assertion `0' failed.
0040:err:seh:call_function_handlers invalid frame 21f410 (0000000000022000-0000000000120000)
0040:err:seh:NtRaiseException Exception frame is not in stack limits => unable to dispatch exception.
After some bisecting, I narrowed this down to being a regression between 7.13 and 7.14, and then further narrowed it to https://source.winehq.org/git/wine.git/commit/d50112b4b6e82782d3924a8dbd443f...
Somehow the call to KeUserModeCallback( NtUserLoadSysMenu,... ) at https://gitlab.winehq.org/wine/wine/-/blob/d50112b4b6e82782d3924a8dbd443f82f... does not properly return.
I think that commit is mostly a false lead, though. Converting NtUserCreateWindowEx to a syscall just exposed a latent bug in KeUserModeCallback/__wine_syscall_dispatcher on aarch64: with NtUserCreateWindowEx being a syscall, KeUserModeCallback can no longer use its "if we have no syscall frame, call the callback directly" simple path https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/ntdll/unix/signal_arm...
What seems to be actually at fault is that, inside of User32LoadSysMenu (the actual function invoked by KeUserModeCallback), is a call to NtUserCreateMenu - which is *also* a syscall. This call doesn't crash, and in fact all of User32LoadSysMenu runs to completion. But when it goes through __wine_syscall_dispatcher, the stack pointer is restored to be `$sp = arm64_thread_data()->syscall_frame` https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/ntdll/unix/signal_arm... - i.e. it points to &callback_frame, back on the stack of KeUserModeCallback. And this is *not* the bottom of the stack; the compiler-generated prologue/epilogue saves and restores various non-volatile registers below it:
0x0000ffffaa0247b4 <KeUserModeCallback+0>:  sub sp, sp, #0x450
0x0000ffffaa0247b8 <KeUserModeCallback+4>:  stp x29, x30, [sp]
0x0000ffffaa0247bc <KeUserModeCallback+8>:  mov x29, sp
0x0000ffffaa0247c0 <KeUserModeCallback+12>: stp x19, x20, [sp, #16]
0x0000ffffaa0247c4 <KeUserModeCallback+16>: stp x21, x22, [sp, #32]
0x0000ffffaa0247c8 <KeUserModeCallback+20>: str w0, [sp, #56]
0x0000ffffaa0247cc <KeUserModeCallback+24>: str x1, [sp, #48]
0x0000ffffaa0247d0 <KeUserModeCallback+28>: str w2, [sp, #60]
0x0000ffffaa0247d4 <KeUserModeCallback+32>: mov x21, x3
0x0000ffffaa0247d8 <KeUserModeCallback+36>: mov x20, x4
0x0000ffffaa0247dc <KeUserModeCallback+40>: add x19, sp, #0x40 // x19=&callback_frame
So any code that runs inside this syscall and uses the first 0x40 bytes of stack is trampling these variables in the frame of KeUserModeCallback. Eventually User32LoadSysMenu returns back into KiUserCallbackDispatcher, which passes it into NtCallbackReturn, which does a __wine_longjmp back into KeUserModeCallback, and we exit from the __wine_setjmp for the second time (returning 0) and get to `return callback_frame.status` https://gitlab.winehq.org/wine/wine/-/blob/wine-7.17/dlls/ntdll/unix/signal_.... But then the epilogue starts peeling off the stack:
=> 0x0000ffffaa024864 <+176>: ldr w0, [sp, #1088]
   0x0000ffffaa024868 <+180>: ldp x19, x20, [sp, #16]
   0x0000ffffaa02486c <+184>: ldp x21, x22, [sp, #32]
   0x0000ffffaa024870 <+188>: ldp x29, x30, [sp]
   0x0000ffffaa024874 <+192>: add sp, sp, #0x450
   0x0000ffffaa024878 <+196>: ret
and the link register $x30 = (void *) 0xffffaa024984 <NtCallbackReturn+104>, rather than 0xffffa8f0ccdc <copy_sys_popup+44> as it was when it was pushed in the prologue.
It's overwritten several times along the way, but I don't think any of these call sites are at fault; they are just writing to what they think is their own stack frame, unaware that __wine_syscall_dispatcher set $sp to too high a value and that they are overwriting space that belongs to KeUserModeCallback.
The specific places that overwrote this entry on the stack were
#0 0x0000ffffa8f37a48 in NtUserCallOneParam (arg=0, code=2) at /home/yukondev/workspace/wine/dlls/win32u/sysparams.c:5357
#1 0x0000ffffaa022cf0 in __wine_syscall_dispatcher () from /opt/wine/bin/../lib/wine/aarch64-unix/ntdll.so
which set it to 0xffffaa022cf0 <__wine_syscall_dispatcher+272>
#0 insert_menu_item (ret_pos=0x21f538, flags=1024, id=4294967295, handle=0x10042) at /home/yukondev/workspace/wine/dlls/win32u/menu.c:438
#1 NtUserThunkedMenuItemInfo (handle=0x10042, pos=4294967295, flags=1024, method=1, info=0x11f528 <opengl_func_names+680>, str=<optimized out>) at /home/yukondev/workspace/wine/dlls/win32u/menu.c:1297
#2 0x0000ffffaa022cf0 in __wine_syscall_dispatcher () from /opt/wine/bin/../lib/wine/aarch64-unix/ntdll.so
which set it to NULL
and
0x0000ffffaa023a0c in NtCurrentTeb () at /home/yukondev/workspace/wine/dlls/ntdll/unix/signal_arm64.c:1449
1449 {
(gdb) bt
#0 0x0000ffffaa023a0c in NtCurrentTeb () at /home/yukondev/workspace/wine/dlls/ntdll/unix/signal_arm64.c:1449
#1 0x0000ffffaa02493c in ntdll_get_thread_data () at /home/yukondev/workspace/wine/dlls/ntdll/unix/unix_private.h:70
#2 arm64_thread_data () at /home/yukondev/workspace/wine/dlls/ntdll/unix/signal_arm64.c:163
#3 NtCallbackReturn (ret_ptr=0x0, ret_len=0, status=65602) at /home/yukondev/workspace/wine/dlls/ntdll/unix/signal_arm64.c:784
#4 0x0000ffffaa022cf0 in __wine_syscall_dispatcher () from /opt/wine/bin/../lib/wine/aarch64-unix/ntdll.so
which set it to 0xffffaa02493c <NtCallbackReturn+32>, then 0xffffaa024978 <NtCallbackReturn+92>, then 0xffffaa024984 <NtCallbackReturn+104>
The same sort of thing happens in the x86_64 dispatcher, but there it turns out to be pretty harmless. The key difference is that the function prologue on x86_64 used `push` instructions, and did so prior to the `sub` where it allocated space for locals, so the things popped by the epilogue ended up above callback_frame, rather than below it, and so are not smashed.
0x00007fd17b6dc1d2 <+0>:  push %rbp
0x00007fd17b6dc1d3 <+1>:  mov %rsp,%rbp
0x00007fd17b6dc1d6 <+4>:  push %r14
0x00007fd17b6dc1d8 <+6>:  push %r13
0x00007fd17b6dc1da <+8>:  push %r12
0x00007fd17b6dc1dc <+10>: push %rdi
0x00007fd17b6dc1dd <+11>: push %rsi
0x00007fd17b6dc1de <+12>: push %rbx
0x00007fd17b6dc1df <+13>: sub $0xa0,%rsp
0x00007fd17b6dc1e6 <+20>: and $0xffffffffffffffc0,%rsp
0x00007fd17b6dc1ea <+24>: sub $0x560,%rsp
There is still a 32-byte gap between the $rsp of KeUserModeCallback and &current_frame that is briefly at risk, but
1. This seems to be the register parameter area of the Windows x64 ABI, which is actually volatile space that belongs to the callee even though it's allocated by the caller, so the compiler is not expecting it to survive across function calls (in this case __wine_syscall_dispatcher_return). https://docs.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170 discusses how this "contains at least 4 entries", i.e. 32 bytes: https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
2. It doesn't actually get smashed, because __wine_syscall_dispatcher subtracts 0x20 from the restored $rsp before actually making the syscall (https://gitlab.winehq.org/wine/wine/-/blob/d50112b4b6e82782d3924a8dbd443f82f...). This is presumably because the SysV ABI does *not* make any such reservation, so __wine_syscall_dispatcher needs to do so itself to follow the ms_abi.
So together, these mean that on x86_64, it would be OK if __wine_syscall_dispatcher used these 32 bytes between the correct $rsp of KeUserModeCallback and frame->syscall_table (though it doesn't seem to). And the eventual callee gets a $rsp that does *not* overlap with KeUserModeCallback (even though it would be legal for it to do so). I don't know that we have any actual guarantee that callback_frame will be at the bottom, but in practice it seems to be.
But on aarch64, there's important stuff below callback_frame, so this doesn't work out.