Based on @rbernon's work in !1088, with additional ARM support and some cleanups.
-- v2: ntdll: Remove the signal_exit_thread wrapper. ntdll: Get rid of the thread exit frame on ARM. ntdll: Get rid of the thread exit frame on ARM64. ntdll: Get rid of the thread exit frame on x86-64. ntdll: Get rid of the thread exit frame on i386. ntdll: Switch to the kernel stack to abort a thread on ARM. ntdll: Switch to the kernel stack to abort a thread on ARM64. ntdll: Switch to the kernel stack to abort a thread on x86-64. ntdll: Switch to the kernel stack to abort a thread on i386. ntdll: Connect syscall frames across user callbacks on ARM. ntdll: Connect syscall frames across user callbacks on ARM64. ntdll: Connect syscall frames across user callbacks on x86-64.
This merge request has too many patches to be relayed via email. Please visit the URL below to see the contents of the merge request. https://gitlab.winehq.org/wine/wine/-/merge_requests/4445
Out of curiosity, how does pthread_exit unwind work on ARM? Does it not need any unwind info somehow?
It does need it, I've added CFI info in v2. I'll leave the EHABI support to @mstorsjo since I'm not sure how to test it.
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/signal_arm64.c:
"mov sp, x10\n\t" /* we're now on the kernel stack, stitch unwind info with previous frame */ __ASM_CFI_CFA_IS_AT2(x22, 0x98, 0x02) /* frame->syscall_cfa */
__ASM_CFI_REG_IS_AT2(sp, x22, 0x98, 0x02)
I'm not sure to understand why you removed the sp info here (vs not in the i386 / x86_64 dispatchers) and why you didn't include it in user_mode_abort_thread?
Is it because it's not been modified before, and so we assume that the default and implicit rule of `.cfi_val_offset %sp,0`, or old sp == cfa (or whatever it is) is still valid?
Feels a bit brittle if that ever changes isn't it? Especially in the ARM dispatchers where we could add intermediate user stack CFI like on x86.
Looks good otherwise.
On Tue Nov 21 19:56:19 2023 +0000, Rémi Bernon wrote:
I'm not sure to understand why you removed the sp info here (vs not in the i386 / x86_64 dispatchers) and why you didn't include it in user_mode_abort_thread? Is it because it's not been modified before, and so we assume that the default and implicit rule of `.cfi_val_offset %sp,0`, or old sp == cfa (or whatever it is) is still valid? Feels a bit brittle if that ever changes isn't it? Especially in the ARM dispatchers where we could add intermediate user stack CFI like on x86.
old_sp == cfa is the default, I don't think it's useful to specify it explicitly. TBH I'm not sure I see a reason to do it on x86 either.
On Tue Nov 21 20:39:43 2023 +0000, Alexandre Julliard wrote:
old_sp == cfa is the default, I don't think it's useful to specify it explicitly. TBH I'm not sure I see a reason to do it on x86 either.
The reason is that there are cfi rules for it before we reach the kernel stack switch, and we need to override them again or it will fail to restore sp on unwind and crash. I think it could also perhaps be done through `.cfi_restore %rsp`.
On Tue Nov 21 20:45:25 2023 +0000, Rémi Bernon wrote:
The reason is that there are cfi rules for it before we reach the kernel stack switch, and we need to override them again or it will fail to restore sp on unwind and crash. I think it could also perhaps be done through `.cfi_restore %rsp`.
Yes, but are the previous rules necessary? Why isn't switching the CFA sufficient?
On Tue Nov 21 18:21:47 2023 +0000, Alexandre Julliard wrote:
Out of curiosity, how does pthread_exit unwind work on ARM? Does it
not need any unwind info somehow? It does need it, I've added CFI info in v2. I'll leave the EHABI support to @mstorsjo since I'm not sure how to test it.
To test the ARM EHABI support, build without libunwind - most current ARM toolchains should produce EHABI unwind info. Also, alternatively, run with stripped .so files.
Since ARM uses EHABI for normal unwinding, the DWARF unwind info isn't in `.eh_frame` (which is loaded at runtime) as on other platforms, but in `.debug_frame`, which isn't loaded into memory by default, and is lost in stripped binaries. (Therefore, libunwind has to do ugly tricks to locate the corresponding files from disk, open them, locate `.debug_frame`, etc. Plus there were some bugs/regressions around this on ARM in latest libunwind in git in the last few years.)
To quicker get up to speed on what to fix here, I presume that the functions that are edited might need corresponding changes with `__ASM_EHABI()` directives. To test the change, is there anything specific to do, or is it enough to just check that processes shut down cleanly etc?
On Tue Nov 21 20:49:03 2023 +0000, Alexandre Julliard wrote:
Yes, but are the previous rules necessary? Why isn't switching the CFA sufficient?
You're right, we should drop the sp rules and it'll use the default rule everywhere.
This made me look at it a bit closer and I spotted a couple of places where I made some mistakes: Some user frame CFI left overs, and an invalid user CFI offset, and another set of issues where the kernel CFI is briefly invalid in the return path, whenever the register used as a CFA base is restored early before switching back to the user stack.
I've made some changes, also tried to apply them to the ARM code, and pushed them to https://gitlab.winehq.org/wine/wine/-/merge_requests/1088.
To quicker get up to speed on what to fix here, I presume that the functions that are edited might need corresponding changes with `__ASM_EHABI()` directives. To test the change, is there anything specific to do, or is it enough to just check that processes shut down cleanly etc?
Yes, make sure that threads exit cleanly, particularly when they receive a SIGQUIT from the server.
I haven't reviewed this in detail, but superficially it seems to be the way I would have addressed the bug, and it seems to fix the demo program I had written.
On Wed Nov 22 13:00:08 2023 +0000, Alexandre Julliard wrote:
To quicker get up to speed on what to fix here, I presume that the
functions that are edited might need corresponding changes with `__ASM_EHABI()` directives. To test the change, is there anything specific to do, or is it enough to just check that processes shut down cleanly etc? Yes, make sure that threads exit cleanly, particularly when they receive a SIGQUIT from the server.
What I'm doing to check this is:
1) Comment out `if (InterlockedDecrement( &nb_threads ) <= 0) abort_process( status );` in `abort_thread`. 2) Call `pthread_exit( UIntToPtr(status) );` before closing the fds in `pthread_exit_wrapper`. 3) Add `if (sigaction( SIGUSR2, &sig_act, NULL ) == -1) goto error;` next to `SIGQUIT` handler. 4) Run wine in Gdb (could use !1074 but that's not required if you don't care having symbols). 5) Break anywhere you want to test, call `queue-signal SIGUSR2` then `continue`.
To make sure it also works with cleanup handlers and condition variables, or if running Gdb is an issue, I also have a local change somewhere on the unix side that gets eventually called which blocks forever waiting for a USR2:
```c static void cleanup(void *arg) { fprintf(stderr, "*** cleanup called\n"); }
/* ... */ { pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_cleanup_push(cleanup, 0); pthread_cleanup_push(cleanup, 0); pthread_mutex_lock(&mutex); pthread_cond_wait(&cond, &mutex); pthread_mutex_unlock(&mutex); pthread_cleanup_pop(0); pthread_cleanup_pop(0); } ```
On Wed Nov 22 13:11:09 2023 +0000, Rémi Bernon wrote:
What I'm doing to check this is:
- Comment out `if (InterlockedDecrement( &nb_threads ) <= 0)
abort_process( status );` in `abort_thread`. 2) Call `pthread_exit( UIntToPtr(status) );` before closing the fds in `pthread_exit_wrapper`. 3) Add `if (sigaction( SIGUSR2, &sig_act, NULL ) == -1) goto error;` next to `SIGQUIT` handler. 4) Run wine in Gdb (could use !1074 but that's not required if you don't care having symbols). 5) Break anywhere you want to test, call `queue-signal SIGUSR2` then `continue`. To make sure it also works with cleanup handlers and condition variables, or if running Gdb is an issue, I also have a local change somewhere on the unix side that gets eventually called which blocks forever waiting for a USR2:
static void cleanup(void *arg) { fprintf(stderr, "*** cleanup called\n"); } /* ... */ { pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; pthread_cond_t cond = PTHREAD_COND_INITIALIZER; pthread_cleanup_push(cleanup, 0); pthread_cleanup_push(cleanup, 0); pthread_mutex_lock(&mutex); pthread_cond_wait(&cond, &mutex); pthread_mutex_unlock(&mutex); pthread_cleanup_pop(0); pthread_cleanup_pop(0); }
Adding a trace on top of `segv_handler` is also useful to catch any issue, as it otherwise sometimes fire segfaults but gets killed right away and may stay unnoticed.
Superseded by the updated !1088.
This merge request was closed by Alexandre Julliard.