-- v4: ntdll: Raise SIGQUIT instead of aborting immediately if the server pipe is broken.
From: Zebediah Figura zfigura@codeweavers.com
Keil UV4 C51 starts a thread and then immediately terminates it, usually while it's in the middle of an NtClose(). This results in the thread dying while holding the fd_cache_handle mutex, which results in a hang from other threads in the process.
We don't need to process SIGQUIT immediately, so just mask it off until we're out of the uninterrupted section.
This will require the next commit to fix the bug.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=54807 --- dlls/ntdll/unix/server.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/dlls/ntdll/unix/server.c b/dlls/ntdll/unix/server.c index 53fcc61ccf3..057efb5e917 100644 --- a/dlls/ntdll/unix/server.c +++ b/dlls/ntdll/unix/server.c @@ -1579,6 +1579,7 @@ size_t server_init_process(void) sigaddset( &server_block_set, SIGIO ); sigaddset( &server_block_set, SIGINT ); sigaddset( &server_block_set, SIGHUP ); + sigaddset( &server_block_set, SIGQUIT ); sigaddset( &server_block_set, SIGUSR1 ); sigaddset( &server_block_set, SIGUSR2 ); sigaddset( &server_block_set, SIGCHLD );
From: Zebediah Figura zfigura@codeweavers.com
Keil UV4 C51 starts a thread and then immediately terminates it, usually while it's in the middle of an NtClose().
The previous commit prevents the SIGQUIT from killing the thread before it exits the fd_cache_section mutex. However, the server will still have closed its side of the socket, resulting in the close_handle server request receiving EPIPE from write(), or zero from read(), at which point the client calls abort_thread() anyway, while still holding the mutex.
Raise SIGQUIT instead, which will terminate the thread once it leaves the uninterrupted section, after releasing any relevant mutexes.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=54807 --- dlls/ntdll/unix/server.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/dlls/ntdll/unix/server.c b/dlls/ntdll/unix/server.c index 057efb5e917..df4ead87115 100644 --- a/dlls/ntdll/unix/server.c +++ b/dlls/ntdll/unix/server.c @@ -206,7 +206,14 @@ static unsigned int send_request( const struct __server_request_info *req ) }
if (ret >= 0) server_protocol_error( "partial write %d\n", ret ); - if (errno == EPIPE) abort_thread(0); + if (errno == EPIPE) + { + /* SIGQUIT might be blocked. We want to terminate ourselves, but only + * after signals have been unblocked (and e.g. mutexes have been + * released). */ + raise( SIGQUIT ); + return STATUS_THREAD_IS_TERMINATING; + } if (errno == EFAULT) return STATUS_ACCESS_VIOLATION; server_protocol_perror( "write" ); } @@ -217,7 +224,7 @@ static unsigned int send_request( const struct __server_request_info *req ) * * Read data from the reply buffer; helper for wait_reply. */ -static void read_reply_data( void *buffer, size_t size ) +static unsigned int read_reply_data( void *buffer, size_t size ) { int ret;
@@ -225,7 +232,7 @@ static void read_reply_data( void *buffer, size_t size ) { if ((ret = read( ntdll_get_thread_data()->reply_fd, buffer, size )) > 0) { - if (!(size -= ret)) return; + if (!(size -= ret)) return STATUS_SUCCESS; buffer = (char *)buffer + ret; continue; } @@ -234,8 +241,11 @@ static void read_reply_data( void *buffer, size_t size ) if (errno == EPIPE) break; server_protocol_perror("read"); } - /* the server closed the connection; time to die... */ - abort_thread(0); + /* SIGQUIT might be blocked. We want to terminate ourselves, but only + * after signals have been unblocked (and e.g. mutexes have been + * released). */ + raise( SIGQUIT ); + return STATUS_THREAD_IS_TERMINATING; }
@@ -246,9 +256,11 @@ static void read_reply_data( void *buffer, size_t size ) */ static inline unsigned int wait_reply( struct __server_request_info *req ) { - read_reply_data( &req->u.reply, sizeof(req->u.reply) ); - if (req->u.reply.reply_header.reply_size) - read_reply_data( req->reply_data, req->u.reply.reply_header.reply_size ); + unsigned int ret; + ret = read_reply_data( &req->u.reply, sizeof(req->u.reply) ); + if (!ret && req->u.reply.reply_header.reply_size) + ret = read_reply_data( req->reply_data, req->u.reply.reply_header.reply_size ); + if (ret) return ret; return req->u.reply.reply_header.error; }
I think calling `raise(SIGQUIT);` just before returning `STATUS_THREAD_IS_TERMINATING` will fix all that, while still achieving your purpose. If the server has already sent SIGQUIT, this has virtually no effect; otherwise, the signal will be delievered once we unblock it later.
That probably makes sense, though.
Done in v3.
This merge request was approved by Jinoh Kang.
On Wed Jan 17 15:40:53 2024 +0000, Zebediah Figura wrote:
My problem is processes that *continue* to do something externally
visible without assistance from the server. A nontrivial portion of syscalls doesn't actually require the server to work; for example, NtWriteFile() can work with fd cache alone as long as it doesn't need to send completion to the server. I don't think this is a meaningful concern?
Other syscalls that *do* contact server will just return
`STATUS_THREAD_IS_TERMINATING`, which has ample possibility to confuse the app which won't voluntarily terminate by itself and will continue doing something weird externally.
I think calling `raise(SIGQUIT);` just before returning
`STATUS_THREAD_IS_TERMINATING` will fix all that, while still achieving your purpose. If the server has already sent SIGQUIT, this has virtually no effect; otherwise, the signal will be delievered once we unblock it later. That probably makes sense, though.
`receive_fd` is called while `fd_cache_mutex` is held and has
`abort_thread(0);` too; maybe it needs similar treatment? It does, but it doesn't affect this application. I don't know if should be added to the series for code freeze in that case.
We're past the code freeze now. Do you think this is worth revisiting?