Unlike the message queue, for which I believe we can get rid of any special handling, I don't see a way to avoid a dedicated inproc sync for user APCs. This makes sure the sync gets created on thread creation. It could arguably be allocated lazily instead, but I don't see a reason to delay the failure to a later time and end up with a successfully created thread that is not alertable when it is supposed to be.
--
v2: server: Create an inproc sync for user APC signaling.

From: Elizabeth Figura <zfigura@codeweavers.com>
```diff
---
 server/thread.c | 11 +++++++++++
 server/thread.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/server/thread.c b/server/thread.c
index fba1f69d284..853aff4cc84 100644
--- a/server/thread.c
+++ b/server/thread.c
@@ -397,6 +397,7 @@ static inline void init_thread_structure( struct thread *thread )
     int i;
 
     thread->sync = NULL;
+    thread->alert_sync = NULL;
     thread->unix_pid = -1;  /* not known yet */
     thread->unix_tid = -1;  /* not known yet */
     thread->context = NULL;
@@ -560,6 +561,7 @@ struct thread *create_thread( int fd, struct process *process, const struct secu
     }
     if (!(thread->request_fd = create_anonymous_fd( &thread_fd_ops, fd, &thread->obj, 0 ))) goto error;
     if (!(thread->sync = create_event_sync( 1, 0 ))) goto error;
+    if (get_inproc_device_fd() >= 0 && !(thread->alert_sync = create_inproc_internal_sync( 1, 0 ))) goto error;
 
     if (process->desktop)
     {
@@ -654,6 +656,7 @@ static void destroy_thread( struct object *obj )
     release_object( thread->process );
     if (thread->id) free_ptid( thread->id );
     if (thread->token) release_object( thread->token );
+    if (thread->alert_sync) release_object( thread->alert_sync );
     if (thread->sync) release_object( thread->sync );
 }
 
@@ -1440,7 +1443,11 @@ static int queue_apc( struct process *process, struct thread *thread, struct thr
     grab_object( apc );
     list_add_tail( queue, &apc->entry );
     if (!list_prev( queue, &apc->entry ))  /* first one */
+    {
+        if (apc->call.type == APC_USER && thread->alert_sync)
+            signal_inproc_sync( thread->alert_sync );
         wake_thread( thread );
+    }
 
     return 1;
 }
@@ -1472,6 +1479,8 @@ void thread_cancel_apc( struct thread *thread, struct object *owner, enum apc_ty
         apc->executed = 1;
         signal_sync( apc->sync );
         release_object( apc );
+        if (list_empty( &thread->user_apc ) && thread->alert_sync)
+            reset_inproc_sync( thread->alert_sync );
         return;
     }
 }
@@ -1486,6 +1495,8 @@ static struct thread_apc *thread_dequeue_apc( struct thread *thread, int system
     {
         apc = LIST_ENTRY( ptr, struct thread_apc, entry );
         list_remove( ptr );
+        if (list_empty( &thread->user_apc ) && thread->alert_sync)
+            reset_inproc_sync( thread->alert_sync );
     }
     return apc;
 }
diff --git a/server/thread.h b/server/thread.h
index 1418e4c5039..9c552a88ed2 100644
--- a/server/thread.h
+++ b/server/thread.h
@@ -51,6 +51,7 @@ struct thread
 {
     struct object obj;                  /* object header */
     struct event_sync *sync;            /* sync object for wait/signal */
+    struct inproc_sync *alert_sync;     /* inproc sync for user apc alerts */
     struct list entry;                  /* entry in system-wide thread list */
     struct list proc_entry;             /* entry in per-process thread list */
     struct list desktop_entry;          /* entry in per-desktop thread list */
```
v2: Leave the fd sending aside for now; it'll be retrieved lazily as alertable waits are made. Not sure I see how it is better, though.
> I'm not suggesting to use signals to call the APC function,
No, and I didn't understand that either. I don't even think that's architecturally possible, since we know how the NT API works; it's not a callback.
> just that what we need here is a simple "APC pending" flag. I understand that it makes sense to map this to an fd in your design; it just seems rather heavyweight to have to allocate and manage yet another fd for every thread just for this.
That's what I did propose in the initial revision, though, if I understand you correctly.
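For what it's worth, the per-thread fd is essentially just a kernel-visible version of that pending flag. A minimal sketch of the idea in isolation, using a plain Linux eventfd as a stand-in and made-up helper names (the actual patch uses the ntsync inproc sync objects, not eventfd):

```c
/* Illustration only: an fd-backed "APC pending" flag, modeled with a Linux
 * eventfd instead of an ntsync inproc sync.  It is set when the first user
 * APC is queued, can be polled by the waiter, and is cleared when the user
 * APC queue becomes empty. */
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int create_apc_pending_flag( void )
{
    return eventfd( 0, EFD_CLOEXEC | EFD_NONBLOCK );  /* starts unsignaled */
}

static void set_apc_pending( int fd )    /* first user APC queued */
{
    uint64_t one = 1;
    if (write( fd, &one, sizeof(one) ) < 0) { /* EAGAIN: already signaled, nothing to do */ }
}

static void clear_apc_pending( int fd )  /* user APC queue became empty */
{
    uint64_t value;
    if (read( fd, &value, sizeof(value) ) < 0) { /* EAGAIN: already clear, nothing to do */ }
}

static int apc_pending( int fd )         /* test without blocking or consuming anything else */
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll( &pfd, 1, 0 ) > 0;
}
```

The cost per thread is indeed one more fd, which is exactly the trade-off being discussed here.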
One problem is that we can get a signal, via USR1, at any time: before, during, or after the call. If we get a signal we *must* not let the wait consume any objects. That means either:
* we need a handle to represent the thread's signaled state (the current approach). It's worth mentioning that while fds aren't free, even the kernel developers seemed to find them cheap enough to recommend using them instead of opaque handles (as an earlier revision did);
* the kernel needs to manage the thread's signaled state. This requires modifying task_struct (there is no generic TLS mechanism in the kernel), which I think we can safely assert is a complete non-starter;
* or we need to synchronize on the user side by blocking signals. This requires extra syscalls per wait, which was one of your concerns (the other essentially being complexity). I'm currently inclined to agree with those concerns, and I anticipate finding it difficult to convince Greg that the old API is worse *and* that we really need an API change.
Am I missing something? Even if it's not USR1 per se, I don't see a way to avoid the problem that *someone* needs to maintain the signaled state, which has to be us, and we need a way to synchronize that signaled state with USR1 so that a single wait can't both consume an object *and* report STATUS_USER_APC.
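To make the first option concrete, here is a rough sketch, not the actual ntdll wait code: the function name, return convention and the plain poll()-based wait are all illustrative assumptions. The point is that the alert fd sits in the same wait as the object fds, so the decision of what to report comes from one consistent result, and reporting "alerted" consumes nothing:

```c
/* Sketch only: a single alertable wait over object fds plus the per-thread
 * alert fd.  If the alert fd is ready we report that and consume nothing;
 * only otherwise would we go on to acquire one of the objects. */
#include <poll.h>

#define WAIT_RESULT_ALERTED  (-1)
#define WAIT_RESULT_TIMEOUT  (-2)
#define WAIT_RESULT_ERROR    (-3)

static int wait_any_alertable( const int *obj_fds, int count, int alert_fd, int timeout_ms )
{
    struct pollfd pfds[count + 1];
    int i, ret;

    for (i = 0; i < count; i++)
        pfds[i] = (struct pollfd){ .fd = obj_fds[i], .events = POLLIN };
    pfds[count] = (struct pollfd){ .fd = alert_fd, .events = POLLIN };

    if ((ret = poll( pfds, count + 1, timeout_ms )) < 0) return WAIT_RESULT_ERROR;
    if (!ret) return WAIT_RESULT_TIMEOUT;

    /* Check the alert first: nothing has been consumed at this point. */
    if (pfds[count].revents & POLLIN) return WAIT_RESULT_ALERTED;

    for (i = 0; i < count; i++)
        if (pfds[i].revents & POLLIN) return i;  /* the actual acquire of object i would happen here */

    return WAIT_RESULT_ERROR;  /* e.g. POLLERR/POLLHUP; kept out of the sketch */
}
```

The signal-blocking alternative from the last bullet would instead need something like a pthread_sigmask() call before and after every wait, which is where the extra syscalls come from.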
On Tue Sep 30 20:07:07 2025 +0000, Rémi Bernon wrote:
> v2: Leave the fd sending aside for now; it'll be retrieved lazily as alertable waits are made. Not sure I see how it is better, though.
It doesn't have to be done lazily (but it can be of course). The point is that there's no reason to do it in the very first request, particularly since that differs between the first and subsequent threads. Just add another request to retrieve it somewhere during thread init.
On Tue Sep 30 20:07:07 2025 +0000, Alexandre Julliard wrote:
> It doesn't have to be done lazily (but it can be of course). The point is that there's no reason to do it in the very first request, particularly since that differs between the first and subsequent threads. Just add another request to retrieve it somewhere during thread init.
The problem with retrieving an fd during thread init, and the reason I moved it to the create_thread request, is that it triggers a deadlock with a kernel32 test that terminates a thread quickly after creating it. I had opened https://gitlab.winehq.org/wine/wine/-/merge_requests/8878 to fix it, which I can reopen, but the discussions there seemed to indicate that it was unlikely to be accepted.
On Wed Oct 1 06:11:42 2025 +0000, Rémi Bernon wrote:
> The problem with retrieving an fd during thread init, and the reason I moved it to the create_thread request, is that it triggers a deadlock with a kernel32 test that terminates a thread quickly after creating it. I had opened https://gitlab.winehq.org/wine/wine/-/merge_requests/8878 to fix it, which I can reopen, but the discussions there seemed to indicate that it was unlikely to be accepted.
I don't think that this should constrain the implementation. We'll want to address it one way or another, but for now it would be OK to comment out the offending test.
On Wed Oct 1 07:42:32 2025 +0000, Alexandre Julliard wrote:
> I don't think that this should constrain the implementation. We'll want to address it one way or another, but for now it would be OK to comment out the offending test.
I don't really see how splitting the fd sending over two different requests was so bad; or rather, I think the problem comes from having two different requests for first/other thread init in the first place, and that's unrelated to what we're doing here. It should/could be solved later (or before), but given how difficult it has been to iterate over wineserver design changes for ntsync, my current feeling is that I'm unlikely to try it myself.
About the proposed change, I don't think adding an extra request just to retrieve this ntsync-specific fd on every thread init is better. I don't think having a request dedicated to retrieving that fd is nice in the first place, but I don't mind that much; and if we are going to do that, then it's probably better to just delay the request until the fd is actually needed. I suppose it might end up failing if the client has too many open fds at that point, instead of failing thread creation, but maybe that's not a problem we should be worrying about.
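As a rough sketch of that lazy variant (the helper, the server request and the cached fields are all hypothetical, not existing Wine code), the fd would only be requested on the first alertable wait, and a failure such as EMFILE would surface there instead of at thread creation:

```c
/* Hypothetical sketch: retrieve the alert fd lazily and cache it per thread.
 * get_thread_alert_fd_from_server() stands in for whatever server request
 * would hand over the fd. */
struct thread_alert_data
{
    int alert_fd;         /* cached per-thread alert fd; -1 until retrieved */
    int alert_fd_broken;  /* set if retrieval failed, so we don't retry on every wait */
};

extern int get_thread_alert_fd_from_server( void );  /* hypothetical; returns -1 on failure */

static int get_cached_alert_fd( struct thread_alert_data *data )
{
    if (data->alert_fd < 0 && !data->alert_fd_broken)
    {
        int fd = get_thread_alert_fd_from_server();
        if (fd >= 0) data->alert_fd = fd;
        else data->alert_fd_broken = 1;  /* e.g. EMFILE: this wait degrades, thread creation already succeeded */
    }
    return data->alert_fd;  /* -1 means alertable waits can't use the inproc path */
}
```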
In any case, I can keep upstreaming the message queue changes, but I think that you should then take over the remaining changes (which would then be ~10 patches, after `server: Create event syncs for the message queues.` in https://gitlab.winehq.org/rbernon/wine/-/commits/wip/ntsync), because you seem to have specific ideas about how you would like things to be implemented here, and the slow iterative process has been quite impractical and painful for everyone involved.