> I'm not suggesting to use signals to call the APC function,
No, and I didn't read it that way either. I don't think that would even be architecturally possible: we know how the NT API works, and it isn't a callback.
> just that what we need here is a simple "APC pending" flag. I understand that it makes sense to map this to an fd in your design, it just seems rather heavyweight to have to allocate and manage yet another fd for every thread just for this.
If I understand you correctly, though, that's what I proposed in the initial revision.
One problem is that a signal (USR1) can arrive at any time: before the wait call, during it, or after it. If we get a signal, we *must* not let the wait consume any objects. That means one of the following:
* we need a handle to represent the thread's signaled state, which is the current approach. It's worth mentioning that while fds aren't free, the kernel maintainers seemed to consider them cheap enough that they recommended using them instead of opaque handles (as an earlier revision did);
* the kernel needs to manage the thread's signaled state. This requires modifying task_struct (there is no generic TLS mechanism in the kernel), which I think we can safely assert is a complete non-starter;
* or we need to synchronize on the user side by blocking signals. This requires extra syscalls per wait, which was one of your concerns (the other essentially being complexity). I'm currently inclined to agree with those concerns, and I anticipate finding it difficult to convince Greg that the old API is worse *and* that we really need an API change.
Am I missing something? Even if it's not USR1 per se, I don't see a way around the fact that *someone* needs to maintain the signaled state, and that someone has to be us. We then need a way to synchronize that signaled state with USR1 so that a single wait can't both consume an object *and* report STATUS_USER_APC.
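For comparison, here is a minimal user-space sketch of the invariant when the signaled state is an fd. It is not the driver's actual interface: wait_alertable() and the result names are hypothetical, both fds are plain eventfds, and poll() followed by read() is not atomic against other consumers the way an in-kernel wait would be. It only shows the ordering rule, namely that a pending alert wins and the object is left alone.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Illustrative result codes, not the real NTSTATUS values. */
enum alertable_result { RESULT_OBJECT, RESULT_APC, RESULT_NONE };

/*
 * Hypothetical helper: wait on one object fd plus the thread's alert fd.
 * Assumes object_fd was created with EFD_SEMAPHORE | EFD_NONBLOCK and
 * alert_fd with EFD_NONBLOCK.
 */
static enum alertable_result wait_alertable(int object_fd, int alert_fd,
                                            int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = alert_fd,  .events = POLLIN },
        { .fd = object_fd, .events = POLLIN },
    };
    uint64_t value;

    /* Timeout and error are folded together to keep the sketch short. */
    if (poll(fds, 2, timeout_ms) <= 0)
        return RESULT_NONE;

    /* The alert is checked first: if it is set, the object is not consumed. */
    if (fds[0].revents & POLLIN) {
        if (read(alert_fd, &value, sizeof(value)) == (ssize_t)sizeof(value))
            return RESULT_APC;
        return RESULT_NONE;
    }

    /* No alert pending: consume one count from the object. */
    if ((fds[1].revents & POLLIN) &&
        read(object_fd, &value, sizeof(value)) == (ssize_t)sizeof(value))
        return RESULT_OBJECT;

    return RESULT_NONE;
}
```

With both fds readable at once, the first call reports the APC and leaves the object's count intact; only a later call consumes it.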