(Or: "There's a limit to how many constraints you can add to a problem before it really is impossible.")
Hello all,
I'm once again going to bring up the topic of optimizing synchronization primitives. This is partly questions for Alexandre, partly questions for those people interested in the futex-based backend ("fsync"), partly an attempt to give a better and more accessible summary than I gave at WineConf last year, and partly a call for ideas, because I'm fast running out of them.
I'd like to first try to spell out the constraints I'm working with, make sure I have some of them right, and request clarification on others, because a misunderstanding of those constraints has bitten me before. I've divided them into three broad categories: correctness, security/robustness, and performance.
------------ Correctness: ------------
1. All basic sync operations work as documented and are atomic.
2. An operation need not be perfectly serialized so long as the results of that operation interleaved with any other are indistinguishable from serialized execution in either order.
3. If a thread is waiting on an event, PulseEvent() should always wake up that thread. This constraint is documented to be not always true on Windows [1], but whether said race condition ever happens in practice is not clear. It certainly must be rare at least. Both the "esync" and "fsync" patch sets violate this constraint, and in practice many applications are affected. Most use timeSetEvent(), presumably so that some procedure may run regularly every N ms, and so missing a wakeup on rare occasion would likely not cause severe damage.
4. If a thread waits repeatedly on a manual-reset event, PulseEvent() on that event should not wake up the thread more than once. Both "esync" and "fsync" currently violate this constraint.
5. WaitForSingleObject() should act as a fence. In particular, if a thread executes the following code in a loop:
WaitForSingleObject(event, INFINITE); ResetEvent(event);
then a single call to SetEvent() must only allow one iteration of the loop. I mention this constraint because a certain optimization to the "esync" patch set violated it, and caused a game (RiME) to misbehave. I have not verified that this constraint in particular caused the misbehaviour.
6. Any operation that signals an object *must* wake up a sleeping thread if that thread can be woken up. For example, if thread A is waiting on event E and thread B executes SetEvent(E) followed by ResetEvent(E), thread A must wake up. Both "esync" and "fsync" currently violate this constraint. At least one game (I think it was Bioshock) used SetEvent() followed by ResetEvent() in similar fashion to PulseEvent(), and behaved poorly under "esync".
7. [clarification needed] Similarly to the above, if thread A is waiting on all of events E and F, F is signaled, and thread B executes SetEvent(E) followed by ResetEvent(F), thread A must wake up. Both "esync" and "fsync" currently violate this constraint. From what I understand of the Windows kernel (mostly derived from [2]), I'm not sure Windows actually satisfies this constraint (it would be very difficult, if at all possible, to do this without one single lock around all sync operations, which recent versions of Windows don't do.) I haven't yet encountered any games that break due to this with "esync" or "fsync", but it may also be very hard to trigger. Is this a constraint we have to follow?
---------------------- Security / robustness: ----------------------
I would really appreciate a detailed description of what our security-related constraints are, because they're not obvious to me at all. I previously was told that shared memory was absolutely forbidden and therefore tried to work within that constraint, and was surprised to learn at WineConf that this was not the case. The obvious constraint to me is that a user should be prevented from making anyone else's life worse, but is free to make their own life worse; however, I think previous writing by Alexandre implies more strict constraints than this, and I don't see an obvious Schelling fence between these two extremes.)
8. An object belonging to a given process belonging to a given POSIX user can only be manipulated (i.e. have its state changed or wake up the sleeping process) by processes belonging to other POSIX users if the Win32 ACL permits it. This constraint is roughly meaningless as long as Wine doesn't support multiple users, but any solution we ultimately come up with should account for it.
9. [clarification requested] In particular, it also must not be possible to change the state of an object to an invalid state (e.g. a semaphore whose count is too high), or to a state that would not be possible with pure Win32 functions (e.g. change the owner of a mutex to a process owned by another user).
10. [clarification needed] As constraint 8, but in regard to processes belonging to the same user. Is this a constraint we have to follow? If so, why? It will always be possible for a sufficiently determined process to alter the state, including to invalid states, if by no other means than using ptrace or injecting code.
11. [clarification needed] As constraint 9, but in regard to processes belonging to the same POSIX user. Is this a constraint we have to follow?
12. [clarification requested] It should not be possible for a process to "accidentally" disrupt the functionality of another process if such disruption would not be possible purely through NT functions of "known" objects. This is a lesser version (a proper subset) of constraint 8 or 10. This is vague, and hence clarification is appreciated. But for example, supposing that the process never uses host syscalls, Win32 debugger functions, or duplicates a handle outside of normal functionality, then any crash, memory corruption, or accidental misuse of Win32 functions should not affect processes which have never shared a handle, or processes which share a handle but do not use it, etc. This constraint is mentioned largely because it significantly restricts how we can use shared memory.
13. It should not be possible for a process to "accidentally" disrupt the functionality of the wineserver, i.e. it would take intentional use of host functions (interrupts, etc.) to do so.
------------ Performance: ------------
This is a bit weird, because, as far as Wine's concerned, any solution is good. But as far as I'm concerned, if it's not at least as good as the solutions we have, I still need to maintain the solutions we have. Maintaining those trees outside of Wine takes up more time than I really want it to.
14. SetEvent(), ResetEvent(), ReleaseMutex(), ReleaseSemaphore(), and wait-any for mutexes, semaphores, and events should all be fast. PulseEvent(), NtQuery*(), wait-all, and waiting on other objects (threads, processes, named pipes, ...) can be slow. I'm not sure where to put timers. "slow" here means at least as slow as they currently are. "fast" means at least as fast as they are with "fsync", which is currently the fastest solution.
15. Cross-process objects must be fast. Pierre-Loup A. Griffais informed me during my presentation last WineConf that processes are increasingly relying on this.
16. We should have a solution for MacOS that's at least as good as the "esync"-based pipe backend. (This patch set is currently used in some builds of CrossOver. It's not part of Staging or in a particularly accessible place—though of course it's distributed as part of CrossOver's source—and I apologize for that. I can provide it independently on request; thus far it hasn't been a high priority).
========= SOLUTIONS =========
* I won't go into detail on esync. Its function and its limitations have been discussed before and are documented along with the patch set. fsync mostly works the same way, but on top of futexes instead of eventfds.
* Broadly speaking, do scheduling in user space, using shared memory to satisfy constraint 15 (cross-process performance). This allows us to satisfy every correctness constraint. However, robustness is a lot harder to satisfy. For example:
- Any such scheduler would almost certainly have to take locks around a wait list. However, that means the server would also have to take locks to abandon mutices. If a process crashes with a lock held, the server hangs, violating constraint 13.
- Any process has access to any other process's wait list and can "accidentally" break it quite badly, probably violating constraint 12.
- An object's state would presumably be in shared memory. We can't selectively prevent *which* values can be written, so this violates constraint 11.
Additionally, Pierre-Loup A. Griffais has expressed doubts that such an approach can be as performant as "fsync", because (to oversimplify?) the overhead of a user-mode scheduler is combined with the overhead of kernel-mode scheduling.
* Extend an existing kernel-level interface (probably eventfd) or introduce a new set of kernel-level synchronization primitives which will essentially look exactly like the current wineserver interface. This allows us to satisfy every correctness *and* robustness constraint. In theory it will be exactly as performant as Windows. It is more performant than "esync" in some places, equally performant in others, and worse in still others (actually, in one specific place—waiting for an already signaled manual-reset event with esync makes no system calls.) It is not as performant as "fsync", and it is almost certain that the difference is something that people will still care about, which means that I'd still have to maintain fsync forever. Also, this won't help for Mac, so I'd have to maintain esync at least until Apple decides to break Wine in a way we can't fix. (I don't think it'll take very long.)
* That's... kind of it. I'm out of ideas. I'd appreciate any others.
ἔρρωσθε, Zeb
[1] https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-pulsee... [2] https://channel9.msdn.com/Shows/Going+Deep/Arun-Kishan-Farewell-to-the-Windo...