Hello,
After putting this project down for a while I've finally picked it back up, and it is now working and passing its tests. This is still a proof of concept and needs a lot more cleanup before it's ready for Staging or any serious review. I would be most interested in discovering additional programs whose performance improves with this patch set (Pandemic's Star Wars Battlefront I and II were the original impetus for it). You must start your program with STAGING_SHARED_MEMORY=1 STAGING_SHM_SYNC=1 in the environment to enable it. I have hosted the patches on my GitHub for anyone interested in trying it out:
git clone https://github.com/daniel-santos/wine
cd wine
git checkout hybrid-sync
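Once built, start the program with the two variables set, for example (the executable name here is just a placeholder):

STAGING_SHARED_MEMORY=1 STAGING_SHM_SYNC=1 wine program.exe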
Although somewhat complex, this implements a fast synchronization framework that:
* Can support semaphores, mutexes and events (only semaphores are implemented right now),
* Is scalable,
* Can share objects with other processes,
* Can perform most operations without a server call, and
* Provides reasonably strong shared-memory integrity guarantees, with well-defined behaviour when shared memory is corrupted or altered incorrectly.
I have called them "hybrid" objects because they have a private portion and a shared-memory portion. When another process opens one, the shared portion is migrated to a new memory region shared only by the processes that have access to the object. This migration works even while multiple threads and processes are performing operations on the object. Scalability is achieved through the use of memfd-backed shared-memory slab caches on the server, from which all shared memory is allocated.
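As a rough illustration of the private/shared split, the per-process side of a hybrid semaphore might look something like this; the field names are assumptions for the sketch, not the actual patch set:

#include <stddef.h>
#include <stdint.h>

/* Illustrative only: the private portion lives in each process, while the
 * shared portion is a 64-bit value inside a memfd-backed slab that is
 * mapped by every process with access to the object. */
struct hybrid_semaphore
{
    int                shm_fd;   /* memfd of the slab the object currently lives in */
    size_t             offset;   /* offset of the shared portion within that slab   */
    volatile uint64_t *shared;   /* mapping of the 64-bit shared portion            */
};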
Since it uses memfd and futexes, it is only supported on Linux. Initial support is for x86 systems, but ARM and PPC support can be added without too much effort. I have not yet closely examined a BSD implementation.
How It Works
Every struct process in the server can have one or more struct process_group objects associated with it. These represent unique sets of processes that are sharing objects. When a client requests a new synchronization object, the server looks for a process_group that only contains the calling process and creates one if none exists -- this also creates a new slab cache, which allocates a block of shared memory.
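A rough sketch of the bookkeeping described above might look like the following; the structure and field names are illustrative, not the actual server code:

struct shm_slab;                    /* memfd-backed slab cache of shared memory */

/* One unique set of processes that share hybrid objects with each other.
 * A new object's shared portion is allocated from the slab of the group
 * that contains only the creating process. */
struct process_group
{
    unsigned int      num_processes;
    struct process  **processes;    /* the processes in this group */
    struct shm_slab  *slab;         /* slab the group's shared portions are allocated from */
};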
When another process later requests to open the object, the server finds or creates a process_group with exactly those two processes in it and then migrates the shared portion of the object to the new shared memory slab. The migration is transparent to the client: if a process is waiting (natively) on the object, its threads are woken when the migration is done so that one of them can request the new shared memory data, map it into memory and continue its operation. Similarly, when a process closes such an object, it is migrated again to a process_group that contains only the remaining processes that have access to it (or it is destroyed).
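To give a feel for what "transparent to the client" means, a waiting thread could handle migration roughly like this (continuing the earlier sketch; the status code and helper functions are made-up names, not the real API):

/* Hypothetical client-side wait: if the futex-based wait is interrupted
 * because the object was migrated, fetch the new slab location from the
 * server, remap it, and retry. */
static NTSTATUS hybrid_semaphore_wait( struct hybrid_semaphore *sem )
{
    for (;;)
    {
        NTSTATUS status = wait_on_shared_portion( sem );

        if (status != STATUS_HYBRID_MIGRATED) return status;

        remap_shared_portion( sem );  /* server call: get the new memfd/offset, mmap it */
    }
}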
Sometimes traditional server-side synchronization objects are needed (e.g., for a critical section) and these are created by adding a new SYNC_OBJECT_ACCESS_SERVER_ONLY access flag to the request.
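As an example, a caller that needs the traditional behaviour could pass the new flag along with the usual access bits when creating the object. The surrounding code follows Wine's normal server-request pattern, but treat the details as a sketch:

/* Sketch: create a semaphore that stays a purely server-side object by
 * adding SYNC_OBJECT_ACCESS_SERVER_ONLY to the requested access mask. */
SERVER_START_REQ( create_semaphore )
{
    req->access  = SEMAPHORE_ALL_ACCESS | SYNC_OBJECT_ACCESS_SERVER_ONLY;
    req->initial = initial_count;
    req->max     = maximum_count;
    wine_server_add_data( req, objattr, len );
    status = wine_server_call( req );
    *handle = wine_server_ptr_handle( reply->handle );
}
SERVER_END_REQ;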
The shared portion of the object is 64 bits:

* 32 bits of data,
* 4 bits of flags, and
* a 28-bit FNV-1a hash.
The 28-bit hash is verified on every operation to detect whether the shared memory portion of the object has been modified incorrectly. This is not infallible, but the chance of an incorrect modification going undetected is on the order of one in a million or better (a random change passes the 28-bit check with probability of roughly 2^-28).
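A minimal sketch of the layout and the check, assuming the 36 data and flag bits are hashed with 32-bit FNV-1a and XOR-folded down to 28 bits (which may not match the actual patch set exactly):

#include <stdint.h>

/* Illustrative bit layout of the 64-bit shared portion (the packing is an
 * assumption for this sketch). */
#define SHM_DATA(v)   ((uint32_t)((v) & 0xffffffffu))          /* 32 bits of data */
#define SHM_FLAGS(v)  ((uint32_t)(((v) >> 32) & 0xfu))         /* 4 bits of flags */
#define SHM_HASH(v)   ((uint32_t)(((v) >> 36) & 0x0fffffffu))  /* 28-bit hash     */

/* 32-bit FNV-1a over the 36 non-hash bits, XOR-folded down to 28 bits. */
static uint32_t shm_hash( uint64_t value )
{
    uint32_t hash = 2166136261u;                    /* FNV-1a offset basis */
    value &= 0x0000000fffffffffull;                 /* hash only data + flags */
    for (int i = 0; i < 5; i++)                     /* 36 bits -> 5 bytes */
    {
        hash ^= (uint8_t)(value >> (i * 8));
        hash *= 16777619u;                          /* FNV prime */
    }
    return (hash ^ (hash >> 28)) & 0x0fffffffu;     /* fold 32 -> 28 bits */
}

/* Every operation re-checks the hash before trusting the shared value. */
static int shm_value_valid( uint64_t value )
{
    return SHM_HASH( value ) == shm_hash( value );
}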
Other Solutions
I should note that Intel's SGX extensions, although intended for cryptography and sensitive data, might provide a superior solution for optimizing synchronization objects, as well as many other operations that currently require a server call. An SGX enclave provides a mechanism to share memory in user space while restricting, at the CPU level, which code is allowed to read or alter that memory. This could be exploited to write a sort of Wine Demi-Kernel that lives in the address space of each client process, reading and manipulating memory shared from wineserver with exceptional integrity guarantees.
Thanks,
Daniel