[PATCH 0/25] MR7226: ntdll: Implement in-process synchronization via the Linux "ntsync" driver. - wine-gitlab

29 Jan 2025


This introduces a faster implementation of signal and wait operations on NT
events, semaphores, and mutexes, which improves performance to native levels for
a wide variety of games and other applications.

The goal here is similar to the long-standing out-of-tree "esync" and "fsync"
patch sets, but without the flaws that make those patch sets not upstreamable.

The Linux "ntsync" driver is not currently released. It has been accepted into
the trunk Linux tree for 6.14, so barring any extraordinary circumstances, the
API is frozen and it will be released in its current form in about 2 months.

Since it has passed all relevant reviewers on the kernel side, and the API is
all but released, it seems there is no reason any more not to submit the Wine
side to match.

Some important notes:
* This patch set does *not* include any way to disable ntsync support, since
  that kind of configuration often seems to be dispreferred where not necessary.
  In essence, ntsync should just work everywhere.

  Probably the easiest way to effectively disable ntsync, for the purposes of
  testing, is to chmod the /dev/ntsync device to prevent its being opened.
  Regardless, a Wine switch to disable ntsync can be added simply enough. Note
  that it should probably not take the form of a registry key, however, since it
  needs to be easily accessible from the server itself.
* It is, generally speaking, not possible for only some objects, or some
  processes, to have backing ntsync objects, while others use the old server
  path. The esync/fsync patch sets explicitly protected against this by making
  sure every process had a consistent view of whether esync was enabled. This is
  not provided here, since no switch is provided to toggle ntsync, and it should
  not be possible to get into such an inconsistent state without gross
  misconfiguration.
* Similarly, no diagnostic messages are provided to note that ntsync is in use,
  or not in use. These messages are part of esync/fsync, as well as part of
  ntsync "testing" trees unofficially distributed. However, if ntsync is working
  correctly, no message should be necessary.
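
The chmod approach from the first note can be checked with a small probe. This
is a minimal sketch and not code from the patch set: the helper name
ntsync_device_usable is hypothetical, and opening the device read-only is an
assumption about how a client would open it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the device node at `path` can be opened
 * by the current user, 0 otherwise. On failure, errno is left as set by
 * open(), e.g. EACCES after a restrictive chmod or ENOENT if the driver is
 * not loaded. */
static int ntsync_device_usable(const char *path)
{
    int fd = open(path, O_RDONLY);  /* access mode is an assumption */

    if (fd < 0) return 0;
    close(fd);
    return 1;
}
```

Calling ntsync_device_usable("/dev/ntsync") before and after the chmod should
show the result change from 1 to 0 for an unprivileged user on a kernel that
provides the device.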
The basic structure is:
* Each type of server object which can be waited on by the client (including
  events, semaphores, mutexes, but also other types such as processes, files)
  must store an "inproc_sync" object.

  This "inproc_sync" object is a full server object (note that this differs from
  esync/fsync). A vector and server request are introduced to retrieve an NT
  handle to this object from an arbitrary NT handle.

  Since the actual ntsync objects are simply distinct file descriptions, we then
  call get_handle_fd from the client to retrieve an fd to the object, and then
  perform ioctls on it.
* Objects signaled by the server (processes, files, etc) perform ntsync ioctls
  on that object. The backing object in all such cases is simply an event.
* Signal and wait operations on the client side attempt to defer to an
  "inproc_*" function, falling back to the server implementation if it returns
  STATUS_NOT_IMPLEMENTED. This mirrors how in-process synchronization objects
  (critical sections, SRW locks, etc) used to be implemented—attempting to use
  an architecture-specific "fast_*" function and falling back if it returned
  STATUS_NOT_IMPLEMENTED.
* The inproc_sync handles, once retrieved, are cached per-process. This caching
  takes a similar form to the fd cache. It does not reuse the same
  infrastructure, however.

  The primary reason for this is that the fd cache is designed to fit within a
  64-bit value and uses 64-bit atomic operations to ensure consistency. However,
  we need to store more than 64 bits of information. [We also need to modify
  the entries after caching, in order to correctly implement handle closing;
  see below.]

  The secondary reason is that retrieving the ntsync fd from the inproc_sync
  handle itself uses the fd cache.
* In order to keep the Linux driver simple, it does not implement access flags
  (EVENT_MODIFY_STATE etc.). Instead, the flags are cached locally and validated
  there. This too mirrors the fd cache. Note that this means that a malicious
  process can now modify objects it should not be able to modify, something the
  wineserver path does more to prevent. However, this is no different from the
  way other objects (notably fds) are handled, and it would require manual
  syscalls.
* In order to achieve correct behaviour related to closing objects while they
  are used, this patch set essentially relies on refcounting. This is broadly
  true of the server as well, but because we need to avoid server calls when
  performing object operations, significantly more care must be taken.

  In particular, because waits need to be interruptible by signals and then be
  restarted, we need the backing ntsync object to remain valid until all users
  are done with it. On a process level, this is achieved by letting multiple
  processes own handles to the underlying inproc_sync server object.

  On a thread level, multiple simultaneous calls need to refcount the process's
  local handle. This refcount is stored in the sync object cache. When it
  reaches zero, the cache is cleared.

  Punting this behaviour to the Linux driver would have introduced a great deal
  more complexity, which is best kept in userspace and out of the kernel.
* The cache is, as such, treated as a cache. The penultimate commit, which
  introduces client support but does not yet cache the objects, effectively
  illustrates this by never actually caching anything, and retrieving a new NT
  handle and fd every time.
* Certain waits, on internal handles (such as async, startup_info, completion),
  are delegated to the server even when ntsync is used. Those server objects do
  not create an underlying ntsync object.
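
To make the structure above more concrete, here is a minimal sketch of three of
the points: the fallback on STATUS_NOT_IMPLEMENTED, the locally cached access
check, and the per-process refcount on a cache entry. It is not code from the
patch set. The names sync_cache_entry, cache_grab, cache_release,
inproc_set_event, server_set_event and set_event are all hypothetical, the
ntsync ioctl is omitted, and plain integers stand in for the atomic operations
that a multi-threaded implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in the Windows SDK headers. */
#define STATUS_SUCCESS          0x00000000u
#define STATUS_NOT_IMPLEMENTED  0xC0000002u
#define STATUS_ACCESS_DENIED    0xC0000022u
#define EVENT_MODIFY_STATE      0x0002u

typedef unsigned int status_t;  /* stand-in for NTSTATUS */

/* Hypothetical cache entry; the real layout is not described in the text. */
struct sync_cache_entry
{
    int          fd;        /* ntsync fd, or -1 once the entry is cleared */
    unsigned int access;    /* access mask cached from the server */
    int          refcount;  /* 1 for the open handle + 1 per call in flight */
};

static int server_calls;    /* counts how often the server path was taken */

/* Stub for the old server request path. */
static status_t server_set_event(void)
{
    server_calls++;
    return STATUS_SUCCESS;
}

/* Take a reference for the duration of one call; fails on a cleared entry. */
static int cache_grab(struct sync_cache_entry *e)
{
    if (e->refcount == 0) return 0;
    e->refcount++;
    return 1;
}

/* Drop a reference; the last one out clears the entry. Closing the handle
 * is modelled as dropping the initial reference. */
static void cache_release(struct sync_cache_entry *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0)
    {
        e->fd = -1;
        e->access = 0;
    }
}

/* In-process path. A NULL or cleared entry is reported as "not implemented"
 * in this sketch; real code would likely fetch a fresh handle instead. */
static status_t inproc_set_event(struct sync_cache_entry *e)
{
    status_t ret = STATUS_SUCCESS;

    if (!e || !cache_grab(e)) return STATUS_NOT_IMPLEMENTED;
    if (!(e->access & EVENT_MODIFY_STATE)) ret = STATUS_ACCESS_DENIED;
    /* otherwise the ntsync ioctl on e->fd would be issued here */
    cache_release(e);
    return ret;
}

/* Client entry point: try the in-process path, then fall back. */
static status_t set_event(struct sync_cache_entry *e)
{
    status_t ret = inproc_set_event(e);

    if (ret != STATUS_NOT_IMPLEMENTED) return ret;
    return server_set_event();
}
```

In this sketch, an entry whose cached mask lacks EVENT_MODIFY_STATE is rejected
locally without a server call, and an entry that is grabbed by a waiter keeps
its fd after the handle's own reference is dropped, until the waiter releases
it as well.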
-- 
This merge request has too many patches to be relayed via email.
Please visit the URL below to see the contents of the merge request.
https://gitlab.winehq.org/wine/wine/-/merge_requests/7226