This introduces a faster implementation of signal and wait operations on NT events, semaphores, and mutexes, which improves performance to native levels for a wide variety of games and other applications.
The goal here is similar to the long-standing out-of-tree "esync" and "fsync" patch sets, but without the flaws that make those patch sets not upstreamable.
The Linux "ntsync" driver is not currently released. It has been accepted into the trunk Linux tree for 6.14, so barring any extraordinary circumstances, the API is frozen and it will be released in its current form in about 2 months. Since it has passed all relevant reviewers on the kernel side, and the API is all but released, it seems there is no reason any more not to submit the Wine side to match.
Some important notes:
* This patch set does *not* include any way to disable ntsync support, since that kind of configuration often seems to be dispreferred where not necessary. In essence, ntsync should just work everywhere.
Probably the easiest way to effectively disable ntsync, for the purposes of testing, is to chmod the /dev/ntsync device to prevent its being opened. Regardless, a Wine switch to disable ntsync can be added simply enough. Note that it should probably not take the form of a registry key, however, since it needs to be easily accessible from the server itself.
* It is, generally speaking, not possible for only some objects, or some processes, to have backing ntsync objects, while others use the old server path. The esync/fsync patch sets explicitly protected against this by making sure every process had a consistent view of whether esync was enabled. This is not provided here, since no switch is provided to toggle ntsync, and it should not be possible to get into such an inconsistent state without gross misconfiguration.
* Similarly, no diagnostic messages are provided to note that ntsync is in use, or not in use. These messages are part of esync/fsync, as well as part of ntsync "testing" trees unofficially distributed. However, if ntsync is working correctly, no message should be necessary.
The basic structure is:
* Each type of server object which can be waited on by the client (including events, semaphores, mutexes, but also other types such as processes, files) must store an "inproc_sync" object.
This "inproc_sync" object is a full server object (note that this differs from esync/fsync). A vector and server request is introduced to retrieve an NT handle to this object from an arbitrary NT handle.
Since the actual ntsync objects are simply distinct file descriptions, we then call get_handle_fd from the client to retrieve an fd to the object, and then perform ioctls on it.
* Objects signaled by the server (processes, files, etc) perform ntsync ioctls on that object. The backing object in all such cases is simply an event.
* Signal and wait operations on the client side attempt to defer to an "inproc_*" function, falling back to the server implementation if it returns STATUS_NOT_IMPLEMENTED. This mirrors how in-process synchronization objects (critical sections, SRW locks, etc) used to be implemented—attempting to use an architecture-specific "fast_*" function and falling back if it returned STATUS_NOT_IMPLEMENTED.
* The inproc_sync handles, once retrieved, are cached per-process. This caching takes a similar form to the fd cache. It does not reuse the same infrastructure, however.
The primary reason for this is that the fd cache is designed to fit within a 64-bit value and uses 64-bit atomic operations to ensure consistency. However, we need to store more than 64 bits of information. [We also need to modify them after caching, in order to correctly implement handle closing—see below.]
The secondary reason is that retrieving the ntsync fd from the inproc_sync handle itself uses the fd cache.
* In order to keep the Linux driver simple, it does not implement access flags (EVENT_MODIFY_STATE etc.) Instead, the flags are cached locally and validated there. This too mirrors the fd cache. Note that this means that a malicious process can now modify objects it should not be able modify—which is less true than it is with wineserver—but this is no different from the way other objects (notably fds) are handled, and would require manual syscalls.
* In order to achieve correct behaviour related to closing objects while they are used, this patch set essentially relies on refcounting. This is broadly true of the server as well, but because we need to avoid server calls when performing object operations, significantly more care must be taken.
In particular, because waits need to be interruptable by signals and then be restarted, we need the backing ntsync object to remain valid until all users are done with it. On a process level, this is achieved by letting multiple processes own handles to the underlying inproc_sync server object.
On a thread level, multiple simultaneous calls need to refcount the process's local handle. This refcount is stored in the sync object cache. When it reaches zero, the cache is cleared.
Punting this behaviour to the Linux driver would have introduced a great deal more complexity, which is best kept in userspace and out of the kernel.
* The cache is, as such, treated as a cache. The penultimate commit, which introduces client support but does not yet cache the objects, effectively illustrates this by never actually caching anything, and retrieving a new NT handle and fd every time.
* Certain waits, on internal handles (such as async, startup_info, completion), are delegated to the server even when ntsync is used. Those server objects do not create an underlying ntsync object.
-- v2: ntdll: Cache in-process synchronization objects. ntdll: Use server_wait_for_object() when waiting on only the queue object. ntdll: Use in-process synchronization objects. ntdll: Add some traces to synchronization methods. ntdll: Introduce a helper to wait on an internal server handle. server: Allow creating an event object for client-side user APC signaling. server: Introduce select_inproc_queue and unselect_inproc_queue requests. server: Add a request to retrieve the in-process synchronization object from a handle. server: Add a request to retrieve the in-process synchronization device. server: Create in-process synchronization objects for fd-based objects. server: Create in-process synchronization objects for timers. server: Create in-process synchronization objects for threads. server: Create in-process synchronization objects for message queues. server: Create in-process synchronization objects for jobs. server: Create in-process synchronization objects for processes. server: Create in-process synchronization objects for keyed events. server: Create in-process synchronization objects for device managers. server: Create in-process synchronization objects for debug objects. server: Create in-process synchronization objects for console servers. server: Create in-process synchronization objects for consoles. server: Create in-process synchronization objects for completion ports. server: Create in-process synchronization objects for mutexes. server: Create in-process synchronization objects for semaphores. server: Create in-process synchronization objects for events. server: Add an object operation to retrieve an in-process synchronization object.
This merge request has too many patches to be relayed via email. Please visit the URL below to see the contents of the merge request. https://gitlab.winehq.org/wine/wine/-/merge_requests/7226
On Wed Feb 5 20:51:29 2025 +0000, Elizabeth Figura wrote:
changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/7226/diffs?diff_id=155813&start_sha=464f1c5b82ab0c7b286b8e2c48de39a4be427fac#5aa8bdb8da21d88383ec724cb02513ed0a0fcfe0_142_142)
It does, thanks.
On Wed Feb 5 16:48:41 2025 +0000, Rémi Bernon wrote:
The enum doesn't seem to be used in this patch. For cleanliness purposes, maybe it should also add the enum values as they are being used.
Yeah, this is probably either historical artifact, or an attempt to avoid a bunch of protocol changes (which are annoying to rebase around). Fixed in v2.
On Wed Feb 5 16:48:41 2025 +0000, Rémi Bernon wrote:
The static `linux_device_object` doesn't seem to be used? `get_linux_device` is called below and the device is then released in every code path?
Sorry, I don't understand? In e.g. create_inproc_event() we retrieve the fd from it and use that to call NTSYNC_IOC_CREATE_EVENT.
If you're asking why it's a whole server object already, that's because we'll need the client to be able to retrieve the same fd later. That could do with at least an explanation in the patch subject, so I'll add one.
On Wed Feb 5 16:48:42 2025 +0000, Rémi Bernon wrote:
What about naming it that way right when it is introduced?
Yeah, that was probably a rebasing error, fixed in v2, thanks.
On Wed Feb 5 20:51:29 2025 +0000, Elizabeth Figura wrote:
changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/7226/diffs?diff_id=155813&start_sha=464f1c5b82ab0c7b286b8e2c48de39a4be427fac)
Fixed this and other stylistic comments, thanks.
The latter. The state is not kept in sync, and fundamentally cannot be, and because waits are vectored this cascades. Believe me, I tried, if there was a way to pull it off we wouldn't need the kernel support in the first place, because we could just do in-process waits using futex or something, until we ran into a handle that had been shared cross-process.
Ok, in that case I don't really see the point of introducing each inproc object one after another, it's all dead code anyway so better do it at once.
I still think that changes could be reordered to avoid too much of it and get things in place and called earlier: the linux device object could be introduced first, then retrieved and cached from the clients, and then inproc objects could be added at once, retrieved, and then used?
Rémi Bernon (@rbernon) commented about server/event.c:
return &event->kernel_object;
}
+static struct inproc_sync *event_get_inproc_sync( struct object *obj ) +{
- struct event *event = (struct event *)obj;
- if (!event->inproc_sync)
- {
enum inproc_sync_type type = event->manual_reset ? INPROC_SYNC_MANUAL_EVENT : INPROC_SYNC_AUTO_EVENT;
event->inproc_sync = create_inproc_event( type, event->signaled );
What about creating the inproc objects on creation instead? This would avoid all these lazy initializations.
On Wed Feb 5 20:51:39 2025 +0000, Elizabeth Figura wrote:
Sorry, I don't understand? In e.g. create_inproc_event() we retrieve the fd from it and use that to call NTSYNC_IOC_CREATE_EVENT. If you're asking why it's a whole server object already, that's because we'll need the client to be able to retrieve the same fd later. That could do with at least an explanation in the patch subject, so I'll add one.
In create_inproc_event, you call `get_linux_device`, which creates a new linux_device object, setting the static, then the object is released and as it's the only reference it's also destroyed and the static is cleared. The static is therefore not useful at this point.
On Wed Feb 5 20:23:32 2025 +0000, Elizabeth Figura wrote:
When ntsync is used, no waits go to server_select [except a handful of cherry-picked internal waits that go through server_wait_for_object()]. Hence the state of the mutex object in the server [as opposed to the kernel] is never changed after creation. Rather, the internal state of the kernel object is mutable, and when NtReleaseMutant() is called that does NTSYNC_IOC_MUTEX_UNLOCK on the underlying ntsync object which changes its state instead. Similarly for waits. Same for events and semaphores. Other objects (timers, processes, etc.) still change state like normal, and the server alters their state using ioctls when anything happens that would change the object's signaled state.
Okay, this is not obvious from the code, maybe it could be made more obvious that the inproc_sync and the server-side state are mutually exclusive. Maybe a union of the server state members with the inproc pointer?
Also, creating the inproc_sync on object creation instead of using lazy init would make it more obvious that in this particular case the owner that is passed to the kernel is always the initial mutex owner.
Btw I see that this is true for every other object type, where the lazy init makes it non-obvious that the values passed to every inproc sync creation are the initial state values.
Rémi Bernon (@rbernon) commented about server/mutex.c:
{
struct mutex *mutex; struct list *ptr;
while ((ptr = list_head( &thread->mutex_list )) != NULL) {
struct mutex *mutex = LIST_ENTRY( ptr, struct mutex, entry );
}mutex = LIST_ENTRY( ptr, struct mutex, entry ); assert( mutex->owner == thread ); mutex->count = 0; mutex->abandoned = 1; do_release( mutex );
- LIST_FOR_EACH_ENTRY(mutex, &inproc_mutexes, struct mutex, inproc_mutexes_entry)
abandon_inproc_mutex( thread->id, mutex->inproc_sync );
So every time a thread dies, we now have to go through every possible mutex because we don't track which ones it owns anymore. Is this an issue? (I don't know and idk if we can do anything about it, but making sure we're aware of this)
Rémi Bernon (@rbernon) commented about server/mutex.c:
static void mutex_destroy( struct object *obj ) { struct mutex *mutex = (struct mutex *)obj; assert( obj->ops == &mutex_ops );
- if (!mutex->count) return;
- if (mutex->count)
- { mutex->count = 0; do_release( mutex ); }
- if (mutex->inproc_sync)
- {
release_object( mutex->inproc_sync );
list_remove( &mutex->inproc_mutexes_entry );
- }
Related to the other comment, if both cases are exclusive I think it should be more obvious.
Rémi Bernon (@rbernon) commented about server/device.c:
irp->thread = thread ? (struct thread *)grab_object( thread ) : NULL; if (irp->file) list_add_tail( &irp->file->requests, &irp->dev_entry ); list_add_tail( &manager->requests, &irp->mgr_entry );
- if (list_head( &manager->requests ) == &irp->mgr_entry) wake_up( &manager->obj, 0 ); /* first one */
- if (list_head( &manager->requests ) == &irp->mgr_entry)
- {
/* first one */
wake_up( &manager->obj, 0 );
set_inproc_event( manager->inproc_sync );
What about passing the inproc sync to wake_up, and setting it there instead of having to append a call after every wake_up calls?
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
{
if (!(ret = wine_server_call( req ))) handle = wine_server_ptr_handle( reply->handle );
}
SERVER_END_REQ;
if (!ret)
{
if (!server_get_unix_fd( handle, 0, &fd, &needs_close, NULL, NULL ))
{
if (InterlockedCompareExchange( &device, fd, -2 ) != -2)
{
/* someone beat us to it */
if (needs_close) close( fd );
NtClose( handle );
}
/* otherwise don't close the device */
This effectively leaks the handle? We have the fd cached, why do we care about the handle? What about something like that:
``` if (ret || server_get_unix_fd( handle, 0, &fd, &needs_close, NULL, NULL )) fd = -1; if (InterlockedCompareExchange( &device, fd, -2 ) != -2 && fd != -1 && needs_close) close( fd ); NtClose( handle ); ```
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
- prevent other threads from writing to it concurrently.
- It's possible for an object currently in use (by a waiter) to be closed and
- the same handle immediately reallocated to a different object. This should be
- a very rare situation, and in that case we simply don't cache the handle.
- */
+struct inproc_sync_cache_entry +{
- LONG refcount;
- int fd;
- enum inproc_sync_type type;
- unsigned int access;
- BOOL closed;
- /* handle to the underlying in-process sync object, stored as obj_handle_t
* to save space */
- obj_handle_t handle;
Same here, why do we care about the handle?
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
- if ((cache->access & desired_access) != desired_access)
- {
release_inproc_sync_obj( cache );
return STATUS_ACCESS_DENIED;
- }
- return STATUS_SUCCESS;
+}
+static NTSTATUS linux_release_semaphore_obj( int obj, ULONG count, ULONG *prev_count ) +{
- NTSTATUS ret;
- ret = ioctl( obj, NTSYNC_IOC_SEM_RELEASE, &count );
- if (ret < 0)
ioctl returns an int, not a NTSTATUS, it's also not very much used, what about getting rid of it (here and elsewhere below):
```suggestion:-1+0 if (ioctl( obj, NTSYNC_IOC_SEM_RELEASE, &count ) < 0) ```
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
- return STATUS_SUCCESS;
+}
+static NTSTATUS linux_release_semaphore_obj( int obj, ULONG count, ULONG *prev_count ) +{
- NTSTATUS ret;
- ret = ioctl( obj, NTSYNC_IOC_SEM_RELEASE, &count );
- if (ret < 0)
- {
if (errno == EOVERFLOW)
return STATUS_SEMAPHORE_LIMIT_EXCEEDED;
else
return errno_to_status( errno );
The else is unnecessary after return, and I think these short ifs should be kept one-liner as there are other instance of one-liners in the same function (and file).
```suggestion:-3+0 if (errno == EOVERFLOW) return STATUS_SEMAPHORE_LIMIT_EXCEEDED; return errno_to_status( errno ); ```
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
cache->handle = reply->handle;
cache->access = reply->access;
cache->type = reply->type;
cache->refcount = 1;
cache->closed = FALSE;
}
- }
- SERVER_END_REQ;
- if (ret) return ret;
- if ((ret = server_get_unix_fd( wine_server_ptr_handle( cache->handle ),
0, &cache->fd, &needs_close, NULL, NULL )))
return ret;
- if (desired_type && !inproc_sync_types_match( cache->type, desired_type ))
As you've changed the `enum inproc_sync_type` definition, INPROC_SYNC_AUTO_EVENT is now 0, this will break this check I think? Maybe add a specific INPROC_SYNC_UNKNOWN = 0 enum value?
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
+static NTSTATUS linux_release_mutex_obj( int obj, LONG *prev_count ) +{
- struct ntsync_mutex_args args = {0};
- NTSTATUS ret;
- args.owner = GetCurrentThreadId();
- ret = ioctl( obj, NTSYNC_IOC_MUTEX_UNLOCK, &args );
- if (ret < 0)
- {
if (errno == EOVERFLOW)
return STATUS_MUTANT_LIMIT_EXCEEDED;
else if (errno == EPERM)
return STATUS_MUTANT_NOT_OWNED;
else
return errno_to_status( errno );
```suggestion:-5+0 if (errno == EOVERFLOW) return STATUS_MUTANT_LIMIT_EXCEEDED; if (errno == EPERM) return STATUS_MUTANT_NOT_OWNED; return errno_to_status( errno ); ```
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
- struct ntsync_mutex_args args = {0};
- NTSTATUS ret;
- ret = ioctl( obj, NTSYNC_IOC_MUTEX_READ, &args );
- if (ret < 0)
- {
if (errno == EOWNERDEAD)
{
info->AbandonedState = TRUE;
info->OwnedByCaller = FALSE;
info->CurrentCount = 1;
return STATUS_SUCCESS;
}
else
return errno_to_status( errno );
```suggestion:-2+0 } return errno_to_status( errno ); ```
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
{
static const LARGE_INTEGER timeout;
ret = server_wait( NULL, 0, SELECT_INTERRUPTIBLE | SELECT_ALERTABLE, &timeout );
assert( ret == STATUS_USER_APC );
return ret;
}
return wait_any ? args.index : 0;
- }
- else if (errno == EOWNERDEAD)
return STATUS_ABANDONED + (wait_any ? args.index : 0);
- else if (errno == ETIMEDOUT)
return STATUS_TIMEOUT;
- else
return errno_to_status( errno );
```suggestion:-6+0 } if (errno == EOWNERDEAD) return STATUS_ABANDONED + (wait_any ? args.index : 0); if (errno == ETIMEDOUT) return STATUS_TIMEOUT; return errno_to_status( errno ); ```
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
{
for (j = 0; j < i; ++j)
release_inproc_sync_obj( cache[j] );
return ret;
}
if (cache[i]->type == INPROC_SYNC_QUEUE)
queue = handles[i];
objs[i] = cache[i]->fd;
- }
- if (queue) select_queue();
- ret = linux_wait_objs( device, count, objs, wait_any, alertable, timeout );
- if (queue) unselect_queue( handles[ret] == queue );
So, when message queue is involved, instead of one wineserver request we now have two. This doesn't look ideal?
Rémi Bernon (@rbernon) commented about dlls/ntdll/unix/sync.c:
+{
- struct inproc_sync_cache_entry signal_stack_cache, *signal_cache;
- struct inproc_sync_cache_entry wait_stack_cache, *wait_cache;
- HANDLE queue = NULL;
- NTSTATUS ret;
- int device;
- if ((device = get_linux_sync_device()) < 0)
return STATUS_NOT_IMPLEMENTED;
- if ((ret = get_inproc_sync_obj( signal, 0, 0, &signal_stack_cache, &signal_cache )))
return ret;
- switch (signal_cache->type)
- {
case INPROC_SYNC_SEMAPHORE:
I don't think case are indented in ntdll.
On Thu Feb 6 12:19:18 2025 +0000, Rémi Bernon wrote:
In create_inproc_event, you call `get_linux_device`, which creates a new linux_device object, setting the static, then the object is released and as it's the only reference it's also destroyed and the static is cleared. The static is therefore not useful at this point.
Although honestly I'm not sure if it's worth having an object for the linux device at all, if it's just about a `open( "/dev/ntsync", O_CLOEXEC | O_RDONLY )` call maybe clients could do it themselves and just keep a static fd.
On Thu Feb 6 12:29:26 2025 +0000, Rémi Bernon wrote:
Although honestly I'm not sure if it's worth having an object for the linux device at all, if it's just about a `open( "/dev/ntsync", O_CLOEXEC | O_RDONLY )` call maybe clients could do it themselves and just keep a static fd.
Nope, needs to be duplicated.
https://docs.kernel.org/next/userspace-api/ntsync.html#char-device
Each file description opened on the device represents a unique instance intended to back an individual NT virtual machine. Objects created by one ntsync instance may only be used with other objects created by the same instance.
On Thu Feb 6 12:31:02 2025 +0000, Alfred Agrell wrote:
Nope, needs to be duplicated. https://docs.kernel.org/next/userspace-api/ntsync.html#char-device
Each file description opened on the device represents a unique
instance intended to back an individual NT virtual machine. Objects created by one ntsync instance may only be used with other objects created by the same instance.
I see, well one more reason to introduce it first and cache it in clients then because otherwise it's incorrect until it is kept referenced.
And if it needs to stay alive for all prefix lifetime, instead of relying on clients leaking their handle, the object should probably be created on wineserver startup and be made permanent.
but I'd like to be sure since it would be quite some effort.
Just a side-note, since it seems not to be common knowledge: renaming over 25 commits is actually quick. The trick here is to `git format-patch -25 --stdout > /tmp/1.patch`, then regexp-replace strings over the file with you preferred way *(a text editor, `sed`, etc. But I recommend a text editor, just in case if you screw something up so patch won't apply anymore, you can <kbd>Ctrl</kbd>+<kbd>z</kbd> it and retry again)*, then `git reset --hard HEAD~25` and `git am -3 /tmp/1.patch`.
On Thu Feb 6 12:19:20 2025 +0000, Rémi Bernon wrote:
So, when message queue is involved, instead of one wineserver request we now have two. This doesn't look ideal?
An idea to save at least one of them would be to have an additional event fd that the server could poll to get notified when clients begin inproc waiting on queue, replacing the `select_inproc_queue` request in an asynchronous way. The other request seem trickier.
On Thu Feb 6 12:50:41 2025 +0000, Konstantin Kharlamov wrote:
but I'd like to be sure since it would be quite some effort.
Just a side-note, since it seems not to be common knowledge: renaming over 25 commits is actually quick. The trick here is to `git format-patch -25 --stdout > /tmp/1.patch`, then regexp-replace strings over the file with you preferred way *(a text editor, `sed`, etc. But I recommend a text editor, just in case if you screw something up so patch won't apply anymore, you can <kbd>Ctrl</kbd>+<kbd>z</kbd> it and retry)*, then `git reset --hard HEAD~25` and `git am -3 /tmp/1.patch`. You can also do various more involved edits over the patch file if you editor supports updating the hunks. I do that from time to time, `emacs` I'm using does recalculate hunks on save. But just for rename purposes this isn't needed.
Except regexp replacing isn't always trivial, and you also need to worry about reformatting.
On Thu Feb 6 12:19:18 2025 +0000, Rémi Bernon wrote:
Okay, this is not obvious from the code, maybe it could be made more obvious that the inproc_sync and the server-side state are mutually exclusive. Maybe a union of the server state members with the inproc pointer? Also, creating the inproc_sync on object creation instead of using lazy init would make it more obvious that in this particular case the owner that is passed to the kernel is always the initial mutex owner. Btw I see that this is true for every other object type, where the lazy init makes it non-obvious that the values passed to every inproc sync creation are the initial state values.
It was done this way to avoid allocating extra memory or fds for objects which are never used from the client side. You could limit it to event/semaphore/mutex, which are useless otherwise, but even those can often be created without ever being used (in fact, I believe one of the first applications to leak hundreds of thousands of objects did this.) I'm not sure that always creating an inproc_sync object and its backing ntsync fd would be a worthwhile improvement.
I still think that changes could be reordered to avoid too much of it and get things in place and called earlier: the linux device object could be introduced first, then retrieved and cached from the clients, and then inproc objects could be added at once, retrieved, and then used?
What kind of organization are you imagining? Where would we cache the device object from?
So every time a thread dies, we now have to go through every possible mutex because we don't track which ones it owns anymore. Is this an issue? (I don't know and idk if we can do anything about it, but making sure we're aware of this)
It hasn't been a problem in any testing I've seen.
On Thu Feb 6 12:19:19 2025 +0000, Rémi Bernon wrote:
What about passing the inproc sync to wake_up, and setting it there instead of having to append a call after every wake_up calls?
I think that's doable, but it would defeat lazy initialization.
On Thu Feb 6 12:19:19 2025 +0000, Rémi Bernon wrote:
This effectively leaks the handle? We have the fd cached, why do we care about the handle? What about something like that:
if (ret || server_get_unix_fd( handle, 0, &fd, &needs_close, NULL, NULL )) fd = -1; if (InterlockedCompareExchange( &device, fd, -2 ) != -2 && fd != -1 && needs_close) close( fd ); NtClose( handle );
That's not really legal unless you make assumptions about the fd cache. Generally speaking, closing a handle may also close its cached fd.
This *would* work since we don't mark the inproc sync handles cacheable, but there's not actually any reason they can't be cacheable, and making that assumption feels uncomfortable.
So, when message queue is involved, instead of one wineserver request we now have two. This doesn't look ideal?
Correct. Avoiding this is hard in general, and it's never come up as enough of a bottleneck to need changing.
Note that if it's the only object we're waiting on (which is many cases, e.g. GetMessage, WaitMessage) then we can delegate to the server, i.e. "ntdll: Use server_wait_for_object() when waiting on only the queue object."
On Thu Feb 6 17:32:05 2025 +0000, Elizabeth Figura wrote:
It was done this way to avoid allocating extra memory or fds for objects which are never used from the client side. You could limit it to event/semaphore/mutex, which are useless otherwise, but even those can often be created without ever being used (in fact, I believe one of the first applications to leak hundreds of thousands of objects did this.) I'm not sure that always creating an inproc_sync object and its backing ntsync fd would be a worthwhile improvement.
I think the lazy initialization makes the code very non-obvious where the state is being kept, and reading the code it's as if wineserver could begin using the objects then only later delegating things to the kernel, which, according to your description elsewhere is not going to work.
On Thu Feb 6 17:36:21 2025 +0000, Elizabeth Figura wrote:
I still think that changes could be reordered to avoid too much of it
and get things in place and called earlier: the linux device object could be introduced first, then retrieved and cached from the clients, and then inproc objects could be added at once, retrieved, and then used? What kind of organization are you imagining? Where would we cache the device object from?
Idk, introduce the redirection from https://gitlab.winehq.org/wine/wine/-/merge_requests/7226/diffs?commit_id=b3... early, returning STATUS_NOT_IMPLEMENTED at first, then get and cache the linux device when any gets called?
On Thu Feb 6 17:47:03 2025 +0000, Elizabeth Figura wrote:
That's not really legal unless you make assumptions about the fd cache. Generally speaking, closing a handle may also close its cached fd. This *would* work since we don't mark the inproc sync handles cacheable, but there's not actually any reason they can't be cacheable, and making that assumption feels uncomfortable.
It feels weird to have two caches caching the same things. We should IMO decide that these fds are never cached, or decide that they might be cached, then you could dup the fd if necessary and still close the handle, it'll save you space (this seems to be an important aspect according to the handle field comment) and code.
On Fri Feb 7 10:14:32 2025 +0000, Rémi Bernon wrote:
It feels weird to have two caches caching the same things. We should IMO decide that these fds are never cached, or decide that they might be cached, then you could dup the fd if necessary and still close the handle, it'll save you space (this seems to be an important aspect according to the handle field comment) and code.
Yeah, that's true enough, we don't need to cache the fd here.
I think it's this way as a relic of an earlier revision in which the objects weren't individual fds but rather lived in a distinct ID namespace. I'll revise it to use the fd cache instead.
On Fri Feb 7 10:10:35 2025 +0000, Rémi Bernon wrote:
Idk, introduce the redirection from https://gitlab.winehq.org/wine/wine/-/merge_requests/7226/diffs?commit_id=b3... early, returning STATUS_NOT_IMPLEMENTED at first, then get and cache the linux device when any gets called?
Okay, I see, that's doable enough.
On Fri Feb 7 10:08:21 2025 +0000, Rémi Bernon wrote:
I think the lazy initialization makes the code very non-obvious where the state is being kept, and reading the code it's as if wineserver could begin using the objects then only later delegating things to the kernel, which, according to your description elsewhere is not going to work.
That's a valid concern, but I'm more than a little hesitant to *not* lazily initialize just because of that. I'd rather just add a comment explaining why we lazily create and that the state is identical to what it is at creation time.