https://bugs.winehq.org/show_bug.cgi?id=50292
Bug ID: 50292
Summary: Process-local synchronization objects use private interfaces into the Unix library
Product: Wine
Version: 6.0-rc1
Hardware: x86-64
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@winehq.org
Reporter: z.figura12@gmail.com
Distribution: ---
This is not exactly a bug, but it is something we'd like to change about the ntdll code, and we need somewhere to track https://github.com/wine-staging/wine-staging/tree/master/patches/ntdll-NtAlertThreadByThreadId. The basic "problem" is that process-local synchronization APIs [condition variables, SRW locks, critical sections, Win32 futexes] use unixlib vectors [fast_RtlpWaitForCriticalSection() et al.] instead of syscalls.
Some testing suggests that Win32 futexes and condition variables are implemented on top of an internal, undocumented interface using the functions NtAlertThreadByThreadId() and NtWaitForAlertByThreadId(), which are themselves presumably syscalls. In particular, a thread waiting inside RtlSleepConditionVariable*() or RtlWaitOnAddress() can be awoken with NtAlertThreadByThreadId(). Because this interface is meant specifically for process-local synchronization (in particular, it is not possible to alert another process's thread), it is a good fit for Wine, not only for those objects but also for SRW locks and critical sections. Accordingly, I have created a patch set which implements these Nt* interfaces and reimplements the Rtl* APIs on top of them.
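To make the layering concrete, here is a minimal single-threaded model of how an address wait could sit on top of a per-thread alert primitive. It is only a sketch, not the code from the patch set: the names (thread_state, alert_thread_by_id(), wait_on_address(), wake_address_single()) and the fixed-size thread table are made up for illustration, and nothing here actually blocks or enters the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Hypothetical per-thread bookkeeping; not Wine's real TEB layout. */
struct thread_state
{
    const void *wait_addr; /* address this thread is parked on, or NULL */
    int alerted;           /* set when the thread has been alerted */
};

static struct thread_state threads[MAX_THREADS];

/* Stand-in for NtAlertThreadByThreadId(): targets exactly one thread. */
static int alert_thread_by_id(unsigned int tid)
{
    if (tid >= MAX_THREADS) return -1;
    threads[tid].alerted = 1;
    return 0;
}

/* Models the entry path of RtlWaitOnAddress(): the thread is parked only
 * if *addr still holds the compare value. Returns 1 if it was parked. */
static int wait_on_address(unsigned int tid, const int *addr, int compare)
{
    if (tid >= MAX_THREADS) return 0;
    if (*addr != compare) return 0; /* value already changed, no wait */
    threads[tid].wait_addr = addr;
    threads[tid].alerted = 0;
    return 1;
}

/* Models RtlWakeAddressSingle(): find one thread parked on addr and alert
 * it by thread ID. Returns the woken thread ID, or -1 if nobody waits. */
static int wake_address_single(const int *addr)
{
    unsigned int tid;

    for (tid = 0; tid < MAX_THREADS; tid++)
    {
        if (threads[tid].wait_addr == addr)
        {
            threads[tid].wait_addr = NULL;
            alert_thread_by_id(tid);
            return (int)tid;
        }
    }
    return -1;
}
```

In this model the waker needs nothing but a thread ID, which is why the interface cannot reach across processes. A real implementation also has to protect the lookup table against concurrent access.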
There is a problem with these patches, however. Some performance testing done by Etienne Juvigny reveals that Star Citizen, a game which makes heavy use of Win32 futexes, suffers a drop in performance from these patches. On one machine, configured to be maximally CPU-limited, performance drops from 92 to 80 FPS. There are two potential reasons for this:
(1) Locking of TEB lists is slow, both on the PE side and on the Unix side. This is especially likely given that performance suffered massively before the locks were converted to read-write locks. However, removing these locks entirely is difficult if not impossible.
(2) RtlWakeAddressAll() performs multiple syscalls (one per thread waiting on the given address). In Star Citizen, this can be up to six threads in normal usage patterns. This seems to be true on Windows as well: NtAlertThreadByThreadId() only seems capable of waking one thread at a time, and I can find no similar API that wakes more than one thread.
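The cost described in (2) can be illustrated with a small counting model. This is a rough sketch under assumptions: the waiter table, add_waiter(), wake_address_all() and the alert_calls counter are hypothetical names, and alert_thread_by_id() is a stand-in that only counts how often the real NtAlertThreadByThreadId() syscall would be issued.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 16

/* Number of times the (simulated) alert syscall has been issued. */
static unsigned int alert_calls;

/* Stand-in for NtAlertThreadByThreadId(); each call models one syscall. */
static void alert_thread_by_id(unsigned int tid)
{
    (void)tid;
    alert_calls++;
}

/* Hypothetical waiter record: which thread waits on which address. */
struct waiter
{
    unsigned int tid;
    const void *addr;
    int active;
};

static struct waiter waiters[MAX_WAITERS];

/* Register a waiter; returns 0 on success, -1 if the table is full. */
static int add_waiter(unsigned int tid, const void *addr)
{
    size_t i;

    for (i = 0; i < MAX_WAITERS; i++)
    {
        if (!waiters[i].active)
        {
            waiters[i].tid = tid;
            waiters[i].addr = addr;
            waiters[i].active = 1;
            return 0;
        }
    }
    return -1;
}

/* Models RtlWakeAddressAll(): one alert call per thread waiting on addr.
 * Returns the number of threads woken. */
static unsigned int wake_address_all(const void *addr)
{
    unsigned int woken = 0;
    size_t i;

    for (i = 0; i < MAX_WAITERS; i++)
    {
        if (waiters[i].active && waiters[i].addr == addr)
        {
            waiters[i].active = 0;
            alert_thread_by_id(waiters[i].tid);
            woken++;
        }
    }
    return woken;
}
```

With six threads parked on one address, as in the Star Citizen case above, a single wake_address_all() call in this model issues six alert calls, whereas a vector into the Unix library could wake them all in one transition.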