`mach_continuous_approximate_time()` has the necessary precision for win32 ticks and can be up to 4x faster than `mach_continuous_time()`.
Also, `clock_gettime( CLOCK_REALTIME, &ts )` calls always end up in `__commpage_gettimeofday( struct timeval *tp )`:

```
* frame #0: 0x00007ff806788763 libsystem_kernel.dylib`__commpage_gettimeofday
  frame #1: 0x00007ff8066709a3 libsystem_c.dylib`gettimeofday + 45
  frame #2: 0x00007ff806678b31 libsystem_c.dylib`clock_gettime + 117
```

These extra calls, the setup, and the conversion from one struct format to another cost another 60 CPU cycles, and in my testing this MR makes `NtQuerySystemTime` approximately 30% faster as well. This is a fairly hot code path (especially when using certain out-of-tree in-process synchronization patch sets), so it is probably worth the optimization here.
All of these APIs have been available since macOS 10.12.
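For reference, here is a minimal standalone sketch (my illustration, not part of the patches) of the conversion both hunks below rely on: mach continuous ticks are scaled to nanoseconds via `mach_timebase_info()` and then divided by 100 to get win32 100 ns ticks.

```
/*
 * Minimal standalone sketch of the conversion used in the patches below:
 * mach continuous ticks -> nanoseconds via mach_timebase_info() -> 100 ns
 * win32 units.
 */
#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t monotonic_100ns(void)
{
    static mach_timebase_info_data_t timebase;

    if (!timebase.denom) mach_timebase_info( &timebase );
    return mach_continuous_approximate_time() * timebase.numer / timebase.denom / 100;
}

int main(void)
{
    printf( "monotonic counter: %llu (100 ns units)\n",
            (unsigned long long)monotonic_100ns() );
    return 0;
}
```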
From: Marc-Aurel Zent mzent@codeweavers.com
mach_continuous_approximate_time() has the necessary precision for win32 ticks and can be up to 4x faster than mach_continuous_time().
---
 dlls/ntdll/unix/sync.c | 2 +-
 server/request.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
```
diff --git a/dlls/ntdll/unix/sync.c b/dlls/ntdll/unix/sync.c
index 2b01aaf83b8..21f194c9b35 100644
--- a/dlls/ntdll/unix/sync.c
+++ b/dlls/ntdll/unix/sync.c
@@ -85,7 +85,7 @@ static inline ULONGLONG monotonic_counter(void)
     static mach_timebase_info_data_t timebase;
 
     if (!timebase.denom) mach_timebase_info( &timebase );
-    return mach_continuous_time() * timebase.numer / timebase.denom / 100;
+    return mach_continuous_approximate_time() * timebase.numer / timebase.denom / 100;
 #elif defined(HAVE_CLOCK_GETTIME)
     struct timespec ts;
 #ifdef CLOCK_MONOTONIC_RAW
diff --git a/server/request.c b/server/request.c
index 2254315b79e..c91b718c011 100644
--- a/server/request.c
+++ b/server/request.c
@@ -511,7 +511,7 @@ timeout_t monotonic_counter(void)
     static mach_timebase_info_data_t timebase;
 
     if (!timebase.denom) mach_timebase_info( &timebase );
-    return mach_continuous_time() * timebase.numer / timebase.denom / 100;
+    return mach_continuous_approximate_time() * timebase.numer / timebase.denom / 100;
 #elif defined(HAVE_CLOCK_GETTIME)
     struct timespec ts;
 #ifdef CLOCK_MONOTONIC_RAW
```
From: Marc-Aurel Zent mzent@codeweavers.com
---
 dlls/ntdll/unix/sync.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
```
diff --git a/dlls/ntdll/unix/sync.c b/dlls/ntdll/unix/sync.c
index 21f194c9b35..85993c3b32b 100644
--- a/dlls/ntdll/unix/sync.c
+++ b/dlls/ntdll/unix/sync.c
@@ -1704,7 +1704,13 @@ NTSTATUS WINAPI NtQueryPerformanceCounter( LARGE_INTEGER *counter, LARGE_INTEGER
  */
 NTSTATUS WINAPI NtQuerySystemTime( LARGE_INTEGER *time )
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef __APPLE__
+    extern int __commpage_gettimeofday( struct timeval *tp );
+    struct timeval tp;
+    if (__commpage_gettimeofday( &tp ) == KERN_SUCCESS)
+        time->QuadPart = ticks_from_time_t( tp.tv_sec ) + tp.tv_usec * 10;
+    else
+#elif defined(HAVE_CLOCK_GETTIME)
     struct timespec ts;
     static clockid_t clock_id = CLOCK_MONOTONIC; /* placeholder */
```
Are there guarantees on the precision of mach_continuous_approximate_time? NtQuerySystemTime theoretically guarantees 100 nanoseconds.
I have a slight concern with using internals like __commpage_gettimeofday, but it has been in the released xnu source since at least 10.12, and it was [used by qemu when they had separate darwin support](https://github.com/qemu/qemu/blob/1c8a881daaca6fe0646a425b0970fb3ad25f6732/d...) way back in 2011, so it was presumably around even for 10.6/10.7. So, given the performance impact, I'm in support.
Tim Clem (@tclem) commented about dlls/ntdll/unix/sync.c:
```
  */
 NTSTATUS WINAPI NtQuerySystemTime( LARGE_INTEGER *time )
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef __APPLE__
+    extern int __commpage_gettimeofday( struct timeval *tp );
```
I think this calls for a quick comment (or at least a note in the commit message) as to why we're using this function.
Brendan Shanks (@bshanks) commented about dlls/ntdll/unix/sync.c:
```
  */
 NTSTATUS WINAPI NtQuerySystemTime( LARGE_INTEGER *time )
 {
-#ifdef HAVE_CLOCK_GETTIME
```
Wouldn't it be essentially the same to call `gettimeofday()` instead of `__commpage_gettimeofday()`? This line could change to `#if defined(HAVE_CLOCK_GETTIME) && !defined(__APPLE__)`.
Also something else that could be fixed in a separate commit: for some reason we do `gettimeofday( &now, 0 );` in `sync.c`, but `tzp` is a pointer so NULL makes more sense.
Are there guarantees on the precision of mach_continuous_approximate_time? NtQuerySystemTime theoretically guarantees 100 nanoseconds.
It's capable of expressing 100 ns, but the actual precision of `NtQuerySystemTime` is that of the system timer, which is documented as being "approximately 10 ms" [1] but in practice is more like 1 ms.
[1] https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-ke...
On Tue Feb 4 19:20:12 2025 +0000, Tim Clem wrote:
Are there guarantees on the precision of mach_continuous_approximate_time? NtQuerySystemTime theoretically guarantees 100 nanoseconds.
I looked around for this but couldn't find anything, [WebKit added support](https://github.com/WebKit/WebKit/commit/d581c36c26fce240b2258214d829a2ced062...) but just says that it has "coarse-grained resolution" and "we can use this when we do not care about high resolution".
On Tue Feb 4 19:20:12 2025 +0000, Brendan Shanks wrote:
I looked around for this but couldn't find anything, [WebKit added support](https://github.com/WebKit/WebKit/commit/d581c36c26fce240b2258214d829a2ced062...) but just says that it has "coarse-grained resolution" and "we can use this when we do not care about high resolution".
I did a number of tests on this a while ago, and in general it just lags a bit behind `mach_continuous_time` with some jitter, due to more relaxed memory barriers IIRC.

In any case, the error was on the order of 100 nanoseconds at most (and much less when looking at the errors of the values relative to each other, taking `mach_continuous_time` as ground truth). That is far more precision than needed for the 1 ms Windows system timer.

But you are right that there is almost nothing out there about the `*_approximate_*` variants introduced with 10.12, which is probably also due to Apple's (non-existent) documentation. They are part of the public API though, and probably going nowhere (hopefully).
I stumbled upon them when searching for the equivalent of `CLOCK_REALTIME_COARSE`.
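To make that kind of test concrete, here is a rough sketch (my reconstruction, not the exact test I ran back then): both clocks are sampled back to back and the largest absolute difference is reported in nanoseconds, with `mach_continuous_time()` taken as ground truth.

```
/*
 * Rough sketch of such a jitter test. The gap between the two calls itself
 * adds a little skew, so this only gives a rough upper bound on the error.
 */
#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    mach_timebase_info_data_t timebase;
    uint64_t max_err = 0;

    mach_timebase_info( &timebase );
    for (int i = 0; i < 1000000; i++)
    {
        uint64_t exact  = mach_continuous_time();
        uint64_t approx = mach_continuous_approximate_time();
        uint64_t err = exact > approx ? exact - approx : approx - exact;
        if (err > max_err) max_err = err;
    }
    printf( "max error: %llu ns\n",
            (unsigned long long)(max_err * timebase.numer / timebase.denom) );
    return 0;
}
```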
On Tue Feb 4 22:01:19 2025 +0000, Tim Clem wrote:
I have a slight concern with using internals like __commpage_gettimeofday, but it has been in the released xnu source since at least 10.12, and it was [used by qemu when they had separate darwin support](https://github.com/qemu/qemu/blob/1c8a881daaca6fe0646a425b0970fb3ad25f6732/d...) way back in 2011, so it was presumably around even for 10.6/10.7. So, given the performance impact, I'm in support.
I was thinking about that too, but came to the conclusion that it is probably fine for the foreseeable future, given that Apple has already rewritten that part once, and it now looks like this:
```
__attribute__((visibility("hidden")))
int __commpage_gettimeofday_internal(struct timeval *tp, uint64_t *tbr_out);

int __commpage_gettimeofday(struct timeval *tp)
{
    return __commpage_gettimeofday_internal(tp, NULL);
}
```
Since this is used directly by a number of their own projects (and a few external ones), they were presumably motivated to keep the API stable and not break it when they rewrote it. My assumption is that will also hold in the future.
On Tue Feb 4 18:43:10 2025 +0000, Brendan Shanks wrote:
Wouldn't it be essentially the same to call `gettimeofday()` instead of `__commpage_gettimeofday()`? This line could change to `#if defined(HAVE_CLOCK_GETTIME) && !defined(__APPLE__)`.
Indeed, but this commit essentially inlines what `gettimeofday()` does by calling `__commpage_gettimeofday()` directly, whereas going through `gettimeofday()` still incurs an overhead of 23 CPU cycles before it calls into `__commpage_gettimeofday()`, if I counted correctly.
It actually does more than inlining, though: `gettimeofday()` will fall back to a syscall if `__commpage_gettimeofday()` fails and use that value instead.

This happens rarely (about 1 in a million calls in my testing), probably when the commpage time values, which are updated asynchronously by the kernel, are read while an update is in progress.

As the commit is right now, it effectively tries to read from the commpage *twice*, making the chance of doing a syscall astronomically low.
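For illustration, here is a small self-contained sketch of that fallback pattern (not the actual libc or Wine code; the `__commpage_gettimeofday()` prototype is the one used in the patch, and `get_time_of_day()` is a name made up for this example):

```
#include <sys/time.h>
#include <stdio.h>

extern int __commpage_gettimeofday( struct timeval *tp );

static int get_time_of_day( struct timeval *tv )
{
    /* fast path: read the commpage directly */
    if (__commpage_gettimeofday( tv ) == 0) return 0;
    /* rare fallback: gettimeofday() retries the commpage itself before
     * resorting to the syscall, so the commpage ends up read twice */
    return gettimeofday( tv, NULL );
}

int main(void)
{
    struct timeval tv;

    if (!get_time_of_day( &tv ))
        printf( "%ld.%06d\n", (long)tv.tv_sec, (int)tv.tv_usec );
    return 0;
}
```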
Also, it is still measurably faster to use `__commpage_gettimeofday()` directly (benchmarked over 10,000,000 iterations through Rosetta):
```
clock_gettime()             Duration: 0.290506 seconds
gettimeofday()              Duration: 0.233024 seconds
__commpage_gettimeofday()   Duration: 0.217468 seconds
```
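A sketch of the sort of loop that produces numbers in this shape (my reconstruction, not the exact harness used; `now_seconds()` is a helper defined only for this example):

```
#include <sys/time.h>
#include <time.h>
#include <stdio.h>

#define ITERATIONS 10000000

extern int __commpage_gettimeofday( struct timeval *tp );

/* wall-clock timing helper for the benchmark itself */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime( CLOCK_MONOTONIC_RAW, &ts );
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    struct timespec ts;
    struct timeval tv;
    double start;

    start = now_seconds();
    for (int i = 0; i < ITERATIONS; i++) clock_gettime( CLOCK_REALTIME, &ts );
    printf( "clock_gettime()           Duration: %f seconds\n", now_seconds() - start );

    start = now_seconds();
    for (int i = 0; i < ITERATIONS; i++) gettimeofday( &tv, NULL );
    printf( "gettimeofday()            Duration: %f seconds\n", now_seconds() - start );

    start = now_seconds();
    for (int i = 0; i < ITERATIONS; i++) __commpage_gettimeofday( &tv );
    printf( "__commpage_gettimeofday() Duration: %f seconds\n", now_seconds() - start );
    return 0;
}
```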