`mach_continuous_approximate_time()` has the necessary precision for win32 ticks and can be up to 4x faster than `mach_continuous_time()`.
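For context, the conversion that both `monotonic_counter()` implementations apply to the Mach tick count can be looked at in isolation. This is a minimal sketch, not the Wine code: `mach_ticks_to_100ns` is a hypothetical helper, and the timebase is passed in as plain integers so the arithmetic can be checked on any platform. The 125/3 timebase mentioned afterwards is an assumption about Apple Silicon hardware, not something stated in this MR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of Wine): convert a raw Mach tick count to
 * 100 ns units, given the numer/denom pair that mach_timebase_info() reports.
 * It keeps the operation order of the patched expression:
 *     ticks * numer / denom / 100                                          */
uint64_t mach_ticks_to_100ns(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);   /* the timebase must have been queried first */
    return ticks * numer / denom / 100;
}
```

Under the assumed 125/3 timebase (a 24 MHz counter), 24,000,000 raw ticks are one second, and the helper maps them to 10,000,000 units of 100 ns, which is the resolution the win32 side needs.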
Also, `clock_gettime( CLOCK_REALTIME, &ts )` calls always end up in `__commpage_gettimeofday( struct timeval *tp )`:

```
  * frame #0: 0x00007ff806788763 libsystem_kernel.dylib`__commpage_gettimeofday
    frame #1: 0x00007ff8066709a3 libsystem_c.dylib`gettimeofday + 45
    frame #2: 0x00007ff806678b31 libsystem_c.dylib`clock_gettime + 117
```

The extra calls, their setup, and the conversion from one struct format to another cost another 60 CPU cycles; in my testing, avoiding them with this MR makes `NtQuerySystemTime` approximately 30% faster as well. This is a fairly hot code path (especially when using certain out-of-tree in-process synchronization patch sets), so the optimization is probably worth it here.
All of these APIs have been available since macOS 10.12.
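Whichever call supplies the `struct timeval`, the final step is the same arithmetic: seconds and microseconds since 1970 become 100 ns ticks since 1601. The following is a minimal sketch of that step, assuming `ticks_from_time_t()` adds the usual 1601-to-1970 offset, as its use in the patches suggests. `timeval_to_nt_ticks` and both constant names are hypothetical stand-ins, not Wine identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical names; the values are the standard ones for this conversion. */
#define TICKS_PER_SEC      10000000ULL     /* 100 ns ticks per second */
#define SECS_1601_TO_1970  11644473600ULL  /* seconds between the two epochs */

/* Convert a Unix struct timeval to 100 ns ticks since 1601-01-01. */
uint64_t timeval_to_nt_ticks(const struct timeval *tv)
{
    assert(tv != NULL);
    return ((uint64_t)tv->tv_sec + SECS_1601_TO_1970) * TICKS_PER_SEC
           + (uint64_t)tv->tv_usec * 10;   /* 1 us = 10 ticks */
}
```

Going through `clock_gettime()` instead means the microsecond value is first widened to nanoseconds in a `struct timespec` and then divided back down to 100 ns units, which is the extra conversion the cover letter refers to.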
--
v4: ntdll: Replace '0' with 'NULL' in gettimeofday() calls.
    ntdll: Use gettimeofday in system_time_precise on macOS.
    ntdll: Use __commpage_gettimeofday in NtQuerySystemTime on macOS.
    ntdll: Always use mach_continuous_approximate_time on macOS.
From: Marc-Aurel Zent <mzent@codeweavers.com>

mach_continuous_approximate_time() has the necessary precision for win32 ticks and can be up to 4x faster than mach_continuous_time().
---
 dlls/ntdll/unix/sync.c | 2 +-
 server/request.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dlls/ntdll/unix/sync.c b/dlls/ntdll/unix/sync.c
index 2b01aaf83b8..21f194c9b35 100644
--- a/dlls/ntdll/unix/sync.c
+++ b/dlls/ntdll/unix/sync.c
@@ -85,7 +85,7 @@ static inline ULONGLONG monotonic_counter(void)
     static mach_timebase_info_data_t timebase;
 
     if (!timebase.denom) mach_timebase_info( &timebase );
-    return mach_continuous_time() * timebase.numer / timebase.denom / 100;
+    return mach_continuous_approximate_time() * timebase.numer / timebase.denom / 100;
 #elif defined(HAVE_CLOCK_GETTIME)
     struct timespec ts;
 #ifdef CLOCK_MONOTONIC_RAW
diff --git a/server/request.c b/server/request.c
index 2254315b79e..c91b718c011 100644
--- a/server/request.c
+++ b/server/request.c
@@ -511,7 +511,7 @@ timeout_t monotonic_counter(void)
     static mach_timebase_info_data_t timebase;
 
     if (!timebase.denom) mach_timebase_info( &timebase );
-    return mach_continuous_time() * timebase.numer / timebase.denom / 100;
+    return mach_continuous_approximate_time() * timebase.numer / timebase.denom / 100;
 #elif defined(HAVE_CLOCK_GETTIME)
     struct timespec ts;
 #ifdef CLOCK_MONOTONIC_RAW
From: Marc-Aurel Zent <mzent@codeweavers.com>

---
 dlls/ntdll/unix/sync.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/dlls/ntdll/unix/sync.c b/dlls/ntdll/unix/sync.c
index 21f194c9b35..3665b8b8626 100644
--- a/dlls/ntdll/unix/sync.c
+++ b/dlls/ntdll/unix/sync.c
@@ -1704,7 +1704,17 @@ NTSTATUS WINAPI NtQueryPerformanceCounter( LARGE_INTEGER *counter, LARGE_INTEGER
  */
 NTSTATUS WINAPI NtQuerySystemTime( LARGE_INTEGER *time )
 {
-#ifdef HAVE_CLOCK_GETTIME
+#ifdef __APPLE__
+    /* On macOS clock_gettime() will eventually call into this, given a
+     * CLOCK_REALTIME clock_id.
+     * Similarly would gettimeofday(). For performance reasons this is directly
+     * linked against here. */
+    extern int __commpage_gettimeofday( struct timeval *tp ) __attribute__((weak_import));
+    struct timeval tp;
+    if (__commpage_gettimeofday != NULL && __commpage_gettimeofday( &tp ) == KERN_SUCCESS)
+        time->QuadPart = ticks_from_time_t( tp.tv_sec ) + tp.tv_usec * 10;
+    else
+#elif defined(HAVE_CLOCK_GETTIME)
     struct timespec ts;
     static clockid_t clock_id = CLOCK_MONOTONIC;  /* placeholder */
From: Marc-Aurel Zent <mzent@codeweavers.com>

---
 dlls/ntdll/unix/sync.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dlls/ntdll/unix/sync.c b/dlls/ntdll/unix/sync.c
index 3665b8b8626..47c10e80b6b 100644
--- a/dlls/ntdll/unix/sync.c
+++ b/dlls/ntdll/unix/sync.c
@@ -1831,7 +1831,8 @@ NTSTATUS system_time_precise( void *args )
 {
     LONGLONG *ret = args;
     struct timeval now;
-#ifdef HAVE_CLOCK_GETTIME
+    /* Excluding macOS here for the reason outlined in NtQuerySystemTime */
+#if defined(HAVE_CLOCK_GETTIME) && !defined(__APPLE__)
     struct timespec ts;
 
     if (!clock_gettime( CLOCK_REALTIME, &ts ))
From: Marc-Aurel Zent <mzent@codeweavers.com>

---
 dlls/ntdll/unix/sync.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dlls/ntdll/unix/sync.c b/dlls/ntdll/unix/sync.c
index 47c10e80b6b..d20dc4eed13 100644
--- a/dlls/ntdll/unix/sync.c
+++ b/dlls/ntdll/unix/sync.c
@@ -95,7 +95,7 @@ static inline ULONGLONG monotonic_counter(void)
     if (!clock_gettime( CLOCK_MONOTONIC, &ts ))
         return ts.tv_sec * (ULONGLONG)TICKSPERSEC + ts.tv_nsec / 100;
 #endif
-    gettimeofday( &now, 0 );
+    gettimeofday( &now, NULL );
     return ticks_from_time_t( now.tv_sec ) + now.tv_usec * 10 - server_start_time;
 }
 
@@ -1740,7 +1740,7 @@ NTSTATUS WINAPI NtQuerySystemTime( LARGE_INTEGER *time )
     {
         struct timeval now;
 
-        gettimeofday( &now, 0 );
+        gettimeofday( &now, NULL );
         time->QuadPart = ticks_from_time_t( now.tv_sec ) + now.tv_usec * 10;
     }
     return STATUS_SUCCESS;
@@ -1841,7 +1841,7 @@ NTSTATUS system_time_precise( void *args )
         return STATUS_SUCCESS;
     }
 #endif
-    gettimeofday( &now, 0 );
+    gettimeofday( &now, NULL );
     *ret = ticks_from_time_t( now.tv_sec ) + now.tv_usec * 10;
     return STATUS_SUCCESS;
 }
On Wed Feb 5 19:08:55 2025 +0000, Brendan Shanks wrote:
> IMO, this feels like a micro-optimization that isn't worth the complexity and risk to save 23 cycles per call. If it makes a meaningful difference in a game benchmark it could be worth it, but otherwise it feels better to chase bigger gains.
FWIW I did some more testing on this, and it is possible to reliably get low single-digit percentage improvements in frame times on the FFXIV benchmark when CPU bound. The difference between `gettimeofday()` and `__commpage_gettimeofday()` is very hard to measure in a real-world application, except maybe with an Instruments trace.
The difference is easier to measure between `clock_gettime()` and `__commpage_gettimeofday()` directly though, so I think this is probably still worth it.
I also weakly linked `__commpage_gettimeofday()` in the latest version, which mitigates any possible future risk as well.
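The weak-link guard can be illustrated with a stand-in symbol. This is a minimal sketch, assuming an ELF toolchain (GCC or Clang on Linux), where an undefined `__attribute__((weak))` symbol resolves to NULL at run time; the patch itself uses `__attribute__((weak_import))`, the Mach-O spelling, against a symbol that exists in the SDK. `hypothetical_fast_gettimeofday` is not defined anywhere and is only there to exercise the fallback, and `get_time_with_fallback` is likewise a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical optional fast path, standing in for __commpage_gettimeofday.
 * Declared weak: if no library provides it, its address is NULL.
 * (Assumption: ELF semantics; Mach-O needs weak_import and an SDK symbol.) */
extern int hypothetical_fast_gettimeofday(struct timeval *tp) __attribute__((weak));

/* Fill *tp with the current time.
 * Returns 1 if the fast path was used, 0 if gettimeofday() was used. */
int get_time_with_fallback(struct timeval *tp)
{
    assert(tp != NULL);
    if (hypothetical_fast_gettimeofday != NULL &&
        hypothetical_fast_gettimeofday(tp) == 0)
        return 1;

    gettimeofday(tp, NULL);   /* portable fallback */
    return 0;
}
```

Because the symbol is absent here, the function always takes the fallback branch. That is the behavior the weak link is meant to guarantee if the private symbol ever disappears: the binary still loads and quietly uses the public API.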