Clang presents as GCC 4.2.1, so it was always using the fallbacks for `InterlockedExchange{,Pointer}` on the non-PE side.
The fix for the GCC `__atomic_*` bug linked in the comment for the `__WINE_ATOMIC_*` macros first appeared in GCC 8: https://github.com/gcc-mirror/gcc/commit/d8c40eff56f69877b33c697ded756d50fde...
From: William Horvath william@horvath.blog
Clang presents as GCC 4.2.1, so it was always using the fallbacks for InterlockedExchange{,Pointer} on the non-PE side.
The fix for the GCC __atomic_* bug linked in the comment for the __WINE_ATOMIC_* macros first appeared in GCC 8: https://github.com/gcc-mirror/gcc/commit/d8c40eff56f69877b33c697ded756d50fde... --- include/winnt.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/winnt.h b/include/winnt.h index 2547f841b7c..2992e86622d 100644 --- a/include/winnt.h +++ b/include/winnt.h @@ -7189,7 +7189,7 @@ static FORCEINLINE LONGLONG WINAPI InterlockedCompareExchange64( LONGLONG volati static FORCEINLINE LONG WINAPI InterlockedExchange( LONG volatile *dest, LONG val ) { LONG ret; -#if (__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 7)) +#if defined(__clang__) || ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 7))) ret = __atomic_exchange_n( dest, val, __ATOMIC_SEQ_CST ); #elif defined(__i386__) || defined(__x86_64__) __asm__ __volatile__( "lock; xchgl %0,(%1)" @@ -7248,7 +7248,7 @@ static FORCEINLINE LONGLONG WINAPI InterlockedDecrement64( LONGLONG volatile *de static FORCEINLINE void * WINAPI InterlockedExchangePointer( void *volatile *dest, void *val ) { void *ret; -#if (__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 7)) +#if defined(__clang__) || ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 7))) ret = __atomic_exchange_n( dest, val, __ATOMIC_SEQ_CST ); #elif defined(__x86_64__) __asm__ __volatile__( "lock; xchgq %0,(%1)" : "=r" (ret) :"r" (dest), "0" (val) : "memory" ); @@ -7285,7 +7285,7 @@ static FORCEINLINE void MemoryBarrier(void) __sync_synchronize(); }
-#if defined(__x86_64__) || defined(__i386__) +#if !defined(__clang__) && ((__GNUC__ < 8) && (defined(__x86_64__) || defined(__i386__))) /* On x86, Support old GCC with either no or buggy (GCC BZ#81316) __atomic_* support */ #define __WINE_ATOMIC_LOAD_ACQUIRE(ptr, ret) do { *(ret) = *(ptr); __asm__ __volatile__( "" ::: "memory" ); } while (0) #define __WINE_ATOMIC_LOAD_RELAXED(ptr, ret) do { *(ret) = *(ptr); } while (0) @@ -7296,7 +7296,7 @@ static FORCEINLINE void MemoryBarrier(void) #define __WINE_ATOMIC_LOAD_RELAXED(ptr, ret) __atomic_load(ptr, ret, __ATOMIC_RELAXED) #define __WINE_ATOMIC_STORE_RELEASE(ptr, val) __atomic_store(ptr, val, __ATOMIC_RELEASE) #define __WINE_ATOMIC_STORE_RELAXED(ptr, val) __atomic_store(ptr, val, __ATOMIC_RELAXED) -#endif /* defined(__x86_64__) || defined(__i386__) */ +#endif /* !defined(__clang__) && ((__GNUC__ < 8) && (defined(__x86_64__) || defined(__i386__))) */
static FORCEINLINE LONG ReadAcquire( LONG const volatile *src ) {
Is there a known performance or correctness benefit for using the C11 atomic variants? Either a game with improved perf or an example of an enabled compiler optimization (reordering) would work.
We should also look into server/fd.c too, it's where I originally borrowed the pattern from.
Is there a known performance or correctness benefit for using the C11 atomic variants?
Not that I know of. I just think it makes sense to use them if they're available (and work correctly) instead of needing to write our own.
We should also look into server/fd.c...
`server/fd.c` defines `atomic_store_*(volatile *ptr, value)` as `*ptr = value` on x86/x86_64. I'm not sure why it defines its own set of atomic helpers, when it also has access to the macros in `include/winnt.h` indirectly.
On Thu Feb 6 17:31:08 2025 +0000, William Horvath wrote:
Is there a known performance or correctness benefit for using the C11
atomic variants? Not that I know of. I just think it makes sense to use them if they're available (and work correctly) instead of needing to write our own.
We should also look into server/fd.c...
`server/fd.c` defines `atomic_store_*(volatile *ptr, value)` as `*ptr = value` on x86/x86_64. I'm not sure why it defines its own set of atomic helpers, when it also has access to the macros in `include/winnt.h` indirectly.
Also, note that these aren't C11 atomics, which would be `atomic_store` and `atomic_load`, without the `__` prefix. Underscore prefixed atomics are compiler builtins.
On Thu Feb 6 18:10:41 2025 +0000, William Horvath wrote:
Also, note that these aren't C11 atomics, which would be `atomic_store` and `atomic_load`, without the `__` prefix. Underscore prefixed atomics are compiler builtins.
I'm not particularly opposed to the idea, but without a convincing case for it I'm not sure further modification will be well received. (I'm not a maintainer, so maybe ask for others' opinion.)
Maybe when we drop support for GCC <8, then `__sync_` variants as well as open-coded inline asms could be dropped at once, although we may still need a stronger case for it.
Note that the 32-bit atomics have been effectively tested for years. ReadNoFence64 was relatively new though.
This merge request was closed by William Horvath.