From: Yuxuan Shui <yshui@codeweavers.com>
---
 include/winnt.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/winnt.h b/include/winnt.h
index ab14bcc80d5..27381fad3e0 100644
--- a/include/winnt.h
+++ b/include/winnt.h
@@ -7109,7 +7109,7 @@ static FORCEINLINE void MemoryBarrier(void)
     __sync_synchronize();
 }
 
-#if defined(__x86_64__) || defined(__i386__)
+#if (defined(__x86_64__) || defined(__i386__)) && !defined(__clang__) && __GNUC__ < 8
 /* On x86, Support old GCC with either no or buggy (GCC BZ#81316) __atomic_* support */
 #define __WINE_ATOMIC_LOAD_ACQUIRE(ptr, ret) do { *(ret) = *(ptr); __asm__ __volatile__( "" ::: "memory" ); } while (0)
 #define __WINE_ATOMIC_LOAD_RELAXED(ptr, ret) do { *(ret) = *(ptr); } while (0)
@@ -7120,7 +7120,7 @@ static FORCEINLINE void MemoryBarrier(void)
 #define __WINE_ATOMIC_LOAD_RELAXED(ptr, ret) __atomic_load(ptr, ret, __ATOMIC_RELAXED)
 #define __WINE_ATOMIC_STORE_RELEASE(ptr, val) __atomic_store(ptr, val, __ATOMIC_RELEASE)
 #define __WINE_ATOMIC_STORE_RELAXED(ptr, val) __atomic_store(ptr, val, __ATOMIC_RELAXED)
-#endif /* defined(__x86_64__) || defined(__i386__) */
+#endif /* (defined(__x86_64__) || defined(__i386__)) && !defined(__clang__) && __GNUC__ < 8 */
 
 static FORCEINLINE LONG ReadAcquire( LONG const volatile *src )
 {
Hi, @zfigura can you please have a look? And maybe @iamahuman as well. Thanks!
I'm not sure if this patch is useful.
Is there a bug fixed by this patch? Otherwise, does this patch improve performance or correctness?
The original code applies a workaround to compilers that don't need it, so yeah arguably I think this improves correctness.
Besides that, I am pretty sure load acquire/store release means more than `__asm__ __volatile__( "" ::: "memory" );`, which prevents compiler reordering but not processor reordering. Granted it's not needed on x86, but I think it's also an argument for correctness.
> The original code applies a workaround to compilers that don't need it,
Indeed, those compilers don't need it.
On the other hand, the atomic operations with compiler-only barriers, which you describe as a "workaround," *are* correct for both old and new compilers AFAIK.
> so yeah arguably I think this improves correctness.
Correctness is a "yes or no" question. If the code is already correct, you cannot further improve its correctness.
Maybe you actually meant future compatibility: whether the code will stay correct in the future. That requires the following assumptions to continue to hold:
1. Volatile memory access (load/store) operations emitted by the compiler implicitly have both acquire and release semantics on all x86 implementations (microarchitectures) that GCC and clang target. The exceptions are SSE non-temporal memory operations, which are not cache-coherent and thus cannot be used to implement C-like generic memory operations in the first place.
2. `__asm__ __volatile__("" ::: "memory");` acts as a compiler memory barrier. It is safe to assume this will stay true in the future, since the Linux kernel uses the same construct.
> Besides that, I am pretty sure load acquire/store release means more than `__asm__ __volatile__( "" ::: "memory" );`, which prevents compiler reordering but not processor reordering.
A barrier (be it a compiler-only barrier or a hardware instruction) by itself has nothing to do with load-acquire or store-release. It is the combination of memory accesses and barriers that implements either semantics of the abstract memory model.
If you believe there's more to acquire/release, would you provide an example?
> Granted it's not needed on x86,
Right. On non-x86 architectures, we assume newer GCC.
> but I think it's also an argument for correctness.
I'm not sure this stands as a valid argument for correctness.