GCC, Clang, and other compilers do not actually provide acquire/release semantics for volatile memory accesses. This is also true of MSVC with the /volatile:iso flag (use strict ISO C semantics for volatile accesses).
Consider the following test program:
void func(int *foo, volatile int *bar)
{
    *foo = 1;
    *bar = 2;  /* NOTE: *not* immune to reordering! */
    *foo = 3;
}
After store reordering and dead code removal, the function above compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
    movl    $2, (%rsi)
    movl    $3, (%rdi)
    ret
Note that the first write to "*foo" has been elided; the compiler decided that it was safe to reorder writes to "*foo" around writes to the volatile variable "*bar", so it simply merged the first "*foo" write into the second one.
Fix this by explicitly specifying a compiler memory barrier before the atomic store. As a workaround for GCC bug #81316, we do this even for other architectures where we explicitly use the atomic memory access builtin.
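For illustration, here is the same test program with the barrier this patch adds placed before the volatile store; the "memory" clobber forces the first "*foo" store to be emitted before the barrier, so it can no longer be merged into the second (a sketch, not the patched wineserver code):

void func(int *foo, volatile int *bar)
{
    *foo = 1;
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier: the store
                                            * above must be emitted first */
    *bar = 2;
    *foo = 3;
}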
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
 server/fd.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/server/fd.c b/server/fd.c
index 1b4b98b0e76..5865db89c33 100644
--- a/server/fd.c
+++ b/server/fd.c
@@ -381,10 +381,15 @@ static const int user_shared_data_timeout = 16;
 
 static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 {
-    /* on x86 there should be total store order guarantees, so volatile is
-     * enough to ensure the stores aren't reordered by the compiler, and then
-     * they will always be seen in-order from other CPUs. On other archs, we
-     * need atomic intrinsics to guarantee that. */
+    /* on x86 there should be total store order guarantees, so a compiler
+     * barrier is enough to ensure the stores aren't reordered by the compiler;
+     * then, they will always be seen in-order from other CPUs. On other archs,
+     * we need atomic intrinsics to guarantee that.
+     *
+     * even when using atomic intrinsics, we explicitly place a memory barrier
+     * to work around GCC bug #81316 (affects all GCC versions prior to 7.x).
+     */
+    __asm__ __volatile__("" ::: "memory");
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
@@ -394,6 +399,10 @@ static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 
 static void atomic_store_long(volatile LONG *ptr, LONG value)
 {
+    /* even when using atomic intrinsics, we explicitly place a memory barrier
+     * to work around GCC bug #81316 (affects all GCC versions prior to 7.x).
+     */
+    __asm__ __volatile__("" ::: "memory");
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
We only require that stores not be reordered with other stores, so release semantics (which order earlier load+store and store+store pairs before the store) would be enough. Note that sequentially consistent memory accesses involve an expensive memory barrier on most architectures.
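To see why release ordering suffices, consider a minimal sketch of the usual publication pattern (names are hypothetical; this is not the actual wineserver code):

static int payload;  /* plain, non-atomic data */
static int ready;    /* publication flag */

void writer(void)
{
    payload = 42;                                  /* plain store */
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE); /* store+store: the payload
                                                    * store cannot be reordered
                                                    * after this one */
}

int reader(void)
{
    if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) /* pairs with the release */
        return payload;                            /* guaranteed to see 42 */
    return -1;
}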
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
 server/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/server/fd.c b/server/fd.c
index 5865db89c33..ad010d4b4ea 100644
--- a/server/fd.c
+++ b/server/fd.c
@@ -393,7 +393,7 @@ static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
-    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
+    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
 #endif
 }
 
@@ -406,7 +406,7 @@ static void atomic_store_long(volatile LONG *ptr, LONG value)
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
-    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
+    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
 #endif
 }
On 2/10/22 18:18, Jinoh Kang wrote:
> GCC, Clang, and other compilers do not actually provide acquire/release semantics for volatile memory accesses. This is also true of MSVC with the /volatile:iso flag (use strict ISO C semantics for volatile accesses).
>
> Consider the following test program:
>
> void func(int *foo, volatile int *bar)
> {
>     *foo = 1;
>     *bar = 2;  /* NOTE: *not* immune to reordering! */
>     *foo = 3;
> }
>
> After store reordering and dead code removal, the function above compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
>
>     movl    $2, (%rsi)
>     movl    $3, (%rdi)
>     ret
>
> Note that the first write to "*foo" has been elided; the compiler decided that it was safe to reorder writes to "*foo" around writes to the volatile variable "*bar", so it simply merged the first "*foo" write into the second one.
>
> Fix this by explicitly specifying a compiler memory barrier before the atomic store. As a workaround for GCC bug #81316, we do this even for other architectures where we explicitly use the atomic memory access builtin.
FWIW I believe that the only thing that matters for these helpers and where they are used (to write to the user shared data section) is that the ordering is guaranteed between the volatile stores, not really with other loads or stores on non-volatile data around them. This should be the case.
Then maybe they shouldn't be named atomic, because they aren't. Or, if we really want to fix the semantics here, maybe we should just use the __atomic intrinsics everywhere. The concern was the compatibility with older GCC or non-GCC compilers, and I don't think they are much less portable than inline assembly at this point.
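A rough sketch of what that might look like, assuming the wineserver ULONG typedef and a version check for toolchains that predate the __atomic builtins (illustration only, not a proposed patch):

static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
#else
    __sync_synchronize(); /* full barrier from the pre-4.7 __sync family */
    *ptr = value;         /* plain volatile store */
#endif
}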
On Fri, Feb 11, 2022, 2:32 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
> On 2/10/22 18:18, Jinoh Kang wrote:
>> GCC, Clang, and other compilers do not actually provide acquire/release semantics for volatile memory accesses. This is also true of MSVC with the /volatile:iso flag (use strict ISO C semantics for volatile accesses).
>>
>> Consider the following test program:
>>
>> void func(int *foo, volatile int *bar)
>> {
>>     *foo = 1;
>>     *bar = 2;  /* NOTE: *not* immune to reordering! */
>>     *foo = 3;
>> }
>>
>> After store reordering and dead code removal, the function above compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
>>
>>     movl    $2, (%rsi)
>>     movl    $3, (%rdi)
>>     ret
>>
>> Note that the first write to "*foo" has been elided; the compiler decided that it was safe to reorder writes to "*foo" around writes to the volatile variable "*bar", so it simply merged the first "*foo" write into the second one.
>>
>> Fix this by explicitly specifying a compiler memory barrier before the atomic store. As a workaround for GCC bug #81316, we do this even for other architectures where we explicitly use the atomic memory access builtin.
>
> FWIW I believe that the only thing that matters for these helpers and where they are used (to write to the user shared data section) is that the ordering is guaranteed between the volatile stores, not really with other loads or stores on non-volatile data around them. This should be the case.
>
> Then maybe they shouldn't be named atomic, because they aren't. Or, if we really want to fix the semantics here, maybe we should just use the __atomic intrinsics everywhere. The concern was the compatibility with older GCC or non-GCC compilers, and I don't think they are much less portable than inline assembly at this point.
We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
> --
> Rémi Bernon <rbernon@codeweavers.com>
On 2/10/22 18:36, Jin-oh Kang wrote:
> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
On 2/10/22 18:43, Rémi Bernon wrote:
> On 2/10/22 18:36, Jin-oh Kang wrote:
>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>
> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
Nvm I missed that part of the first patch.
On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
> On 2/10/22 18:43, Rémi Bernon wrote:
>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>
>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>
> Nvm I missed that part of the first patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
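The godbolt example presumably boils down to something like the following sketch, patterned on the test program from the first patch with the volatile store replaced by the atomic builtin:

void func(int *foo, int *bar)
{
    *foo = 1;
    __atomic_store_n(bar, 2, __ATOMIC_SEQ_CST);
    *foo = 3; /* affected GCC versions merge this with "*foo = 1" and move
               * the combined store across the atomic store, as in the
               * volatile example */
}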
On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr@gmail.com> wrote:
> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
>> On 2/10/22 18:43, Rémi Bernon wrote:
>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>>
>>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>>
>> Nvm I missed that part of the first patch.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
Which is to say, this is still a meaningful fix for non-x86 architectures.
> --
> Rémi Bernon <rbernon@codeweavers.com>
On 2/10/22 19:06, Jin-oh Kang wrote:
> On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr@gmail.com> wrote:
>> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
>>> On 2/10/22 18:43, Rémi Bernon wrote:
>>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>>>
>>>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>>>
>>> Nvm I missed that part of the first patch.
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
>
> Which is to say, this is still a meaningful fix for non-x86 architectures.
Yes, except that I think we still only care about the ordering between the volatile stores, and the bug doesn't happen if the two variables are volatile.
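In other words, a variant of the test program where both pointers are volatile is already safe, since the compiler must perform volatile accesses in program order and may not elide them (a sketch):

void func(volatile int *foo, volatile int *bar)
{
    *foo = 1; /* volatile: must be emitted, in order */
    *bar = 2;
    *foo = 3;
}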
Jin-oh Kang <jinoh.kang.kr@gmail.com> writes:
> On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr@gmail.com> wrote:
>> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
>>> On 2/10/22 18:43, Rémi Bernon wrote:
>>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>>>
>>>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>>>
>>> Nvm I missed that part of the first patch.
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
>
> Which is to say, this is still a meaningful fix for non-x86 architectures.
We don't support such old gcc on non-x86. In fact we require clang on ARM64 anyway.