[PATCH 1/2] server: Always place compiler barrier before atomic memory stores.
GCC, clang, and other compilers do not actually provide acquire/release
semantics on volatile memory accesses. This is also true on MSVC with
the /volatile:iso (use strict ISO C semantics for volatile accesses)
flag.

Consider the following test program:

    void func(int *foo, volatile int *bar)
    {
        *foo = 1;
        *bar = 2;  /* NOTE: *not* immune to reordering! */
        *foo = 3;
    }

After store reordering and dead code removal, the function above
compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):

    movl $2, (%rsi)
    movl $3, (%rdi)
    ret

Note that the first write to "*foo" has been elided; the compiler
decided that it is safe to reorder writes to "*foo" around writes to
the volatile variable "*bar", so it simply merged the first "*foo"
write into the second one.

Fix this by explicitly specifying a compiler memory barrier before the
atomic store. As a workaround for GCC bug #81316, we do this even for
other architectures where we explicitly use the atomic memory access
builtin.

Signed-off-by: Jinoh Kang <jinoh.kang.kr(a)gmail.com>
---
 server/fd.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/server/fd.c b/server/fd.c
index 1b4b98b0e76..5865db89c33 100644
--- a/server/fd.c
+++ b/server/fd.c
@@ -381,10 +381,15 @@ static const int user_shared_data_timeout = 16;
 
 static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 {
-    /* on x86 there should be total store order guarantees, so volatile is
-     * enough to ensure the stores aren't reordered by the compiler, and then
-     * they will always be seen in-order from other CPUs. On other archs, we
-     * need atomic intrinsics to guarantee that. */
+    /* on x86 there should be total store order guarantees, so a compiler
+     * barrier is enough to ensure the stores aren't reordered by the compiler;
+     * then, they will always be seen in-order from other CPUs. On other archs,
+     * we need atomic intrinsics to guarantee that.
+     *
+     * even when using atomic intrinsics, we explicitly place a memory barrier
+     * to work around GCC bug #81316 (affects all GCC versions prior to 7.x).
+     */
+    __asm__ __volatile__("" ::: "memory");
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
@@ -394,6 +399,10 @@ static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 
 static void atomic_store_long(volatile LONG *ptr, LONG value)
 {
+    /* even when using atomic intrinsics, we explicitly place a memory barrier
+     * to work around GCC bug #81316 (affects all GCC versions prior to 7.x).
+     */
+    __asm__ __volatile__("" ::: "memory");
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
--
2.34.1
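As a standalone illustration (hypothetical, not part of the patch
itself), the same compiler barrier applied to the test program above
forces both "*foo" stores to be emitted in program order:

    void func_fixed(int *foo, volatile int *bar)
    {
        *foo = 1;
        /* empty asm with a "memory" clobber: the compiler must assume the
         * asm reads and writes memory, so no memory access may be moved
         * across it and "*foo = 1" can no longer be merged into "*foo = 3" */
        __asm__ __volatile__("" ::: "memory");
        *bar = 2;
        __asm__ __volatile__("" ::: "memory");
        *foo = 3;
    }

Note this constrains only the compiler; on x86 the hardware's total
store order does the rest, as the updated comment in the patch says.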
We only require stores to not be reordered with other stores, so the
release semantics (load+store, store+store) would be enough. Note that
sequentially consistent memory accesses usually involve an expensive
memory barrier on most architectures.

Signed-off-by: Jinoh Kang <jinoh.kang.kr(a)gmail.com>
---
 server/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/server/fd.c b/server/fd.c
index 5865db89c33..ad010d4b4ea 100644
--- a/server/fd.c
+++ b/server/fd.c
@@ -393,7 +393,7 @@ static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
-    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
+    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
 #endif
 }
 
@@ -406,7 +406,7 @@ static void atomic_store_long(volatile LONG *ptr, LONG value)
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
-    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
+    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
 #endif
 }
 
--
2.34.1
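For a rough sense of the cost difference (a hypothetical sketch, not
from the patch; exact instruction sequences depend on compiler version
and target):

    /* typical GCC codegen, for illustration:
     *   AArch64: both orderings emit a stlr store;
     *   ARMv7:   release -> dmb ish; str
     *            seq_cst -> dmb ish; str; dmb ish
     *   POWER:   release -> lwsync; stw
     *            seq_cst -> hwsync; stw  (full sync is costlier) */
    void store_release(unsigned int *p, unsigned int v)
    {
        __atomic_store_n(p, v, __ATOMIC_RELEASE);
    }

    void store_seq_cst(unsigned int *p, unsigned int v)
    {
        __atomic_store_n(p, v, __ATOMIC_SEQ_CST);
    }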
On 2/10/22 18:18, Jinoh Kang wrote:
> GCC, clang, and other compilers do not actually provide
> acquire/release semantics on volatile memory accesses. This is also
> true on MSVC with the /volatile:iso (use strict ISO C semantics for
> volatile accesses) flag.
>
> Consider the following test program:
>
>     void func(int *foo, volatile int *bar)
>     {
>         *foo = 1;
>         *bar = 2;  /* NOTE: *not* immune to reordering! */
>         *foo = 3;
>     }
>
> After store reordering and dead code removal, the function above
> compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
>
>     movl $2, (%rsi)
>     movl $3, (%rdi)
>     ret
>
> Note that the first write to "*foo" has been elided; the compiler
> decided that it is safe to reorder writes to "*foo" around writes to
> the volatile variable "*bar", so it simply merged the first "*foo"
> write into the second one.
>
> Fix this by explicitly specifying a compiler memory barrier before
> the atomic store. As a workaround for GCC bug #81316, we do this even
> for other architectures where we explicitly use the atomic memory
> access builtin.
FWIW I believe that the only thing that matters for these helpers, and
where they are used (to write to the user shared data section), is that
the ordering is guaranteed between the volatile stores, not really with
other loads or stores to non-volatile data around them. This should be
the case.

Then maybe they shouldn't be named atomic, because they aren't. Or, if
we really want to fix the semantics here, maybe we should just use the
__atomic intrinsics everywhere. The concern was compatibility with
older GCC or non-GCC compilers, and I don't think they are much less
portable than inline assembly at this point.
--
Rémi Bernon <rbernon(a)codeweavers.com>
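The usage pattern described here resembles the KSYSTEM_TIME torn-read
protocol. A minimal sketch (names and layout hypothetical, not the
actual Wine code), where only the order among the volatile stores
matters:

    /* Because every field is volatile, the compiler must emit the
     * stores in program order, and x86 TSO keeps them in order across
     * CPUs (other architectures would need release stores). */
    typedef struct
    {
        volatile unsigned int low_part;
        volatile int          high1_time;
        volatile int          high2_time;
    } sketch_time;

    static void sketch_write(sketch_time *t, unsigned long long val)
    {
        t->high2_time = (int)(val >> 32);
        t->low_part   = (unsigned int)val;
        t->high1_time = (int)(val >> 32);
    }

    static unsigned long long sketch_read(const sketch_time *t)
    {
        for (;;) /* retry while a torn value is observed */
        {
            int          high = t->high1_time;
            unsigned int low  = t->low_part;
            if (t->high2_time == high)
                return ((unsigned long long)high << 32) | low;
        }
    }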
On Fri, Feb 11, 2022, 2:32 AM Rémi Bernon <rbernon(a)codeweavers.com> wrote:
> On 2/10/22 18:18, Jinoh Kang wrote:
>> GCC, clang, and other compilers do not actually provide
>> acquire/release semantics on volatile memory accesses. This is also
>> true on MSVC with the /volatile:iso (use strict ISO C semantics for
>> volatile accesses) flag.
>>
>> Consider the following test program:
>>
>>     void func(int *foo, volatile int *bar)
>>     {
>>         *foo = 1;
>>         *bar = 2;  /* NOTE: *not* immune to reordering! */
>>         *foo = 3;
>>     }
>>
>> After store reordering and dead code removal, the function above
>> compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
>>
>>     movl $2, (%rsi)
>>     movl $3, (%rdi)
>>     ret
>>
>> Note that the first write to "*foo" has been elided; the compiler
>> decided that it is safe to reorder writes to "*foo" around writes to
>> the volatile variable "*bar", so it simply merged the first "*foo"
>> write into the second one.
>>
>> Fix this by explicitly specifying a compiler memory barrier before
>> the atomic store. As a workaround for GCC bug #81316, we do this
>> even for other architectures where we explicitly use the atomic
>> memory access builtin.
>
> FWIW I believe that the only thing that matters for these helpers,
> and where they are used (to write to the user shared data section),
> is that the ordering is guaranteed between the volatile stores, not
> really with other loads or stores to non-volatile data around them.
> This should be the case.
>
> Then maybe they shouldn't be named atomic, because they aren't. Or,
> if we really want to fix the semantics here, maybe we should just use
> the __atomic intrinsics everywhere. The concern was compatibility
> with older GCC or non-GCC compilers, and I don't think they are much
> less portable than inline assembly at this point.
We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
> --
> Rémi Bernon <rbernon(a)codeweavers.com>
On 2/10/22 18:36, Jin-oh Kang wrote:
> We still support GCC 4.x (an in-support RHEL/CentOS release uses it),
> so I think we still need some wrappers around it. That and the GCC
> bug.
The __sync intrinsics seem to have been there since GCC 4.2, then were
deprecated when __atomic was introduced in GCC 4.7. What's that bug?
--
Rémi Bernon <rbernon(a)codeweavers.com>
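If compatibility back to GCC 4.x is the blocker, a version-guarded
fallback could bridge the gap; a hypothetical sketch (not something
proposed in this thread):

    /* hypothetical compatibility wrapper; not from the Wine tree */
    #if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
    #define STORE_RELEASE(ptr, val) __atomic_store_n((ptr), (val), __ATOMIC_RELEASE)
    #else
    /* __sync_synchronize() (GCC >= 4.2) is a full barrier, strictly
     * stronger than release ordering */
    #define STORE_RELEASE(ptr, val) do { __sync_synchronize(); *(ptr) = (val); } while (0)
    #endif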
On 2/10/22 18:43, Rémi Bernon wrote:
> On 2/10/22 18:36, Jin-oh Kang wrote:
>> We still support GCC 4.x (an in-support RHEL/CentOS release uses
>> it), so I think we still need some wrappers around it. That and the
>> GCC bug.
> The __sync intrinsics seem to have been there since GCC 4.2, then
> were deprecated when __atomic was introduced in GCC 4.7. What's that
> bug?

Nvm, I missed that part of the first patch.
--
Rémi Bernon <rbernon(a)codeweavers.com>
On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon(a)codeweavers.com> wrote:
> On 2/10/22 18:43, Rémi Bernon wrote:
>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses
>>> it), so I think we still need some wrappers around it. That and
>>> the GCC bug.
>> The __sync intrinsics seem to have been there since GCC 4.2, then
>> were deprecated when __atomic was introduced in GCC 4.7. What's
>> that bug?
> Nvm, I missed that part of the first patch.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only
affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too
on ARM64: https://godbolt.org/z/aTaaYaoGK

> --
> Rémi Bernon <rbernon(a)codeweavers.com>
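For reference, the shape of the miscompilation as described in this
thread (a sketch of one reading of the bug report; the function and
variable names are hypothetical):

    int glob;

    void bug81316_shape(int *atom)
    {
        glob = 1;  /* affected GCC versions may merge this store into
                    * "glob = 3" across the release store below, even
                    * though release semantics forbid sinking prior
                    * stores past it */
        __atomic_store_n(atom, 2, __ATOMIC_RELEASE);
        glob = 3;
    }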
On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr(a)gmail.com> wrote:
> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon(a)codeweavers.com> wrote:
>> On 2/10/22 18:43, Rémi Bernon wrote:
>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses
>>>> it), so I think we still need some wrappers around it. That and
>>>> the GCC bug.
>>> The __sync intrinsics seem to have been there since GCC 4.2, then
>>> were deprecated when __atomic was introduced in GCC 4.7. What's
>>> that bug?
>> Nvm, I missed that part of the first patch.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only
> affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST
> too on ARM64: https://godbolt.org/z/aTaaYaoGK

Which is to say, this is still a meaningful fix for non-x86
architectures.

>> --
>> Rémi Bernon <rbernon(a)codeweavers.com>
On 2/10/22 19:06, Jin-oh Kang wrote:
> On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr(a)gmail.com> wrote:
>> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon(a)codeweavers.com> wrote:
>>> On 2/10/22 18:43, Rémi Bernon wrote:
>>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses
>>>>> it), so I think we still need some wrappers around it. That and
>>>>> the GCC bug.
>>>> The __sync intrinsics seem to have been there since GCC 4.2, then
>>>> were deprecated when __atomic was introduced in GCC 4.7. What's
>>>> that bug?
>>> Nvm, I missed that part of the first patch.
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only
>> affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST
>> too on ARM64: https://godbolt.org/z/aTaaYaoGK
> Which is to say, this is still a meaningful fix for non-x86
> architectures.

Yes, except that I think we still only care about the ordering between
the volatile stores, and the bug doesn't happen if the two variables
are volatile.
--
Rémi Bernon <rbernon(a)codeweavers.com>
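That last point can be seen directly in the original test program: when
both objects are volatile, C forbids the compiler from eliding volatile
accesses or reordering them relative to each other, so no barrier is
needed between them (a demo, not from the patch):

    void func_both_volatile(volatile int *foo, volatile int *bar)
    {
        *foo = 1;  /* all three stores are emitted, in program order */
        *bar = 2;
        *foo = 3;
    }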
Jin-oh Kang <jinoh.kang.kr(a)gmail.com> writes:
> On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr(a)gmail.com> wrote:
>> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon(a)codeweavers.com> wrote:
>>> On 2/10/22 18:43, Rémi Bernon wrote:
>>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses
>>>>> it), so I think we still need some wrappers around it. That and
>>>>> the GCC bug.
>>>> The __sync intrinsics seem to have been there since GCC 4.2, then
>>>> were deprecated when __atomic was introduced in GCC 4.7. What's
>>>> that bug?
>>> Nvm, I missed that part of the first patch.
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only
>> affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST
>> too on ARM64: https://godbolt.org/z/aTaaYaoGK
> Which is to say, this is still a meaningful fix for non-x86
> architectures.

We don't support such an old gcc on non-x86. In fact, we require clang
on ARM64 anyway.

--
Alexandre Julliard
julliard(a)winehq.org