GCC, Clang, and other compilers do not actually provide acquire/release semantics for volatile memory accesses. This is also true of MSVC with the /volatile:iso flag (use strict ISO C semantics for volatile accesses).
Consider the following test program:
void func(int *foo, volatile int *bar)
{
    *foo = 1;
    *bar = 2;  /* NOTE: *not* immune to reordering! */
    *foo = 3;
}
After store reordering and dead code removal, the function above compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
    movl    $2, (%rsi)
    movl    $3, (%rdi)
    ret
Note that the first write to "*foo" has been elided; the compiler decided that it was safe to reorder writes to "*foo" around writes to the volatile variable "*bar", so it simply merged the first "*foo" write into the second one.
Fix this by explicitly specifying a compiler memory barrier before the atomic store. As a workaround for GCC bug #81316, we do this even for other architectures where we explicitly use the atomic memory access builtin.
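For illustration, here is the same test program with the barrier this patch adds placed before the volatile store; the "memory" clobber forces the first "*foo" store to be emitted before the barrier, so it can no longer be merged into the second (a sketch, not the patched wineserver code):

void func(int *foo, volatile int *bar)
{
    *foo = 1;
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier: the store
                                            * above must be emitted first */
    *bar = 2;
    *foo = 3;
}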
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
 server/fd.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/server/fd.c b/server/fd.c
index 1b4b98b0e76..5865db89c33 100644
--- a/server/fd.c
+++ b/server/fd.c
@@ -381,10 +381,15 @@ static const int user_shared_data_timeout = 16;
 
 static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 {
-    /* on x86 there should be total store order guarantees, so volatile is
-     * enough to ensure the stores aren't reordered by the compiler, and then
-     * they will always be seen in-order from other CPUs. On other archs, we
-     * need atomic intrinsics to guarantee that. */
+    /* on x86 there should be total store order guarantees, so a compiler
+     * barrier is enough to ensure the stores aren't reordered by the compiler;
+     * then, they will always be seen in-order from other CPUs. On other archs,
+     * we need atomic intrinsics to guarantee that.
+     *
+     * even when using atomic intrinsics, we explicitly place a memory barrier
+     * to work around GCC bug #81316 (affects all GCC versions prior to 7.x).
+     */
+    __asm__ __volatile__("" ::: "memory");
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
@@ -394,6 +399,10 @@ static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 
 static void atomic_store_long(volatile LONG *ptr, LONG value)
 {
+    /* even when using atomic intrinsics, we explicitly place a memory barrier
+     * to work around GCC bug #81316 (affects all GCC versions prior to 7.x).
+     */
+    __asm__ __volatile__("" ::: "memory");
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
We only require that stores not be reordered with other stores, so release semantics (which order earlier load+store and store+store pairs before the store) would be enough. Note that sequentially consistent memory accesses involve an expensive memory barrier on most architectures.
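To see why release ordering suffices, consider a minimal sketch of the usual publication pattern (names are hypothetical; this is not the actual wineserver code):

static int payload;  /* plain, non-atomic data */
static int ready;    /* publication flag */

void writer(void)
{
    payload = 42;                                  /* plain store */
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE); /* store+store: the payload
                                                    * store cannot be reordered
                                                    * after this one */
}

int reader(void)
{
    if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) /* pairs with the release */
        return payload;                            /* guaranteed to see 42 */
    return -1;
}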
Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
---
 server/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/server/fd.c b/server/fd.c
index 5865db89c33..ad010d4b4ea 100644
--- a/server/fd.c
+++ b/server/fd.c
@@ -393,7 +393,7 @@ static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
-    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
+    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
 #endif
 }
 
@@ -406,7 +406,7 @@ static void atomic_store_long(volatile LONG *ptr, LONG value)
 #if defined(__i386__) || defined(__x86_64__)
     *ptr = value;
 #else
-    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
+    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
 #endif
 }
On 2/10/22 18:18, Jinoh Kang wrote:
> GCC, Clang, and other compilers do not actually provide acquire/release semantics for volatile memory accesses. This is also true of MSVC with the /volatile:iso flag (use strict ISO C semantics for volatile accesses).
>
> Consider the following test program:
>
> void func(int *foo, volatile int *bar)
> {
>     *foo = 1;
>     *bar = 2;  /* NOTE: *not* immune to reordering! */
>     *foo = 3;
> }
>
> After store reordering and dead code removal, the function above compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
>
>     movl    $2, (%rsi)
>     movl    $3, (%rdi)
>     ret
>
> Note that the first write to "*foo" has been elided; the compiler decided that it was safe to reorder writes to "*foo" around writes to the volatile variable "*bar", so it simply merged the first "*foo" write into the second one.
>
> Fix this by explicitly specifying a compiler memory barrier before the atomic store. As a workaround for GCC bug #81316, we do this even for other architectures where we explicitly use the atomic memory access builtin.
FWIW I believe that the only thing that matters for these helpers and where they are used (to write to the user shared data section) is that the ordering is guaranteed between the volatile stores, not really with other loads or stores on non-volatile data around them. This should be the case.
Then maybe they shouldn't be named atomic, because they aren't. Or, if we really want to fix the semantics here, maybe we should just use the __atomic intrinsics everywhere. The concern was the compatibility with older GCC or non-GCC compilers, and I don't think they are much less portable than inline assembly at this point.
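A rough sketch of what that might look like, assuming the wineserver ULONG typedef and a version check for toolchains that predate the __atomic builtins (illustration only, not a proposed patch):

static void atomic_store_ulong(volatile ULONG *ptr, ULONG value)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
    __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
#else
    __sync_synchronize(); /* full barrier from the pre-4.7 __sync family */
    *ptr = value;         /* plain volatile store */
#endif
}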
On Fri, Feb 11, 2022, 2:32 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
> On 2/10/22 18:18, Jinoh Kang wrote:
>> GCC, Clang, and other compilers do not actually provide acquire/release semantics for volatile memory accesses. This is also true of MSVC with the /volatile:iso flag (use strict ISO C semantics for volatile accesses).
>>
>> Consider the following test program:
>>
>> void func(int *foo, volatile int *bar)
>> {
>>     *foo = 1;
>>     *bar = 2;  /* NOTE: *not* immune to reordering! */
>>     *foo = 3;
>> }
>>
>> After store reordering and dead code removal, the function above compiles into the following x86-64 assembly on GCC 11.2 (gcc -O2):
>>
>>     movl    $2, (%rsi)
>>     movl    $3, (%rdi)
>>     ret
>>
>> Note that the first write to "*foo" has been elided; the compiler decided that it was safe to reorder writes to "*foo" around writes to the volatile variable "*bar", so it simply merged the first "*foo" write into the second one.
>>
>> Fix this by explicitly specifying a compiler memory barrier before the atomic store. As a workaround for GCC bug #81316, we do this even for other architectures where we explicitly use the atomic memory access builtin.
>
> FWIW I believe that the only thing that matters for these helpers and where they are used (to write to the user shared data section) is that the ordering is guaranteed between the volatile stores, not really with other loads or stores on non-volatile data around them. This should be the case.
>
> Then maybe they shouldn't be named atomic, because they aren't. Or, if we really want to fix the semantics here, maybe we should just use the __atomic intrinsics everywhere. The concern was the compatibility with older GCC or non-GCC compilers, and I don't think they are much less portable than inline assembly at this point.
We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
> --
> Rémi Bernon <rbernon@codeweavers.com>
On 2/10/22 18:36, Jin-oh Kang wrote:
> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
On 2/10/22 18:43, Rémi Bernon wrote:
> On 2/10/22 18:36, Jin-oh Kang wrote:
>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>
> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
Nvm I missed that part of the first patch.
On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
> On 2/10/22 18:43, Rémi Bernon wrote:
>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>
>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>
> Nvm I missed that part of the first patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
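The godbolt example presumably boils down to something like the following sketch, patterned on the test program from the first patch with the volatile store replaced by the atomic builtin:

void func(int *foo, int *bar)
{
    *foo = 1;
    __atomic_store_n(bar, 2, __ATOMIC_SEQ_CST);
    *foo = 3; /* affected GCC versions merge this with "*foo = 1" and move
               * the combined store across the atomic store, as in the
               * volatile example */
}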
On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr@gmail.com> wrote:
> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
>> On 2/10/22 18:43, Rémi Bernon wrote:
>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>>
>>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>>
>> Nvm I missed that part of the first patch.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
Which is to say, this is still a meaningful fix for non-x86 architectures.
> --
> Rémi Bernon <rbernon@codeweavers.com>
On 2/10/22 19:06, Jin-oh Kang wrote:
> On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr@gmail.com> wrote:
>> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
>>> On 2/10/22 18:43, Rémi Bernon wrote:
>>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>>>
>>>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>>>
>>> Nvm I missed that part of the first patch.
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
>
> Which is to say, this is still a meaningful fix for non-x86 architectures.
Yes, except that I think we still only care about the ordering between the volatile stores, and the bug doesn't happen if the two variables are volatile.
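In other words, a variant of the test program where both pointers are volatile is already safe, since the compiler must perform volatile accesses in program order and may not elide them (a sketch):

void func(volatile int *foo, volatile int *bar)
{
    *foo = 1; /* volatile: must be emitted, in order */
    *bar = 2;
    *foo = 3;
}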
Jin-oh Kang <jinoh.kang.kr@gmail.com> writes:
> On Fri, Feb 11, 2022, 3:03 AM Jin-oh Kang <jinoh.kang.kr@gmail.com> wrote:
>> On Fri, Feb 11, 2022, 2:50 AM Rémi Bernon <rbernon@codeweavers.com> wrote:
>>> On 2/10/22 18:43, Rémi Bernon wrote:
>>>> On 2/10/22 18:36, Jin-oh Kang wrote:
>>>>> We still support GCC 4.x (an in-support RHEL/CentOS release uses it), so I think we still need some wrappers around it. That and the GCC bug.
>>>>
>>>> The __sync intrinsics seem to have been there since GCC 4.2, then were deprecated when __atomic was introduced in GCC 4.7. What's that bug?
>>>
>>> Nvm I missed that part of the first patch.
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 says the bug only affects the __ATOMIC_RELEASE case, but it affects __ATOMIC_SEQ_CST too on ARM64: https://godbolt.org/z/aTaaYaoGK
>
> Which is to say, this is still a meaningful fix for non-x86 architectures.
We don't support such old gcc on non-x86. In fact we require clang on ARM64 anyway.