The original code applies the workaround even on compilers that don't need it, so I'd argue this change improves correctness.
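Besides that, I'm pretty sure load-acquire/store-release semantics require more than `__asm__ __volatile__( "" ::: "memory" );`. That is only a compiler barrier: it prevents compiler reordering but not processor reordering. Granted, the hardware side isn't needed on x86, whose memory model already gives ordinary loads acquire semantics and ordinary stores release semantics. On weakly ordered architectures like ARM or POWER it is needed, though, so I think that's another argument for correctness.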
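A minimal C11 sketch of the distinction, assuming a simple flag-guarded payload. The names `payload`, `ready`, `publish`, `try_consume`, and `publish_barrier_only` are hypothetical and not taken from the original code. The release store and acquire load order memory for both the compiler and the CPU. The barrier-only variant orders only the compiler's output.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                /* plain data published by the writer (hypothetical) */
static atomic_bool ready = false;  /* flag guarding the payload (hypothetical) */

/* Writer: the release store guarantees that any thread which observes
   ready == true via an acquire load also sees the payload write.
   This constrains both the compiler and the CPU. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: the acquire load keeps the payload read from being
   performed before the flag read. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}

/* For contrast (hypothetical): a compiler barrier alone. It stops GCC/Clang
   from moving the two stores past each other but emits no fence instruction,
   so on ARM or POWER another core may still see ready_plain == 1 before the
   payload write. It only happens to work on x86 because TSO does not reorder
   stores with other stores. In C11 terms, concurrent access to these plain
   variables is also a data race. */
static volatile int ready_plain;

void publish_barrier_only(int value) {
    payload = value;
    __asm__ __volatile__("" ::: "memory");
    ready_plain = 1;
}
```

On x86-64, GCC and Clang typically compile the release store and acquire load to plain `mov` instructions, so the atomics cost nothing there. On AArch64 they typically become `stlr`/`ldar`, which is exactly the hardware ordering the bare asm barrier never provides.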