Wouldn't `volatile` be sufficient in this case?
Indeed, it looks like `volatile` is more suitable since it's friendlier to GCC's `__builtin_ia32_wrgsbase64` intrinsic, which is defined in [`gcc/config/i386/i386.md`] (GCC 12.2) as having an [`unspec_volatile`] side-effect but not a memory clobber. A short test at https://godbolt.org/z/G5fWPTjr5 seems to confirm this.
[`gcc/config/i386/i386.md`]: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/i386/i386.md;h=be07be...
[`unspec_volatile`]: https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gccint/Side-Effects.html#index-uns...
**EDIT**: IMHO, adding a reference to a global variable as a memory input operand should solve the problem mentioned above without inhibiting optimization too much.

    static FORCEINLINE struct _TEB * WINAPI NtCurrentTeb(void)
    {
        extern int dummy;
        struct _TEB *teb;
        __asm__(".byte 0x65\n\tmovq (0x30),%0" : "=r" (teb) : "m" (dummy));
        return teb;
    }

GCC won't merge or drop the asm invocations if a function call between them could potentially modify the `dummy` variable, that is, a call to any function that GCC can't prove leaves `dummy` untouched.
I'm not sure if this works reliably with whole-program link-time optimization.
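
For anyone who wants to try the dependency trick outside of Wine, here is a minimal sketch of the same operand shape. Everything in it is an assumption made for illustration: `tcb_epoch`, `current_tcb` and `bump_epoch` are hypothetical names, and it reads `%fs:0` (the thread control block self pointer on x86-64 Linux) instead of `%gs:0x30`, with an empty-template fallback on other targets.

```c
#include <stddef.h>

/* Hypothetical dependency variable, playing the role of `dummy`. */
int tcb_epoch;

/* Non-volatile asm: GCC may merge or drop invocations, but the "m" input
   makes each result depend on the current contents of tcb_epoch. */
static inline void *current_tcb(void)
{
    void *tcb;
#if defined(__x86_64__) && defined(__linux__)
    /* Assumption: on x86-64 Linux, %fs:0 holds the TCB self pointer. */
    __asm__("movq %%fs:0, %0" : "=r"(tcb) : "m"(tcb_epoch));
#else
    /* Fallback for other targets: empty template, same operand shape. */
    __asm__("" : "=r"(tcb) : "0"((void *)&tcb_epoch), "m"(tcb_epoch));
#endif
    return tcb;
}

/* Writes tcb_epoch, so an asm result computed before a call to this
   function should not be reused after it. */
__attribute__((noinline)) void bump_epoch(void)
{
    tcb_epoch++;
}

/* Returns 1 if the reads before and after the call agree. The second
   read cannot simply reuse the first result, because bump_epoch()
   changes the asm's memory input. */
int tcb_stable_across_call(void)
{
    void *before = current_tcb();
    bump_epoch();
    void *after = current_tcb();
    return before == after;
}
```

Comparing the output of `gcc -O2 -S` with and without the `bump_epoch()` call should show whether the second read is kept or merged into the first; I have not checked how this behaves under LTO.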