On Mon Mar 20 13:34:56 2023 +0000, Jinoh Kang wrote:
> Wouldn't `volatile` be sufficient in this case?
Indeed, it looks like `volatile` is more suitable, since it's friendlier to GCC's `__builtin_ia32_wrgsbase64` intrinsic, which is defined in [`gcc/config/i386/i386.md`] (GCC 12.2) as having an [`unspec_volatile`] side effect but no memory clobber. A short test at https://godbolt.org/z/G5fWPTjr5 seems to confirm this.

[`gcc/config/i386/i386.md`]: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/i386/i386.md;h=be07be...
[`unspec_volatile`]: https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gccint/Side-Effects.html#index-uns...
**EDIT**: IMHO, adding a reference to a global variable as a memory input operand should solve the mentioned problem without preventing optimization too much.
```c
static FORCEINLINE struct _TEB * WINAPI NtCurrentTeb(void)
{
    extern int dummy;
    struct _TEB *teb;
    __asm__(".byte 0x65\n\tmovq (0x30),%0" : "=r" (teb) : "m" (dummy));
    return teb;
}
```
GCC won't optimize away the asm invocations if there is a function call between them that could potentially modify `dummy` (that is, any function GCC cannot prove leaves `dummy` unmodified). I'm not sure whether this works reliably with whole-program link-time optimization.
I don't see how `NtCurrentTeb()` could change between calls within the same function ("function" here meaning what results after inlining). Do you have a (hypothetical) example of such code?