It feels like we're shuffling `%gs` accesses around a bit too much, and spilling registers onto the stack in the process.
Instead, I think we can access `%gs` just once, for `movq %gs:0x328,%rcx`, and then use `struct syscall_frame.gsbase` (offset `0xb8`) for all subsequent TEB accesses[^1]. Let's also rename the `gsbase` field to `teb` to better reflect what it holds.
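Roughly, the pattern would look like the fragment below. It follows the comment above in assuming that `%gs:0x328` yields the syscall frame pointer in `%rcx` and that the frame keeps the TEB pointer at `0xb8`. The choice of `%r11` as the scratch register and the `OFFSET` placeholder are only illustrative, and the fragment is not taken from the actual dispatcher.

```asm
    movq %gs:0x328,%rcx        /* the only %gs access: load the syscall frame pointer */
    /* ... save registers into the frame at (%rcx) as before ... */
    movq 0xb8(%rcx),%r11       /* frame->teb (currently named gsbase); %r11 is an illustrative choice */
    movq OFFSET(%r11),%rax     /* was: movq %gs:OFFSET,%rax (OFFSET is a placeholder) */
```

Because the TEB pointer is reloaded from the frame whenever it's needed, nothing has to be kept live across the register saves. That should remove the spills.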
[^1]: In my experience, stack memory accesses are much faster than TLS accesses (e.g., `%gs:OFFSET`) anyway. Correct me if I'm wrong, though.