On Mon Apr 15 19:53:36 2024 +0000, Paul Gofman wrote:
changed this line in [version 3 of the diff](/wine/wine/-/merge_requests/5480/diffs?diff_id=109686&start_sha=77ae981d561e447737ba582fd33d4818f5d10493#1104a825a741495a2a3c3093b4d85d95e4f8180f_751_743)
So I've got some details about the failures I am getting here. The test looks like:

```
push %gs
xor %eax,%eax
mov %ax,%gs
mov %gs:(0),%ax; <- the expected exception happens here
pop %gs; ret
```

On the 6.8.5 kernel I am currently running (and probably for a long time before that; it is also the same with the nofsgsbase kernel command line option) the %gs selector is always 0. However, setting it to 0 resets gsbase (that can be traced in the signal handler). Then, while delivering the exception to the wow64 32-bit handler, a fault happens from wow64cpu.BTCpuResetToConsistentState() in BTCpuSetContext(), which ends up trying to make a syscall. That faults on the first instruction in wine_syscall_dispatcher ("movq %gs:0x328,%rcx\n\t") because gsbase is now 0, which triggers a new exception setup, and that repeats until the stack overflow seen in the default test output.

Restoring gsbase with arch_prctl( ARCH_SET_GS, NtCurrentTeb() ); in setup_raise_exception fixes this specific test execution. However, "pop %gs" then resets gsbase to 0 again, which faults in the first wow64 syscall that follows and triggers winedbg.
The following debug diff fixes all the problematic tests without disabling them:

```
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c
index 4fc2727595d..b3a4b3a72a0 100644
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -1932,11 +1932,21 @@ static BOOL handle_syscall_trap( ucontext_t *sigcontext )
  *
  * Handler for SIGSEGV and related errors.
  */
-static void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext )
+void segv_handler( int signal, siginfo_t *siginfo, void *sigcontext )
 {
     EXCEPTION_RECORD rec = { 0 };
     struct xcontext context;
     ucontext_t *ucontext = init_handler( sigcontext );
+    void *gsbase, *teb;
+
+    arch_prctl( ARCH_GET_GS, &gsbase );
+    teb = NtCurrentTeb();
+    if (teb != gsbase)
+    {
+        arch_prctl( ARCH_SET_GS, teb );
+        ERR_(seh)("reset gsbase, teb %p.\n", teb);
+        return;
+    }
 
     rec.ExceptionAddress = (void *)RIP_sig(ucontext);
     save_context( &context, ucontext );
```
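For illustration only, the check-and-restore idea from the debug diff can be tried outside of Wine in a plain 64-bit Linux process. This is a minimal sketch, not Wine code: the helper names (`get_gsbase`, `set_gsbase`, `load_null_gs`, `restore_gsbase_if_changed`) are hypothetical, and it assumes x86-64 Linux with GCC or Clang, where glibc has no `arch_prctl` wrapper and the raw syscall has to be used.

```c
#define _GNU_SOURCE
#include <asm/prctl.h>     /* ARCH_GET_GS, ARCH_SET_GS */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */

/* Read the current user-space GS base. Returns 0 on success, -1 on error. */
static int get_gsbase(uintptr_t *base)
{
    unsigned long value = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_GS, &value) != 0) return -1;
    *base = value;
    return 0;
}

/* Set the user-space GS base. Returns 0 on success, -1 on error. */
static int set_gsbase(uintptr_t base)
{
    return syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)base) != 0 ? -1 : 0;
}

/* Load the null selector into %gs, like the "mov %ax,%gs" step of the test. */
static void load_null_gs(void)
{
    __asm__ volatile("xor %%eax,%%eax\n\tmov %%ax,%%gs" ::: "rax");
}

/* Same shape as the debug diff: compare gsbase with the expected value and
 * put it back if it differs.
 * Returns 1 if gsbase had to be restored, 0 if it was intact, -1 on error. */
static int restore_gsbase_if_changed(uintptr_t expected)
{
    uintptr_t current;

    if (get_gsbase(&current) != 0) return -1;
    if (current == expected) return 0;
    return set_gsbase(expected) == 0 ? 1 : -1;
}
```

A caller would point gsbase at some buffer with `set_gsbase()`, call `load_null_gs()`, and then check the return value of `restore_gsbase_if_changed()` to see whether the selector load changed the base on that machine.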
I also made a test program (targeted for 64 bit) tormenting %gs selector which runs without crashes on Windows and under Wine with the diff above:

```
#include <stdio.h>
#include <windows.h>

void *g[256];
int line[256];
int count;

#define RECORD_GSBASE() {__asm__ volatile("rdgsbase %%rax\n\t mov %%rax,%0" : "=m"(gsbase)); g[count] = gsbase; line[count++] = __LINE__;}

NTSTATUS WINAPI NtYieldExecution(void);
NTSTATUS WINAPI NtClose(HANDLE);
NTSTATUS WINAPI NtDelayExecution(BOOLEAN alertable, const LARGE_INTEGER *timeout);

int main(int argc, char *argv[])
{
    LARGE_INTEGER timeout;
    void *gsbase;
    int i;
    void *save_gsbase;
    unsigned int gs, gs1;
    NTSTATUS status;
    void *teb;

    __asm__ volatile("mov %%gs,%%eax\n\tmov %%eax, %0" : "=m"(gs));
    printf("gs %#x.\n", gs);

    RECORD_GSBASE()
    save_gsbase = gsbase;
    __asm__ volatile("mov $0x7eeffeedcafe,%rax\n\twrgsbase %rax");

    RECORD_GSBASE()
    __asm__ volatile("mov %%gs:0x30,%%rax\n\tmov %%rax,%0" : "=m"(teb));
    RECORD_GSBASE()

    __asm__ volatile("xor %eax,%eax\n\tmov %ax,%gs");
    RECORD_GSBASE()

    //printf("\n");
    NtYieldExecution();
    RECORD_GSBASE()

    __asm__ volatile("mov %%gs,%%eax\n\tmov %%eax, %0" : "=m"(gs1));

    timeout.QuadPart = -1000000;
    status = NtDelayExecution(0, &timeout);
    RECORD_GSBASE()

    __asm__ volatile("mov %0,%%rax\n\twrgsbase %%rax" :: "m"(save_gsbase));
    printf("teb %p.\n", teb);

    for (i = 0; i < count; ++i)
        printf("gsbase %p at line %d.\n", g[i], line[i]);

    __asm__ volatile("mov %%gs,%%eax\n\tmov %%eax, %0" : "=m"(gs));
    printf("gs1 %#x.\n", gs1);
    printf("gs %#x.\n", gs);

    return 0;
}
```
On Windows the %gs value is not 0, but otherwise it seems to behave somewhat along the lines of my debug diff to Wine above. In particular, mind the test "__asm__ volatile("mov %%gs:0x30,%%rax\n\tmov %%rax,%0" : "=m"(teb));", where it successfully gets the TEB after setting gsbase to 0x7eeffeedcafe, and gsbase is back to its original value after that. So it looks like Windows handles such faults transparently and restores gsbase. Setting the %gs value clears gsbase, like on Linux. NtYieldExecution() doesn't restore gsbase, and neither does NtDelayExecution() with a smaller timeout, so I am guessing that on Windows the restoration is more likely related to context switching and fault handling than to the syscalls themselves.
Now, I am not quite sure how the tests under discussion can succeed anywhere on Linux. I tried disabling rd/wrgsbase with the nofsgsbase kernel command line option, but this makes no difference. I looked at the kernel code and history a little bit but didn't immediately spot anything that could be responsible for the difference. Is that success maybe under some VM? I can imagine the hypervisor doing something with those selector changes / gsbase. Which kernel version does that succeed under?
With an up-to-date kernel and not under a VM, I am so far failing to see how that can work: gsbase is cleared and there is seemingly nothing to restore it.
UPDATE: I am not sure if that is possible, but maybe whether gsbase is reset when the selector is reloaded with the same value differs between CPUs? Here I tried one Intel and one AMD CPU and reproduced the test failures on both.
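To compare machines on the CPU question, a small probe along the following lines might help. It is only a sketch under assumptions: the function names (`cpu_vendor`, `null_selector_clears_gsbase`) are made up for this example, it targets 64-bit Linux with GCC or Clang, and it only reports what is observed in the current process, so a VM or a different kernel could still change the outcome.

```c
#define _GNU_SOURCE
#include <asm/prctl.h>     /* ARCH_GET_GS, ARCH_SET_GS */
#include <cpuid.h>         /* __get_cpuid() */
#include <string.h>
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */

/* Copy the 12-character CPUID vendor string (plus NUL) into out.
 * Returns 0 on success, -1 if CPUID leaf 0 is unavailable. */
static int cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return -1;
    memcpy(out, &ebx, 4);      /* vendor string order is EBX, EDX, ECX */
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 0;
}

/* Point gsbase at a marker buffer, load the null selector into %gs and
 * check whether gsbase still holds the marker address afterwards.
 * Returns 1 if gsbase changed, 0 if it survived, -1 if arch_prctl failed.
 * The original gsbase is put back before returning. */
static int null_selector_clears_gsbase(void)
{
    static char marker[64];
    unsigned long before = 0, after = 0;
    int ret;

    if (syscall(SYS_arch_prctl, ARCH_GET_GS, &before) != 0) return -1;
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)marker) != 0) return -1;

    __asm__ volatile("xor %%eax,%%eax\n\tmov %%ax,%%gs" ::: "rax");

    if (syscall(SYS_arch_prctl, ARCH_GET_GS, &after) != 0) ret = -1;
    else ret = (after != (unsigned long)marker);

    syscall(SYS_arch_prctl, ARCH_SET_GS, before);
    return ret;
}
```

Printing the vendor string next to the probe result on each test machine would show whether the gsbase reset depends on the CPU or is the same everywhere.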