https://bugs.winehq.org/show_bug.cgi?id=51152
Bug ID: 51152
Summary: The 64-bit ntdll:exception test fails in Wine
Product: Wine
Version: 6.8
Hardware: x86-64
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
The 64-bit ntdll:exception test fails in Wine.
More precisely, the test exits without printing the summary line, which means it must be calling exit() or being forced to exit prematurely in some other way:
https://test.winehq.org/data/patterns.html#ntdll:exception
exception.c:3027: Test marked todo: 35: ds 0 does not match ss 0x2b
exception.c:3030: Test marked todo: 35: got fs 0    <---- trace from the test
ntdll:exception:033c done (0) in 0s                 <---- trace from WineTest
This started with the following commit:
commit 10d7a804c1973f332b9068cb8c98119c6dd7c1e2
Author: Zebediah Figura z.figura12@gmail.com
Date:   Sun Mar 28 17:08:30 2021 -0500

    ntdll/tests: Add a test for segment register contents in x86_64 exception handlers.

    Signed-off-by: Zebediah Figura z.figura12@gmail.com
    Signed-off-by: Alexandre Julliard julliard@winehq.org
François Gouget fgouget@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
    Regression SHA1|                            |10d7a804c1973f332b9068cb8c98119c6dd7c1e2
           Keywords|                            |source, testcase

--- Comment #1 from François Gouget fgouget@codeweavers.com ---
What's missing is this line:
012c:exception: 7 tests executed (0 marked as todo, 0 failures), 0 skipped.
Zebediah Figura z.figura12@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |z.figura12@gmail.com

--- Comment #2 from Zebediah Figura z.figura12@gmail.com ---
Created attachment 70184
  --> https://bugs.winehq.org/attachment.cgi?id=70184
attempt at minimal reproducer
Further information: https://www.winehq.org/pipermail/wine-devel/2021-May/186007.html
An especially weird thing about it is that fs and gs are both 0 to begin with, so it's not even like we're changing the value.
But I'm not sure how to debug this. What's the host kernel version? Does the attached program, compiled as a 64-bit ELF binary, crash? Does it crash when run outside of the VM?
Zebediah Figura z.figura12@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |regression
Paul Gofman pgofman@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pgofman@codeweavers.com

--- Comment #3 from Paul Gofman pgofman@codeweavers.com ---
I have been reproducing it here and even did a bit of debugging lately. The issue is that currently on Linux (I am running 5.12) any write to fs or gs resets fsbase / gsbase, even if the same value is written back. The registers are zero in the syscall that sets the base value.
--- Comment #4 from François Gouget fgouget@codeweavers.com ---
Here are the results of my tests in case they are still useful. It does crash:

$ gcc -o exception_test exception_test.c
$ ./exception_test
cs 0x33 ds 0 es 0 fs 0 gs 0 ss 0x2b
Segmentation fault
$ cat /proc/version
Linux version 5.10.0-6-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.28-1 (2021-04-09)

The same happens on my box (so it's not QEmu's fault):

$ cat /proc/version
Linux version 4.19.0-16-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.181-1 (2021-03-19)
--- Comment #5 from Paul Gofman pgofman@codeweavers.com ---
(In reply to François Gouget from comment #4)
> The same happens on my box (so it's not QEmu's fault):

Yes, I am reproducing that on real hardware as well.
--- Comment #6 from Alexandre Julliard julliard@winehq.org ---
(In reply to Paul Gofman from comment #5)
> (In reply to François Gouget from comment #4)
> > The same happens on my box (so it's not QEmu's fault):
>
> Yes, I am reproducing that on real hardware as well.

What CPU?
--- Comment #7 from Paul Gofman pgofman@codeweavers.com ---
(In reply to Alexandre Julliard from comment #6)
> (In reply to Paul Gofman from comment #5)
> > (In reply to François Gouget from comment #4)
> > > The same happens on my box (so it's not QEmu's fault):
> >
> > Yes, I am reproducing that on real hardware as well.
>
> What CPU?

AMD Ryzen 5 3500X, Intel i9-8950HK
--- Comment #8 from Zebediah Figura z.figura12@gmail.com ---
I can't reproduce this with any of the following processors:
AMD FX-8350, Linux 5.12.10 (local build)
AMD Ryzen 3 3200U, Linux 5.12.1-arch1-1
AMD A9-9425 RADEON R5, Linux 5.12.9-arch1-1
--- Comment #9 from Paul Gofman pgofman@codeweavers.com ---
(In reply to Zebediah Figura from comment #8)
> I can't reproduce this with any of the following processors:
>
> AMD FX-8350, Linux 5.12.10 (local build)
> AMD Ryzen 3 3200U, Linux 5.12.1-arch1-1
> AMD A9-9425 RADEON R5, Linux 5.12.9-arch1-1

Maybe it is a distro-specific kernel flag or compile-time config option, but I don't immediately know which one. I am running this on Fedora 34.
--- Comment #10 from Zebediah Figura z.figura12@gmail.com ---
I'm not immediately sure I understand how the kernel can be responsible for this. The AMD spec (volume 4 § 4.5.3) says "When a null selector is loaded into FS or GS, the contents of the corresponding hidden descriptor register are not altered." Is there a bug in some processors?
--- Comment #11 from Zebediah Figura z.figura12@gmail.com ---
Created attachment 70185
  --> https://bugs.winehq.org/attachment.cgi?id=70185
test program + rdfsbase

What does the attached program print on a "bad" system? On a "good" system I get:

hazel@watership:~$ ./test
cs 0x33 ds 0 es 0 fs 0 gs 0 ss 0x2b
fsbase 0x7fcaf231a580 gsbase 0
fs 0 gs 0
fsbase 0x7fcaf231a580 gsbase 0
fs 0 gs 0
--- Comment #12 from Paul Gofman pgofman@codeweavers.com ---
(In reply to Zebediah Figura from comment #10)
> I'm not immediately sure I understand how the kernel can be responsible for
> this. The AMD spec (volume 4 § 4.5.3) says "When a null selector is loaded
> into FS or GS, the contents of the corresponding hidden descriptor register
> are not altered."

Yes, I read that as well but that's not what I see. I can imagine some ways how kernel can affect that. The only way I see to find that out is to dig the kernel code if there is anything about that.

Why I even think of some kernel-controlled quirk rather than a CPU bug is I don't immediately see how Windows would handle such thing nicely, while the test succeeds on Windows on the same AMD machine here.
--- Comment #13 from Paul Gofman pgofman@codeweavers.com ---
(In reply to Zebediah Figura from comment #11)
> Created attachment 70185 [details]
> test program + rdfsbase
>
> What does the attached program print on a "bad" system?

cs 0x33 ds 0 es 0 fs 0 gs 0 ss 0x2b
fsbase 0x7f19ffc63580 gsbase 0
Segmentation fault (core dumped)
--- Comment #14 from Zebediah Figura z.figura12@gmail.com ---
(In reply to Paul Gofman from comment #12)
> Yes, I read that as well but that's not what I see. I can imagine some ways
> how kernel can affect that. The only way I see to find that out is to dig
> the kernel code if there is anything about that.
>
> Why I even think of some kernel-controlled quirk rather than a CPU bug is I
> don't immediately see how Windows would handle such thing nicely, while the
> test succeeds on Windows on the same AMD machine here.

I don't really see how the kernel can do anything, unless the processor is violating its own spec by either (a) clearing fsbase anyway or (b) generating a fault.

(In reply to Paul Gofman from comment #13)
> (In reply to Zebediah Figura from comment #11)
> > Created attachment 70185 [details]
> > test program + rdfsbase
> >
> > What does the attached program print on a "bad" system?
>
> cs 0x33 ds 0 es 0 fs 0 gs 0 ss 0x2b
> fsbase 0x7f19ffc63580 gsbase 0
> Segmentation fault (core dumped)

Hmm, maybe you could try restoring the old value of fsbase via wrfsbase, before calling printf?

I'm trying to see if fsbase is actually getting cleared here. Though maybe I should take your comment 3 as an indication that you can already confirm that.
--- Comment #15 from François Gouget fgouget@codeweavers.com ---
CPU here: i7-4790K

The result is somewhat different from what Paul got:

$ gcc -o exception_test2 exception_test2.c
cs 0x33 ds 0 es 0 fs 0 gs 0 ss 0x2b
Instruction non permise

(That is "Illegal instruction"; the message is in French even if I unset $LANG)
Zebediah Figura z.figura12@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Fixed by SHA1|                            |4e4847dd71a3c682356559a51705ccec93b2490e
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #16 from Zebediah Figura z.figura12@gmail.com ---
The offending part of the test was disabled by https://source.winehq.org/git/wine.git/commitdiff/4e4847dd71a3c682356559a51705ccec93b2490e.
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #17 from Alexandre Julliard julliard@winehq.org ---
Closing bugs fixed in 6.13.