So I have a problem with ntdll:exception: it crashes the Windows 10 VM I set up on my development TestBot machine. I'd appreciate any insight you may have.
The VM is Windows 10 Pro 64-bit [1] with all updates applied up to 2020-11-03. The following exception.c lines cause it to crash:
    dreg_test.dr0 = 0x42424240;
    dreg_test.dr2 = 0x126bb070;
    dreg_test.dr3 = 0x0badbad0;
    dreg_test.dr7 = 0xffff0115;
    run_exception_test(dreg_handler, &dreg_test, &segfault_code, sizeof(segfault_code), 0);
Same thing for the next dreg_handler test a few lines down. None of the TestBot's Windows 10 VMs crash when running that code :-(
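For readers unfamiliar with the dr7 values used in these tests, here is a minimal sketch that decodes them. It assumes the standard x86 DR7 layout (L0..L3 at bits 0, 2, 4, 6; G0..G3 at bits 1, 3, 5, 7; R/W and LEN fields starting at bit 16). The helper names are made up for illustration and are not part of exception.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers, assuming the standard x86 DR7 bit layout. */

/* Local enable bit for breakpoint n (0..3): bits 0, 2, 4, 6. */
int dr7_local_enabled(uint32_t dr7, int n)
{
    return (dr7 >> (2 * n)) & 1;
}

/* Global enable bit for breakpoint n (0..3): bits 1, 3, 5, 7. */
int dr7_global_enabled(uint32_t dr7, int n)
{
    return (dr7 >> (2 * n + 1)) & 1;
}

/* R/W field for breakpoint n: two bits starting at bit 16 + 4n. */
unsigned dr7_rw(uint32_t dr7, int n)
{
    return (dr7 >> (16 + 4 * n)) & 3;
}

/* LEN field for breakpoint n: two bits starting at bit 18 + 4n. */
unsigned dr7_len(uint32_t dr7, int n)
{
    return (dr7 >> (18 + 4 * n)) & 3;
}

/* Print one line per breakpoint slot. */
void dr7_print(uint32_t dr7)
{
    for (int n = 0; n < 4; n++)
        printf("bp%d: L=%d G=%d RW=%u LEN=%u\n", n,
               dr7_local_enabled(dr7, n), dr7_global_enabled(dr7, n),
               dr7_rw(dr7, n), dr7_len(dr7, n));
}
```

Calling dr7_print(0xffff0115) shows the L bits set for slots 0 to 2 only, no G bits, and RW=3, LEN=3 for every slot. The second test's value, 0x115, has the same enable bits with all RW and LEN fields at 0.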
More data:
* The crash is 100% reproducible.
* I get a BSOD saying KERNEL_SECURITY_CHECK_FAILURE
* Looking in the event log I found the following:
The computer has rebooted from a bugcheck. The bugcheck was: 0x00000139 (0x0000000000000004, 0xffff9f019bff5010, 0xffff9f019bff4f68, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: fb52dedb-eb5e-4010-b0ac-d17284efc806.
* 0x139 does match KERNEL_SECURITY_CHECK_FAILURE. If I understand correctly its type is 0x4, i.e. reserved, which is not very helpful. https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check...
* The online resources are really not helpful. They suggest to:
  - Check the memory (in a VM? did so anyway, found nothing)
  - Run chkdsk /f c: (nothing)
  - Run sfc /scannow (found nothing)
  - Get the latest drivers. I use the QEmu 0.1.185 drivers (stable branch, same as the TestBot).
  - Disable the hyper-v network adapter. I use an e1000e network card (like in the official TestBot).
* I use qemu 1:5.0-14~bpo10+1, like vm4 which runs w10pro64.
* I use the same hardware configuration as the TestBot except for the CPU (Haswell-noTSX-IBRS instead of IvyBridge-IBRS) and 4 cores instead of 3.
[1] Installed from Win10_2009_English_x64.19042.iso
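When comparing crashes across several VMs it can help to pull the bugcheck code and its four parameters out of the event log text mechanically. The following is only a sketch: it assumes the message always has the "The bugcheck was: CODE (P1, P2, P3, P4)" shape quoted above, and the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one bugcheck record. */
struct bugcheck
{
    unsigned long long code;
    unsigned long long param[4];
};

/* Parse an event log message of the form quoted above.
 * Returns 1 on success, 0 if the expected text is not found. */
int parse_bugcheck(const char *msg, struct bugcheck *bc)
{
    static const char marker[] = "The bugcheck was: ";
    const char *p = strstr(msg, marker);

    if (!p) return 0;
    p += strlen(marker);

    /* %llx accepts an optional 0x prefix, like strtoull() with base 16. */
    return sscanf(p, "%llx (%llx, %llx, %llx, %llx)",
                  &bc->code, &bc->param[0], &bc->param[1],
                  &bc->param[2], &bc->param[3]) == 5;
}
```

With the message above this yields code 0x139 and a first parameter (the type) of 0x4.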
On 12/7/20 17:07, Francois Gouget wrote:
[...]
While it looks like a definite VM bug, it would be interesting to know, for a start, what exactly triggers the error. Could you check whether disabling the setting of the debug registers in dreg_handler() above (the context->Dr<n> = test->dr<n>; lines) avoids the crash and, if so, which register(s) trigger it? That is, disable setting all the registers, then check whether setting dr7 alone still crashes it (setting dr7 enables the HW breakpoints). If it does not, leave dr7 enabled and enable the others one at a time to see which one triggers the crash.
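One way to run that bisection without editing the handler for every attempt is to gate each assignment on a bit mask. This is a sketch only: the two structs are stand-ins for the Windows CONTEXT and the test's dbgreg_test so that it compiles anywhere, and the SET_DR* flags and apply_debug_regs() are hypothetical names, not existing Wine code.

```c
#include <stdint.h>

/* Stand-ins for the fields dreg_handler() touches (hypothetical types). */
struct dbgreg_values { uint32_t dr0, dr1, dr2, dr3, dr6, dr7; };
struct fake_context  { uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7; };

/* One flag per debug register that the handler may set. */
enum
{
    SET_DR0 = 1 << 0,
    SET_DR1 = 1 << 1,
    SET_DR2 = 1 << 2,
    SET_DR3 = 1 << 3,
    SET_DR6 = 1 << 4,
    SET_DR7 = 1 << 5
};

/* Copy only the registers selected by mask; the others are left untouched. */
void apply_debug_regs(struct fake_context *ctx,
                      const struct dbgreg_values *test, unsigned mask)
{
    if (mask & SET_DR0) ctx->Dr0 = test->dr0;
    if (mask & SET_DR1) ctx->Dr1 = test->dr1;
    if (mask & SET_DR2) ctx->Dr2 = test->dr2;
    if (mask & SET_DR3) ctx->Dr3 = test->dr3;
    if (mask & SET_DR6) ctx->Dr6 = test->dr6;
    if (mask & SET_DR7) ctx->Dr7 = test->dr7;
}
```

Starting with a mask of SET_DR7 alone and then adding one flag per run (SET_DR7 | SET_DR0, SET_DR7 | SET_DR2, ...) follows the order suggested above.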
On Mon, 7 Dec 2020, Paul Gofman wrote:
On 12/7/20 17:07, Francois Gouget wrote:
[...]
dreg_test.dr0 = 0x42424240; dreg_test.dr2 = 0x126bb070; dreg_test.dr3 = 0x0badbad0; dreg_test.dr7 = 0xffff0115; run_exception_test(dreg_handler, &dreg_test, &segfault_code, sizeof(segfault_code), 0);
[...]
While it looks like a definite VM bug, it would be interesting to know, for a start, what exactly triggers the error. Could you check whether disabling the setting of the debug registers in dreg_handler() above (the context->Dr<n> = test->dr<n>; lines) avoids the crash and, if so, which register(s) trigger it?
It does not matter which register is set (dr0, dr2, dr3 or dr7), Windows crashes with any of them. It's only if I comment all of them out that it does not crash :-(
On 12/8/20 17:39, Francois Gouget wrote:
[...]
It does not matter which register is set (dr0, dr2, dr3 or dr7), Windows crashes with any of them. It's only if I comment all of them out that it does not crash :-(
Are you commenting out here in the cited code or (better) in dreg_handler?
On Tue, 8 Dec 2020, Paul Gofman wrote: [...]
Are you commenting out here in the cited code or (better) in dreg_handler?
I did the commenting in the cited code. I tried again in dreg_handler() and that shows I can set Dr6 and Dr1 but none of the others (which makes sense I guess).
Here's what I tested in patch form to avoid ambiguity:
commit f54d113590d1de43ec3ef6ff78369b9dc9d1bdb6
Author: Francois Gouget fgouget@codeweavers.com
Date:   Tue Dec 8 17:58:39 2020 +0100

    HACK ntdll:exception: Comment out most tests to avoid a crash.

    * Uncommenting any of the C++ lines causes the KERNEL_SECURITY_CHECK_FAILURE crash.
    * All if (0) except one just skip irrelevant tests.
    * The second dreg_handler test produces the same crashes. Ignore it until the first one is fixed / diagnosed.

    Signed-off-by: Francois Gouget fgouget@codeweavers.com

diff --git a/dlls/ntdll/tests/exception.c b/dlls/ntdll/tests/exception.c
index 5686e39ab9e..cd078d662c1 100644
--- a/dlls/ntdll/tests/exception.c
+++ b/dlls/ntdll/tests/exception.c
@@ -776,12 +776,12 @@ static DWORD dreg_handler( EXCEPTION_RECORD *rec, EXCEPTION_REGISTRATION_RECORD
     const struct dbgreg_test *test = *(const struct dbgreg_test **)(frame + 1);
 
     context->Eip += 2; /* Skips the popl (%eax) */
-    context->Dr0 = test->dr0;
+    //context->Dr0 = test->dr0;
     context->Dr1 = test->dr1;
-    context->Dr2 = test->dr2;
-    context->Dr3 = test->dr3;
+    //context->Dr2 = test->dr2;
+    //context->Dr3 = test->dr3;
     context->Dr6 = test->dr6;
-    context->Dr7 = test->dr7;
+    //context->Dr7 = test->dr7;
     return ExceptionContinueExecution;
 }
 
@@ -989,13 +989,16 @@ static void test_exceptions(void)
     run_exception_test(dreg_handler, &dreg_test, &segfault_code, sizeof(segfault_code), 0);
     check_debug_registers(1, &dreg_test);
 
+    if (0) { /* causes crashes too but ignore for now */
     dreg_test.dr0 = 0x42424242;
     dreg_test.dr2 = 0x100f0fe7;
     dreg_test.dr3 = 0x0abebabe;
     dreg_test.dr7 = 0x115;
     run_exception_test(dreg_handler, &dreg_test, &segfault_code, sizeof(segfault_code), 0);
     check_debug_registers(2, &dreg_test);
+    }
 
+    if (0) {
     /* test single stepping behavior */
     got_exception = 0;
     run_exception_test(single_step_handler, NULL, &single_stepcode, sizeof(single_stepcode), 0);
@@ -1052,6 +1055,7 @@ static void test_exceptions(void)
     ctx.Dr7 = 0;
     res = pNtSetContextThread( GetCurrentThread(), &ctx );
     ok( res == STATUS_SUCCESS, "NtSetContextThread failed with %x\n", res );
+    }
 }
 
 static void test_debugger(void)
@@ -8189,8 +8193,11 @@ START_TEST(exception)
 
 #ifdef __i386__
 
+    if (0) {
     test_unwind();
+    }
     test_exceptions();
+    if (0) {
     test_rtlraiseexception();
     test_debug_registers();
     test_debug_service(1);
@@ -8201,6 +8208,7 @@ START_TEST(exception)
     test_kiuserexceptiondispatcher();
     test_extended_context();
     test_copy_context();
+    }
 
 #elif defined(__x86_64__)
 
@@ -8248,6 +8256,7 @@ START_TEST(exception)
 
 #endif
 
+    if (0) {
     test_debugger();
     test_thread_context();
     test_outputdebugstring(1, FALSE);
@@ -8264,5 +8273,6 @@ START_TEST(exception)
     test_suspend_thread();
     test_suspend_process();
     test_unload_trace();
+    }
     VirtualFree(code_mem, 0, MEM_RELEASE);
 }
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=83236
Your paranoid android.
=== w2008s64 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w7u_2qxl (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w7u_adm (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w7u_el (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w8 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w8adm (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w864 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w1064v1507 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w1064v1809 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w1064 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0

=== w10pro64 (32 bit report) ===

ntdll:
exception.c:801: Test failed: (1) failed to set debug register 0 to 42424240, got 0
exception.c:803: Test failed: (1) failed to set debug register 2 to 126bb070, got 0
exception.c:804: Test failed: (1) failed to set debug register 3 to badbad0, got 0
exception.c:806: Test failed: (1) failed to set debug register 7 to ffff0115, got 0
For future reference here is a summary of some additional tests I did with Paul:
* Setting all Dr* fields to 0 in dreg_handler() except Dr2 still causes a crash. That is buggy behavior.
* Moving the TestBot's w10pro64 VM to my box and running ntdll:exception also leads to a crash. So that seems to indicate a host issue rather than a guest-side Windows configuration issue.
* All hosts are running Debian 10 but combining my results with the TestBot's we have the following matrix:
  kernel 4.19.0-12-amd64     + QEmu 1:3.1+dfsg-8+deb10u8 -> Works   (w1064)
  kernel 5.8.0-0.bpo.2-amd64 + QEmu 1:5.0-14~bpo10+1     -> Works   (w10pro64)
  kernel 4.19.0-11-amd64     + QEmu 1:5.0-14~bpo10+1     -> Crashes (my w10pro64)
So maybe it's a 4.19.0-11 bug, or QEmu 5.0 does not like 4.19 kernels. I'll continue the tests when I can reboot.