https://bugs.winehq.org/show_bug.cgi?id=47800
Bug ID: 47800
Summary: Better detect Windows reboots
Product: Wine-Testbot
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: unknown
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
When a test crashes Windows in a VM, we typically get an error like this:
    channel 2: open failed: connect failed: Connection refused
    channel 2: open failed: connect failed: Connection refused
    channel 2: open failed: connect failed: Connection refused
    An error occurred while waiting for the test to complete: network read got a premature EOF (wait2/connect:AgentVersion.h:0/9)
    The test VM has crashed, rebooted or lost connectivity (or the TestAgent server died)
    The previous 2 run(s) terminated abnormally
Nowadays most VMs log in automatically and autostart the TestAgent server with the --show-restarts option. With this option, a restarted server pops up a dialog saying:
TestAgentd.exe was restarted (2). Did Windows reboot?
This is a telltale sign that Windows did reboot. The alternative is that the TestBot administrator set up the VM incorrectly, but in that case the dialog would appear in every screenshot of that VM.
This works as follows: when given the --show-restarts option, the TestAgent server increments a persistent counter every time it starts. Normally the TestBot administrator has reset that counter so that it is set to 1. A reboot then pushes it to 2, which triggers this dialog.
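A minimal Python model of the counter logic described above may help. The real TestAgent server keeps its counter in its own persistent storage on the VM. Here a plain file stands in for that storage, and the helpers `reset_counter()` and `start_agent()` are hypothetical. Resetting to 0 before the first start, so that the first start leaves the counter at 1, is an assumption about how the reset happens.

```python
import os
import tempfile


def reset_counter(counter_path: str) -> None:
    """Hypothetical administrator reset, done before the first server start."""
    with open(counter_path, "w") as f:
        f.write("0")


def start_agent(counter_path: str):
    """Simulate one TestAgent server start with --show-restarts.

    Increments the persistent counter and returns (count, show_dialog).
    show_dialog is True once the server has started more than once since
    the reset, i.e. when Windows (probably) rebooted.
    """
    count = 0
    if os.path.exists(counter_path):
        with open(counter_path) as f:
            count = int(f.read().strip() or "0")
    count += 1
    with open(counter_path, "w") as f:
        f.write(str(count))
    return count, count > 1


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "start.counter")
    reset_counter(path)
    print(start_agent(path))  # (1, False): normal start, no dialog
    print(start_agent(path))  # (2, True): Windows rebooted, the dialog pops up
```

Because the value survives the reboot, the second start can tell it is not the first one. That is exactly what makes the counter useful to the client as well.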
Interestingly, the client can read this counter with getproperties("start.counter"), so the WineRun*.pl scripts can determine whether the VM really rebooted.
However, the log above also shows that the scripts don't leave the VM enough time to reboot, and thus fail to reestablish the connection to the TestAgent server. This could be solved by adding a call to SetConnectionTimeout(,,$WaitForBoot) in the strategic places.
Then in most cases the WineRun*.pl scripts could conclusively say that Windows crashed and rebooted.
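The detection flow above can be sketched in Python under stated assumptions:

1. Keep trying to reconnect for up to a boot-sized timeout. The `wait_for_boot` parameter plays the role of $WaitForBoot.
2. Compare start.counter with the value expected before the test ran.

The real scripts are Perl and would use SetConnectionTimeout() and getproperties("start.counter"). Everything below is a hypothetical stand-in, not the TestAgent API: the `connect` callable, the `AgentUnreachable` exception, `get_property()` and the `FakeVM` test double.

```python
import time


class AgentUnreachable(Exception):
    """Raised by the hypothetical connect callable while the VM is down."""


def wait_for_agent(connect, wait_for_boot, retry_delay=1.0):
    """Retry connect() until it succeeds or wait_for_boot seconds elapse.

    Returns the connected agent, or None if the deadline passed first.
    """
    deadline = time.monotonic() + wait_for_boot
    while True:
        try:
            return connect()
        except AgentUnreachable:
            if time.monotonic() >= deadline:
                return None
            time.sleep(retry_delay)


def diagnose_lost_connection(connect, expected_counter, wait_for_boot,
                             retry_delay=1.0):
    """Classify a lost connection as 'rebooted', 'no-reboot' or 'unreachable'."""
    agent = wait_for_agent(connect, wait_for_boot, retry_delay)
    if agent is None:
        # Same ambiguity as today: crash, reboot, lost network or dead server.
        return "unreachable"
    # Every server start bumps start.counter, so a value above the expected
    # one means the server restarted, i.e. Windows rebooted.
    if agent.get_property("start.counter") > expected_counter:
        return "rebooted"
    return "no-reboot"


class FakeVM:
    """Test double: unreachable for the first down_attempts connections
    (as while Windows reboots), then reports the given start.counter."""

    def __init__(self, down_attempts, counter):
        self.down_attempts = down_attempts
        self.counter = counter

    def connect(self):
        if self.down_attempts > 0:
            self.down_attempts -= 1
            raise AgentUnreachable()
        return self

    def get_property(self, name):
        return {"start.counter": self.counter}[name]


if __name__ == "__main__":
    vm = FakeVM(down_attempts=3, counter=2)
    print(diagnose_lost_connection(vm.connect, 1, wait_for_boot=5,
                                   retry_delay=0.01))  # rebooted
```

The "unreachable" outcome only remains when the VM does not come back within the boot timeout. A too-short timeout is exactly what currently hides the reboot.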
--- Comment #1 from François Gouget fgouget@codeweavers.com ---
This counter is actually really annoying when setting up VMs: forget to reset it before taking a snapshot and LibvirtTool will rightly complain that it has the wrong value.
Maybe that could be avoided by having LibvirtTool match changes in this counter to the reboots it triggered and reset it when appropriate. Or maybe another mechanism should be used.
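The first idea, matching counter changes to the reboots LibvirtTool itself triggered, could look roughly like this sketch. The function name, the status strings and the reset value of 1 are assumptions for illustration; LibvirtTool's actual code is not shown here.

```python
RESET_VALUE = 1  # assumed value the administrator normally leaves the counter at


def reconcile_start_counter(baseline, triggered_reboots, observed):
    """Match start.counter changes against the reboots LibvirtTool triggered.

    Returns (status, value_to_store):
      - "ok": every restart is accounted for, so the counter can be reset
        before the snapshot is taken;
      - "unexpected-restarts": Windows restarted more often than requested;
      - "missing-restarts": fewer restarts than requested (for instance the
        server was not started with --show-restarts, or a reboot failed).
    """
    expected = baseline + triggered_reboots
    if observed == expected:
        return "ok", RESET_VALUE
    if observed > expected:
        return "unexpected-restarts", observed
    return "missing-restarts", observed


if __name__ == "__main__":
    print(reconcile_start_counter(1, 1, 2))  # ('ok', 1)
    print(reconcile_start_counter(1, 1, 3))  # ('unexpected-restarts', 3)
```

Only resetting when the numbers match keeps the safety check: an unexplained restart is still reported instead of being silently erased.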
One thing that often prevents detecting that the VM rebooted is Windows' default 30-second boot delay, which causes the test to time out and give up before it has had time to reconnect to the TestAgent server. So it would help if LibvirtTool could automatically reduce this delay to ~2 seconds when setting up "-live" snapshots.
It is also possible to get information about the reason for the reboot from the event log. If there was a BSOD, one can even get some details about it, provided at least a small pagefile is set up.
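As a rough sketch, the System log events could be turned into a reboot reason once they have been retrieved, for instance by parsing `wevtutil qe System` output. The provider/ID pairs below are the commonly documented ones:

- Kernel-Power 41 and EventLog 6008 indicate an unclean shutdown.
- BugCheck 1001 indicates a BSOD.
- User32 1074 indicates a requested restart.

These pairs should be checked against the Windows versions actually used. The `(provider, id, message)` input format and `classify_reboot()` itself are hypothetical.

```python
import re

# (provider, event ID) pairs from the System event log that explain a reboot.
REBOOT_EVENTS = {
    ("Microsoft-Windows-WER-SystemErrorReporting", 1001): "bsod",
    ("BugCheck", 1001): "bsod",
    ("Microsoft-Windows-Kernel-Power", 41): "unclean-reboot",
    ("EventLog", 6008): "unclean-reboot",
    ("User32", 1074): "requested-reboot",
}
# Most specific explanation first.
PRIORITY = ["bsod", "unclean-reboot", "requested-reboot"]
BUGCHECK_RE = re.compile(r"bugcheck was: (0x[0-9a-fA-F]+)", re.IGNORECASE)


def classify_reboot(events):
    """Return (reason, bugcheck_code) for a list of (provider, id, message).

    bugcheck_code is only set when the BSOD event message contains the stop
    code, which assumes Windows was able to record the crash details.
    """
    found = {}
    for provider, event_id, message in events:
        kind = REBOOT_EVENTS.get((provider, event_id))
        if kind is not None and kind not in found:
            found[kind] = message
    for kind in PRIORITY:
        if kind in found:
            match = BUGCHECK_RE.search(found[kind]) if kind == "bsod" else None
            return kind, match.group(1) if match else None
    return "unknown", None


if __name__ == "__main__":
    events = [
        ("EventLog", 6008, "The previous system shutdown was unexpected."),
        ("BugCheck", 1001, "The computer has rebooted from a bugcheck.  "
                           "The bugcheck was: 0x0000001e (0x0, 0x0, 0x0, 0x0)."),
    ]
    print(classify_reboot(events))  # ('bsod', '0x0000001e')
```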
Removing "options kvm ignore_msrs=1" from /etc/modprobe.d and running ntdll:exception on a Windows 10 VM is a nice way to produce BSODs to test this.