VM::Run() now checks that no process is actively using the VM before starting a new one. Killing the current process makes more sense anyway, and could help ensure it won't interfere with the new one by sending TestAgent commands.
Signed-off-by: Francois Gouget <fgouget@codeweavers.com>
---
tl;dr: The TestBot Engine attempted a sacrifice but it backfired and killed the Engine instead.
More details:
* The Engine started reverting the w1064v1809 VM.
* When that VM transitioned from reverting to sleeping, the Engine realized that it really wanted to run a task on another VM on the same host instead.
* Since the host is set up to only run one VM at a time, the TestBot decided to power off w1064v1809 right away so it could prepare this other VM.
* But LibvirtTool was still working on w1064v1809, i.e. it was still registered in VM::ChildPid, causing VM::Run() to refuse to start the poweroff process, as per the new check (c0c486a133f1).
* So VM::Run() returned an error, which should not have been a big deal. But for some still unknown reason the Engine then died: I checked and the process is no longer running, despite there being no error (perl or otherwise) in the log.
So restarting the Engine should solve the immediate issue until this race happens again (it took ~9 days for it to happen, so we have some time to figure it out).
Applying this patch should prevent the issue from recurring.
But I will still try to reproduce the issue here so I can figure out why the Engine died.
 testbot/lib/WineTestBot/Engine/Scheduler.pm | 1 +
 1 file changed, 1 insertion(+)
diff --git a/testbot/lib/WineTestBot/Engine/Scheduler.pm b/testbot/lib/WineTestBot/Engine/Scheduler.pm
index 2131c6833..be1a1fbc8 100644
--- a/testbot/lib/WineTestBot/Engine/Scheduler.pm
+++ b/testbot/lib/WineTestBot/Engine/Scheduler.pm
@@ -776,6 +776,7 @@ sub _SacrificeVM($$$)
   $Host->{$Victim->Status}--;
   $Host->{dirty}++;
   $Victim->RecordStatus($Sched->{records}, $Victim->Status eq "dirty" ? "dirty poweroff" : "dirty sacrifice");
+  $Victim->KillChild();
   $Victim->RunPowerOff();
   return 1;
 }