https://bugs.winehq.org/show_bug.cgi?id=39441
Bug ID: 39441
Summary: The reverts keep getting slower
Product: Wine-Testbot
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: unknown
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---

A revert that would take under 10 seconds right after creating the snapshot would take over 7 minutes 6 months later. So every few months this would cause the TestBot to become really sluggish and barely able to keep up with the patch influx. Restarting libvirt, rebooting the host, restoring the VM from backup or even transferring it to another host had no effect on the revert time.
While the revert is taking place, the QEmu process fully occupies one core, no disk I/O is performed and the VM is not running. The exact cause is not yet known, but it seems to involve the VM's timer devices, particularly the rtc one.
To confuse matters further, not all VMs are affected: only Windows 2000, XP, 2003, 2008 and 10 suffer from this. The other post-Vista Windows VMs are immune. Yet the guest is not running while the revert takes place, so it should not have any impact on it.
Finally at WineConf 2015 it was discovered that the revert time of a live snapshot is simply proportional to the snapshot's age.
This yielded a first workaround which is to refresh the live snapshots regularly.
Further investigation showed that the common point between the impacted live snapshots is that they all have the following clock settings:

  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='delay'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>

while the unaffected VMs have:

  <clock offset='localtime'/>

But switching from the former to the latter does not fix the affected VMs.
Still, this led to a better fix which has now been put in place: setting track='guest' on the rtc timer.
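As a rough sketch of that fix, and assuming the other clock settings of the affected VMs are left unchanged (the report does not say whether they were), the clock element would look something like this:

```xml
<!-- Sketch only: the pit and hpet lines are assumed to be kept as they were. -->
<clock offset='localtime'>
  <!-- track='guest' is the change described above -->
  <timer name='rtc' tickpolicy='delay' track='guest'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>
```

Presumably the live snapshot would have to be recreated after the change, since a live snapshot carries the domain configuration it was taken with.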
Regardless, something is wrong with the way QEmu handles timers and live snapshots so a bug was reported: https://bugs.launchpad.net/qemu/+bug/1505041
Maybe this will shed some light on what's really happening and what the correct timer settings are.