On Mon, 5 Dec 2016, Jonas Maebe wrote:
On 05/12/16 01:33, Francois Gouget wrote:
On Sun, 4 Dec 2016, Jonas Maebe wrote:
Francois Gouget wrote:
Indeed, during the revert vm1 shows no read traffic (lots of RAM to cache that), but a steady 6 MB/s stream of writes! In contrast, on vm2 writes quickly ramp up to 80 MB/s, then stop after ~5 seconds, and QEMU just uses CPU for the last ~4 seconds.
Maybe vm2 mounts its file systems with noatime?
The VM hosts all use relatime which provides essentially all the benefits of noatime.
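(For what it's worth, the mount options actually in effect on a host can be double-checked with something like the command below; the path is only an example, adjust it to wherever the qcow2 images live:

$ findmnt -T /var/lib/libvirt/images -no OPTIONS

findmnt -T reports the options of whichever filesystem contains the given path, so it works even if the images directory is not a mount point itself.)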
I actually meant the file system in the VM, but I guess these are Windows rather than Linux VMs?
Yes, Windows VMs; Vista for the one I mentioned before.
Additionally, what is the "revert" operation exactly? Is it like an "svn revert"/"git reset --hard HEAD" in the VM, or some qemu operation, or something else?
virsh --connect qemu:///system snapshot-revert wtbwvista up2014-wtb
This reverts the wtbwvista VM to up2014-wtb, which is a live snapshot.
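If it matters, the snapshot metadata can be inspected to confirm it really is a live (running-state) snapshot rather than a disk-only one, with something along these lines (same connection URI and names as above):

$ virsh --connect qemu:///system snapshot-info wtbwvista up2014-wtb

That should report the snapshot's saved domain state (e.g. running vs. disk-snapshot) and its place in the snapshot tree.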
But in this case I expect the writes all go to the qcow2 disk image and I know vm1 is capable of sustaining more than 6 MB/s writes (e.g. when copying >100 GB around).
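(If a rough baseline helps: a simple way to measure the host's sustained sequential write rate, bypassing the page cache, is something like the following; the file name and size are just placeholders:

$ dd if=/dev/zero of=/var/lib/libvirt/images/ddtest.tmp bs=1M count=2048 oflag=direct
$ rm /var/lib/libvirt/images/ddtest.tmp

Writing a couple of GB with oflag=direct is usually enough to get past any caching effects.)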
One thing you could look at is the output of iostat on the host while the operations are going on, in particular the transactions per second, to check whether the issue is that one is using a lot of small writes (for whatever reason) while the other uses fewer, larger writes.
Well, the average write size seems to be the same, about 32 KB, but the number of transactions sure is different.
vm1
$ iostat -d -h /dev/sda 1
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               2.00         0.00        16.00          0         16
                  3.00         0.00        52.00          0         52
                 83.00         0.00      3268.00          0       3268
                208.00         0.00      6656.00          0       6656
                204.00         0.00      6528.00          0       6528
                193.00         0.00      6208.00          0       6208
                198.00         0.00      6344.00          0       6344
                208.00         0.00      6656.00          0       6656
                200.00         0.00      6400.00          0       6400
                192.00         0.00      6144.00          0       6144
                210.00         0.00      6720.00          0       6720
                201.00         0.00      6416.00          0       6416
                199.00         0.00      6028.00          0       6028
                203.00         0.00      6528.00          0       6528
                200.00         0.00      6400.00          0       6400
                194.00         0.00      6208.00          0       6208
                201.00         0.00      6444.00          0       6444
                207.00         0.00      6592.00          0       6592
                202.00         0.00      6464.00          0       6464
                209.00         0.00      6656.00          0       6656
                208.00         0.00      6656.00          0       6656
                194.00         0.00      6232.00          0       6232
                202.00         0.00      6404.00          0       6404
                191.00         0.00      6988.00          0       6988
                203.00         0.00      6528.00          0       6528
                200.00         0.00      6400.00          0       6400
                205.00         0.00      6588.00          0       6588
                211.00         0.00      6720.00          0       6720
                202.00         0.00      6464.00          0       6464
                196.00         0.00      6272.00          0       6272
                203.00         0.00      6464.00          0       6464
                197.00         0.00      6284.00          0       6284
                190.00         0.00      5860.00          0       5860
                200.00         0.00      6400.00          0       6400
                204.00         0.00      6528.00          0       6528
                204.00         0.00      6528.00          0       6528
                196.00         0.00      6232.00          0       6232
                194.00         0.00      6212.00          0       6212
                200.00         0.00      6400.00          0       6400
                202.00         0.00      6464.00          0       6464
                196.00         0.00      6272.00          0       6272
                192.00         0.00      6144.00          0       6144
                192.00         0.00      6092.00          0       6092
                190.00         0.00      6080.00          0       6080
                192.00         0.00      6144.00          0       6144
                201.00         0.00      6400.00          0       6400
                191.00         0.00      6144.00          0       6144
                182.00         4.00      5284.00          4       5284
                  3.00         0.00         0.00          0          0
                  6.00         0.00        72.00          0         72
$ filefrag /var/lib/libvirt/images/wtbwvista.qcow2
/var/lib/libvirt/images/wtbwvista.qcow2: 284 extents found
$ ls -lh /var/lib/libvirt/images/wtbwvista.qcow2
-rw-r--r-- 1 libvirt-qemu libvirt-qemu 31G Dec 7 01:48 /var/lib/libvirt/images/wtbwvista.qcow2
vm2
$ iostat -d -h /dev/sda 1
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda            2514.00         0.00     81220.00          0      81220
               2856.00        64.00     91048.00         64      91048
               2368.00        88.00     75592.00         88      75592
               1563.00       384.00     50128.00        384      50128
                 53.00        60.00      1844.00         60       1844
                  0.00         0.00         0.00          0          0
                424.00       616.00     12008.00        616      12008
                392.00      2652.00     11364.00       2652      11364
                495.00       972.00     15872.00        972      15872
                425.00       360.00     14016.00        360      14016
$ filefrag /var/lib/libvirt/images/wtbwvista.qcow2
/var/lib/libvirt/images/wtbwvista.qcow2: 79 extents found
$ ls -lh /var/lib/libvirt/images/wtbwvista.qcow2
-rw-r--r-- 1 root root 31G Dec 7 01:43 /var/lib/libvirt/images/wtbwvista.qcow2
On a spinning disk, 2500+ IO/s only makes sense if they are contiguous, whereas 200 IO/s is what you would expect for random IO. But even on vm1 the disk image file is not that fragmented. And given that it was restored from the same backup on both machines a couple of days apart, I see no reason for one to cause random IO and not the other.
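To dig further into whether vm1's extents are laid out badly, the per-extent map could be compared on both hosts with something like:

$ filefrag -v /var/lib/libvirt/images/wtbwvista.qcow2

which lists each extent's logical offset, physical offset and length rather than just the total count, and would show whether vm1's 284 extents are a few large runs or lots of small scattered pieces.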