On Sun, 21 Apr 2019, Francois Gouget wrote:
Here are some things I've learned about PCI-passthrough recently, which would be one way (probably the best) to add "real hardware" to the TestBot.
I have finally done some tests with PCI-passthrough on my box and got it working with a Windows 10 VM outside of the TestBot.
Hardware: i7-4790K + Asus Z97-A + AMD RX 550
Software: Debian 10 + kernel 5.5.0-0.bpo.2-amd64 + QEMU 5.0-14~bpo10+1
Things I learned:
* Everyone recommends using OVMF, QEMU's UEFI BIOS. Currently that's totally useless for the TestBot: not only is it impossible to take live snapshots with OVMF, you cannot even take snapshots of the powered-off VM!!! And that's even before PCI-passthrough enters the picture. Such a VM would be no better than running the tests on the bare metal in terms of getting back to a clean state.
* But getting PCI-passthrough going with OVMF was indeed easier.
* It's possible to combine the QXL+Spice screen and PCI-passthrough for a dual-GPU VM configuration. The benefit is you at least get the QXL screen and then can work out the kinks for the extra GPU.
* In this configuration you can also use the host's keyboard and mouse, although things get wonky as soon as you extend your desktop onto the second GPU. Normally you move to the second screen by leaving the main screen through its side, but here that just takes the mouse out of the Spice window. So what Spice/QEMU seems to do is map the left (resp. right) edge of the Spice window to the left (right) edge of your leftmost (rightmost) screen. The mouse therefore switches to the second screen somewhere in the middle of the window, which means all clicks are offset from the mouse pointer. Yuck! So look for hover highlights and learn to use keyboard shortcuts.
* Once you remove the QXL+Spice screen the VM will be headless because it does not know there is still a VGA device (that's the part OVMF handles better). The fix is to manually edit the VM's XML file to pass x-vga=on to the right device:
-<domain type='kvm'>
+<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
 ...
+<qemu:commandline>
+  <qemu:arg value='-set'/>
+  <qemu:arg value='device.hostdev0.x-vga=on'/>
+</qemu:commandline>
Where 'hostdev0' is the GPU (and hostdev1 is the matching audio card).
* With that configuration it's possible to take snapshots of the powered-off VM. So that's something the TestBot can work with... eventually.
* You can also remove the ich9 audio device and use the graphics card one instead. There's even an option (somewhere) to make it work through a DVI-to-HDMI cable (my other cables at hand were too short).
* I also did tests with a Debian 10 VM but those attempts were thrown off by:
  - The x-vga issue above.
  - My old screen which simply does not work with the AMD graphics card (and one other laptop out of two, except when going through a VGA adapter).
  - Probably some GPU driver setup issues (initially I was missing firmware-linux-nonfree).
  So I'll have to retry.
* During my Debian 10 tests I ended up crashing the RX 550 such that I got a QEMU error on the host side and could not restart that VM until I rebooted the host. Ouch! I really don't want to have to reboot the host after each test. Also I thought AMD's reset issues only concerned the Southern Islands (HD 7000) and Sea Islands (Rx 200) GPUs. But then I did not run into this issue again when testing the Windows 10 guest. So maybe there's still hope.
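The x-vga edit described above can also be scripted instead of done by hand. Below is a minimal Python sketch of that edit, not something the TestBot does today: it takes a domain's XML as a string, declares the qemu namespace and appends the two qemu:arg elements. The function name add_x_vga and the default alias 'hostdev0' are assumptions for illustration; check which hostdev alias is the GPU in your own domain XML.

```python
import xml.etree.ElementTree as ET

# Namespace libvirt uses for QEMU command-line passthrough (from the diff above).
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"


def add_x_vga(domain_xml: str, alias: str = "hostdev0") -> str:
    """Return domain_xml with '-set device.<alias>.x-vga=on' added.

    'alias' is assumed to be the hostdev alias of the passed-through GPU.
    Calling this twice does not add the arguments a second time.
    """
    # Make ElementTree emit the 'qemu:' prefix instead of a generated 'ns0:'.
    ET.register_namespace("qemu", QEMU_NS)
    root = ET.fromstring(domain_xml)

    # Reuse an existing <qemu:commandline> if the domain already has one.
    cmdline = root.find(f"{{{QEMU_NS}}}commandline")
    if cmdline is None:
        cmdline = ET.SubElement(root, f"{{{QEMU_NS}}}commandline")

    wanted = f"device.{alias}.x-vga=on"
    existing = [arg.get("value") for arg in cmdline.findall(f"{{{QEMU_NS}}}arg")]
    if wanted not in existing:
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": "-set"})
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": wanted})

    # ElementTree writes the xmlns:qemu declaration on the root element.
    return ET.tostring(root, encoding="unicode")
```

One way to use it would be to feed it the output of `virsh dumpxml` and load the result back with `virsh define`; that round trip is untested here. Note that ElementTree writes attributes with double quotes, so the result will not be byte-identical to a hand-edited file.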
So that's the state of things for now. I don't know when I'll get back to this but the next steps will likely be:
1. Set up a proper Windows VM with PCI-passthrough.
2. Add it to my local TestBot instance and see how stable that is.
3. Give the Debian 10 VM another try.
4. Maybe set up a Windows PCI-passthrough VM on the official TestBot. vm3 and vm4 look like they should support it (E3-1226 v3 + C224 PCH). However I don't know if the chassis has room for a graphics card.