Here are some things I've learned about PCI-passthrough recently, which would be one way (probably the best) to add "real hardware" to the TestBot.
I don't want to give anyone false hopes though: this just went from "this is a mysterious thing I need to learn about" to "I think I know how to do it but have not tried it yet".
Graphics card PCI-passthrough is now relatively well documented on the Internet, and it has seen enough real-world use to suggest it is reasonably usable.
* There are two machines intended to run real GPU tests for Wine: cw1-hd6800 and cw2-gtx560. For now they are only used to run WineTest daily on Windows 8.1, Windows 10 1507, 1709, 1809 and Linux. That's quite a bunch but it would be much better if they were integrated with the TestBot as that would allow developers to submit their own tests. So I had a look at what it would imply to convert them to VM hosts using QEmu + PCI-passthrough.
* First one needs a processor with IOMMU support, that is hardware support for handing a PCI device directly to a VM (this goes beyond plain hardware virtualisation). For Intel that's VT-d. Both machines have an Intel Core 2600 which supports VT-d. Good.
* Second the motherboard too needs to support VT-d. Both machines have an ASRock P67 Extreme4 motherboard. Unfortunately UEFI says "unsupported" next to the "VT-d" setting for the motherboard :-( It looks like there was some confusion as to whether the P67 chipset supported VT-d initially. From what I gathered it's only Q67 that does but this caused some manufacturers, among which ASRock, to initially claim support and later retract it.
* Then one needs to add the intel_iommu=on option to the kernel command line (or its amd_iommu counterpart on AMD systems). This should make all the PCI devices appear in /sys/kernel/iommu_groups. But that directory remains empty on both machines, which confirms that full VT-d support is missing.
* Another important aspect is to have a graphics card which is hot-restartable. In some cases when a VM's graphics card has crashed the only way to reset it is to reboot the host. The TestBot is likely to crash the graphics card, particularly if we do a hard power-off on the VMs like we currently do, and it would really be annoying to have to reboot the host every time the graphics card goes belly up. I don't know if the AMD HD6800 and Nvidia GTX560 are suitable but it's quite possible they are not. All I know for now is that we should avoid AMD's R9 line of graphics cards. I still need to find a couple of suitable, reasonably low-power graphics cards: one AMD and one Nvidia.
* Then one needs to prevent the host from using the graphics card. Usually that's done by having the host use the processor's IGP and dedicating the discrete GPU to the VMs. Unfortunately the 2600's IGP cannot be active when there's a discrete card so that route is denied to us. Fortunately there's quite a bit of documentation on how to shut down not just X but also the Linux virtual consoles to free the GPU and hand it over to the VMs after boot. Doing so means losing KVM access to the host which is a bit annoying in case something goes wrong. So ideally we'd make sure this does not happen in grub's "safe mode" boot option.
* Although I have not done any test yet I'm reasonably certain that PCI-passthrough rules out live snapshots: QEmu would have no way to restore the graphics card's internal state.
- For Windows VMs that's not an issue: if we provide a powered-off snapshot the TestBot already knows how to power on the VM and wait for it to boot (as long as the boot takes less time than the connection timeout, which is usually the case).
- For Linux VMs that's more of an issue: the TestBot will power on the VM as usual. The problem is when it updates Wine: after recompiling everything it deletes the old snapshot and creates a new one from the current state of the VM, which means a live snapshot. So the TestBot will need to be modified so it knows when and how to power off the VM and take a powered-off snapshot.
* Since the VM has full control of the graphics card QEmu has no access to the content of the screen. That's not an issue for the normal TestBot operation, just for the initial VM setup. Fortunately the graphics card is connected to a KVM switch so the screen can be accessed that way. It does mean assigning the mouse and keyboard to the VM too. Should that prove impractical there are a bunch of other options: VNC, LookingGlass, Synergy, etc. But the less that needs to be installed in the VMs, the better.
* Also the TestBot uses QEmu to take the screenshots. But QEmu does not have access to the content of the screen. The fix is to use a tool to take the screenshots from within the VM and use TestAgent to retrieve them. On Linux there are standard tools we can use. On Windows there's code floating around we can use.
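To make the /sys/kernel/iommu_groups check above repeatable on other candidate hosts, here is a minimal Python sketch. It only assumes the usual sysfs layout where each numbered group directory contains a devices/ subdirectory with one entry per PCI address. The function name list_iommu_groups and its root parameter are my own, not something from the TestBot.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted list of PCI addresses} found under root.

    An empty dict means the kernel exposed no IOMMU groups, which is what
    one would see when VT-d is missing or disabled.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        # Group directories are named after their number: 0, 1, 2...
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups
```

An empty result matches what I described for cw1 and cw2. On a host where the IOMMU works, the graphics card should show up in one of the groups, and everything sharing its group would have to be handed to the VM along with it.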
So the next steps would be:
* Maybe test on my box using the builtin IGP. But that likely won't be very conclusive beyond confirming the snapshot issues, screen access, etc.
* Find a suitable AMD or Nvidia graphics card and test that on my box. That would allow me to fully test integration with the TestBot, check for stability issues, etc.
* Then see what can be done with the existing cw1 and cw2 boxes.
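As a rough illustration of the Linux VM snapshot change discussed above, here is a Python sketch of the "power off, then snapshot" sequence. It is not the TestBot's code and I have not tried it with passthrough. It assumes the VM is driven through an object that behaves like a libvirt-python virDomain (isActive, shutdown, snapshotListNames, snapshotLookupByName, snapshotCreateXML). The function name, the timeout and the polling interval are hypothetical.

```python
import time
from xml.sax.saxutils import escape


def take_powered_off_snapshot(domain, name, timeout=300.0, poll=1.0):
    """Shut the VM down, then replace snapshot `name` with an offline one.

    `domain` is assumed to behave like a libvirt virDomain object.
    """
    if domain.isActive():
        domain.shutdown()  # asks the guest OS for a clean power-off
        deadline = time.monotonic() + timeout
        while domain.isActive():
            if time.monotonic() > deadline:
                raise TimeoutError(f"VM did not power off within {timeout}s")
            time.sleep(poll)

    # Only drop the old snapshot once the VM is known to be off.
    if name in domain.snapshotListNames(0):
        domain.snapshotLookupByName(name, 0).delete(0)

    # The domain is off, so this creates an offline snapshot:
    # there is no graphics card state to save.
    xml = f"<domainsnapshot><name>{escape(name)}</name></domainsnapshot>"
    return domain.snapshotCreateXML(xml, 0)
```

The point of the ordering is that the old snapshot is deleted only after the guest has really powered off, so a VM that hangs during shutdown leaves the previous snapshot intact.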