It's been a bad month for the TestBot.
* The first issue was not with the TestBot itself but with cw1-hd6800, which provides the 'real hardware' WineTest results for the AMD HD 6800 graphics card. Its hard drive just died. Newman promptly replaced it and I restored that system from backups (Linux + Windows).
The good thing that came out of it is that I added the 1809 Windows 10 build to the mix and did so for the cw2-gtx560 system while I was at it. Unfortunately that's pretty much all for nothing right now since Windows 10 1809 has over 70 failures and all the WineTest reports just end up being thrown away :-(
* Then roughly a week later one of the hard drives on vm2 died. vm2 is one of the machines that run the TestBot VMs. That should not have been an issue, except the hard drive did not die outright: the hardware RAID controller kept trying to write to it, tying itself up in the process. Eventually the Linux kernel got fed up with the controller building a backlog of writes and turned all filesystems read-only. Things don't work very well after that!
So I proceeded to restore the VMs from backups on the other hosts so the TestBot could work again. Then Newman again promptly replaced the hard drive, the controller slowly rebuilt the array, and I moved the VMs back to vm2. But the TestBot had built quite a backlog by then and it took time for it to catch up.
* One issue is that vm4 was kept pretty busy by the Linux tests: win32 + various locale tests; then wow32 and wow64. So I duplicated the wtbdebian9 VM to vm3 and split the tasks between them: win32 + locales on vm4 and wow32 + wow64 on vm3. Unfortunately the 'Submit job' page is pretty primitive and systematically creates tasks that do all 3 builds: win32, wow32 and wow64. Since none of the wtbdebian9 VMs had all three, one Wine build was always way out of date, resulting in long build times and timeouts. So I had to go back to a single Linux VM until I can send a better 'Submit job' page.
* The next issue came when a security update on winehq.org broke Net::SSH2, thus preventing the TestBot from connecting to the VMs and sending the patches or executables to test. After some investigation I decided that Net::SSH2 is a lost cause (to be polite) and I switched the TestBot to Net::OpenSSH.
* At about the same time the commit 47242d25f5b2 moved string.c to libwine_port and somehow that broke the 64 bit reg.exe. reg.exe is the first call the TestBot makes when creating a new WinePrefix, in order to disable the crash dialog. So of course when reg.exe crashes the crash dialog pops up and the WinePrefix creation remains stuck. This means the Linux 'Update Wine' tasks remain stuck too, for 1h15 apiece, three times, and eventually Wine remains out of date :-(
So there we are. The TestBot is slowly catching up on its backlog (120 tasks to go) and hopefully, once the reg.exe issue is solved, the next month will see fewer crises.
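The build-coverage problem with the split Linux VMs described above can be sketched as follows. This is only an illustrative model: the host and build names come from the text, while `VM_BUILDS` and `vms_covering` are hypothetical names and not part of the TestBot code.

```python
# Hypothetical model of the VM split; only the host and build names
# come from the description above.
VM_BUILDS = {
    "vm4": {"win32"},           # win32 + locale tests
    "vm3": {"wow32", "wow64"},  # wow32 + wow64 tests
}


def vms_covering(required, vm_builds):
    """Return the hosts that keep every required build up to date."""
    return sorted(vm for vm, builds in vm_builds.items() if required <= builds)


# The 'Submit job' page always asks for all three builds.
submit_job_builds = {"win32", "wow32", "wow64"}

print(vms_covering(submit_job_builds, VM_BUILDS))  # no host qualifies
print(vms_covering({"win32"}, VM_BUILDS))          # a win32-only task fits vm4
```

Whichever host picks up a three-build task has to rebuild at least one stale Wine build first, which is where the long build times and timeouts came from.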
tl;dr: Another issue is that WineTest no longer worked on Linux on the cw2-gtx560 machine. This was caused by a multithreading bug in the nouveau drivers and by report spamming. The good news is that this now seems to be mostly fixed.
What happened is that a number of OpenGL-using tests like d2d1, d3d* started using multithreaded-GL. Unfortunately that triggered bugs in Debian stable's nouveau mesa packages (13.0.6-1+b2), which caused the test to deadlock until the WineTest timeout. But worse than that it borked the system so that every other OpenGL-using test then deadlocked too, resulting in many many timed out and thus failed tests. That pushed the results way above the 70 failed tests limit, causing the report to be rejected.
The tests do have a --single option but there's no way for the TestBot / WineTest to pass this option without also disabling multithreaded tests on every other Linux platform (and non-Linux really) which would be a shame.
I tried switching to the nvidia drivers and while this solved the threading issues it resulted in tons of failures in d3d10core:d3d10core. That pushed the report size to 8 MB, way above the 1.5 MB limit, causing it to, again, be rejected.
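The two rejection conditions mentioned so far can be summarized in a small sketch. The thresholds are the ones quoted above; the function name, the exact comparison operators and the assumption that 1 MB means 1024 * 1024 bytes are mine and may not match what the WineTest site really does.

```python
MAX_FAILED_TESTS = 70                      # "70 failed tests limit" from the text
MAX_REPORT_BYTES = int(1.5 * 1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes


def rejection_reasons(failed_tests, report_bytes):
    """Return the reasons a report would be rejected (empty list if accepted).

    Assumes a report is rejected only when it goes strictly over a limit.
    """
    reasons = []
    if failed_tests > MAX_FAILED_TESTS:
        reasons.append("too many failed tests")
    if report_bytes > MAX_REPORT_BYTES:
        reasons.append("report too large")
    return reasons


# The nvidia run: threading was fine but the report grew to about 8 MB.
print(rejection_reasons(10, 8 * 1024 * 1024))
```

The failed test count passed for the nvidia run is made up for the example; only the 8 MB report size comes from the text.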
So I switched back to nouveau and upgraded all the mesa packages to the Debian Backports version (18.2.8-2~bpo9+1) but this did not really help either :-(
So how did this get fixed?
* I sent a patch to disable multi-threading in the Direct3D-related tests if WINETEST_NO_MT_D3D is set and I am now setting it before invoking WineTest on cw2.
So if you're getting "nouveau 0000:01:00.0: timeout" messages in syslog associated with backtraces implicating the nouveau code, try setting WINETEST_NO_MT_D3D.
* Then Henri Verbeet and Józef Kucia sent patches to limit the flood of failures from the tests.
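For anyone wanting to try the WINETEST_NO_MT_D3D workaround from a script, here is a minimal sketch. It assumes that any value counts as "set" (the text only says the variable must be set), and `winetest_env` is a hypothetical helper, not something the TestBot provides.

```python
import os


def winetest_env(base_env=None, disable_mt_d3d=True):
    """Return a copy of the environment, optionally with WINETEST_NO_MT_D3D set."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_mt_d3d:
        # Assumption: the tests only check that the variable is set,
        # so the value "1" is arbitrary.
        env["WINETEST_NO_MT_D3D"] = "1"
    return env


env = winetest_env({"PATH": "/usr/bin"})
print(env)
# The resulting dict would then be passed as env= to whatever launches
# WineTest on the host, for example through subprocess.run(); the actual
# command line used on cw2 is not shown here.
```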
So now things are better. We are still right below the report size limit and the Direct3D tests still crash nouveau sometimes but we've been getting test results for the past two days. Yay!
Here are the top five report spammers on cw1 and cw2:
Test                 Average size
kernel32:virtual     88 KB
d3d11:d3d11          74 KB (low on HD6800, high on GTX560)
user32:msg           36 KB
ieframe:webbrowser   28 KB
gdi32:font           27 KB
The top 20 tests (3%) make up 30% of the log size (>450 KB). The reports are between 50 and 200 KB below the 1.5 MB limit.
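To put the top five in perspective, here is a quick back-of-the-envelope calculation using the average sizes listed above. It assumes the 1.5 MB limit means 1.5 * 1024 KB, which is my guess at the unit and not something stated here.

```python
# Average report sizes in KB for the top five tests, as listed above.
TOP_FIVE_KB = {
    "kernel32:virtual": 88,
    "d3d11:d3d11": 74,
    "user32:msg": 36,
    "ieframe:webbrowser": 28,
    "gdi32:font": 27,
}
LIMIT_KB = 1.5 * 1024  # assumption: 1 MB = 1024 KB

top_five_total = sum(TOP_FIVE_KB.values())
share_of_limit = top_five_total / LIMIT_KB

print(f"top five: {top_five_total} KB, {share_of_limit:.1%} of the limit")
```

The five tests add up to 253 KB on average, so trimming even one or two of them would noticeably widen the margin below the limit.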
The next step will be figuring out what's wrong on the TestBot debian9 VM since the d3d10core:d3d10core and d3d11:d3d11 64 bit tests fail with a bunch of "Failed to create device" errors (plus 700 KB worth of other errors) and yet the 32 bit tests are ok.
On Wed, 10 Apr 2019, Francois Gouget wrote: [...]
So I switched back to nouveau and upgraded all the mesa packages to the Debian Backports version (18.2.8-2~bpo9+1) but this did not really help either :-(
It turns out that the d3d11 tests could really use ARB_pipeline_statistics_query which upgrading to 18.2.8-2~bpo9+1 can get us.
https://www.winehq.org/pipermail/wine-devel/2019-April/143876.html
So I went for another round of upgrades on cw1, cw2, wtbdebian9 and fgtbdebian9 (my test TestBot VM).
* All boxes have been upgraded to the current Debian 9.
* They all have the 18.2.8-2~bpo9+1 Mesa packages from the Debian Backports repository.
* It turns out that once that was done, all the required dependencies were already installed for the Debian Testing libvkd3d packages. So I installed those too.
* I also added the MinGW packages.
* And I checked the installed package lists on all four machines to synchronize them as much as possible. This turned up some missing 32 bit packages which I installed. The issue was typically that I could not install a -dev:i386 package due to multiarch issues, so instead I had installed the packages it depends on and added the missing symbolic links, but had missed some of the dependencies.
* Now cw1 and cw2 have the exact same packages; and wtbdebian9 and fgtbdebian9 are identical as well. The two sets of machines are a bit different because cw1 and cw2 have LibVirt and Munin unlike their VM counterparts.
* Another difference is that cw1 and cw2 have the cups package and all the baggage that goes with it whereas the VMs only have the libcups* packages. Let me know if that makes a difference.
* I could not install the Debian Unstable FAudio packages: they depend on a newer version of libSDL2 which requires a lot of other significant package upgrades, like libc6.
So everything is finally backed up and back up. Let me know if something looks wrong in the TestBot results.
Cheers,
On Tue, 16 Apr 2019, Francois Gouget wrote: [...]
- It turns out that once that was done, all the required dependencies were already installed for the Debian Testing libvkd3d packages. So I installed those too.
But of course it turns out that both the Nvidia GTX 560 and the Radeon HD 6800 are too old to support Vulkan :-(
I hope at least adding the Vulkan packages does not hurt.
Might be time to upgrade those two boxes with some modern hardware.
-N
April 23, 2019 8:29 AM, "Francois Gouget" fgouget@codeweavers.com wrote: [...]
On Tue, 23 Apr 2019, Jeremy Newman wrote:
Might be time to upgrade those two boxes with some modern hardware.
Yes. I'd like to experiment with PCI-passthrough before we do that though.
The good news is that it looks like my box has all the required VT-d support.