tl;dr: Another issue was that WineTest no longer worked on Linux on the cw2-gtx560 machine. This was caused by a multithreading bug in the nouveau drivers and by report spamming. The good news is that this now seems to be mostly fixed.
What happened is that a number of OpenGL-using tests, such as d2d1 and d3d*, started using multithreaded GL. Unfortunately that triggered bugs in Debian stable's nouveau mesa packages (13.0.6-1+b2), which caused the test to deadlock until the WineTest timeout. Worse, it borked the system so that every other OpenGL-using test then deadlocked too, resulting in many timed-out, and thus failed, tests. That pushed the results way above the 70 failed tests limit, causing the report to be rejected.
The tests do have a --single option, but there's no way for the TestBot / WineTest to pass this option without also disabling multithreaded tests on every other Linux platform (and on non-Linux ones too, really), which would be a shame.
I tried switching to the nvidia drivers and, while this solved the threading issues, it resulted in tons of failures in d3d10core:d3d10core. That pushed the report size to 8 MB, way above the 1.5 MB limit, causing it to be rejected again.
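The two rejection conditions mentioned so far (more than 70 failed tests, or a report larger than 1.5 MB) can be sketched as a small check. This is only an illustration of the rule as described in this post, not the actual server-side code: the function name is hypothetical, and it assumes that a report is rejected only when a limit is exceeded and that 1 MB means 1024 * 1024 bytes.

```python
from typing import List

# Limits quoted in this post. The real checks are done elsewhere and are not shown here.
MAX_FAILED_TESTS = 70
MAX_REPORT_BYTES = int(1.5 * 1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes


def rejection_reasons(failed_tests: int, report_bytes: int) -> List[str]:
    """Return why a report would be rejected (hypothetical helper).

    An empty list means the report passes both limits.
    Assumption: a limit is only violated when it is strictly exceeded.
    """
    reasons = []
    if failed_tests > MAX_FAILED_TESTS:
        reasons.append(
            f"{failed_tests} failed tests exceeds the limit of {MAX_FAILED_TESTS}"
        )
    if report_bytes > MAX_REPORT_BYTES:
        reasons.append(
            f"report size of {report_bytes} bytes exceeds the limit of {MAX_REPORT_BYTES}"
        )
    return reasons


if __name__ == "__main__":
    # The nvidia run described above: an 8 MB report, failure count unknown (0 used here).
    for reason in rejection_reasons(0, 8 * 1024 * 1024):
        print("rejected:", reason)
```

Returning the list of reasons rather than a boolean makes it easy to see which of the two limits a given run tripped, which was a different one for nouveau and for nvidia.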
So I switched back to nouveau and upgraded all the mesa packages to the Debian Backports version (18.2.8-2~bpo9+1), but this did not really help either :-(
So how did this get fixed?
* I sent a patch to disable multithreading in the Direct3D-related tests if WINETEST_NO_MT_D3D is set, and I am now setting it before invoking WineTest on cw2.
So if you're getting "nouveau 0000:01:00.0: timeout" messages in syslog associated with backtraces implicating the nouveau code, try setting WINETEST_NO_MT_D3D.
* Then Henri Verbeet and Józef Kucia sent patches to limit the flood of failures from the tests.
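For those who launch WineTest from a script, the WINETEST_NO_MT_D3D workaround could look roughly like the sketch below. The helper names are made up for illustration, the value "1" is an assumption (the patch only checks whether the variable is set), and the command passed to `run_winetest` is a placeholder for however WineTest is started on your machine. The PCI address in the syslog message will also differ between machines.

```python
import os
import subprocess
from typing import Dict, List, Mapping

# Message quoted above; the PCI address (0000:01:00.0) is specific to cw2.
NOUVEAU_TIMEOUT_MARKER = "nouveau 0000:01:00.0: timeout"


def has_nouveau_timeouts(log_text: str, marker: str = NOUVEAU_TIMEOUT_MARKER) -> bool:
    """Return True if any line of the given syslog text contains the marker."""
    return any(marker in line for line in log_text.splitlines())


def winetest_env(base_env: Mapping[str, str], disable_mt_d3d: bool) -> Dict[str, str]:
    """Return a copy of base_env, with WINETEST_NO_MT_D3D set if requested."""
    env = dict(base_env)
    if disable_mt_d3d:
        # Assumption: any value works, since the tests only check that it is set.
        env["WINETEST_NO_MT_D3D"] = "1"
    return env


def run_winetest(command: List[str], disable_mt_d3d: bool) -> int:
    """Run the given (placeholder) WineTest command and return its exit code."""
    env = winetest_env(os.environ, disable_mt_d3d)
    return subprocess.run(command, env=env).returncode
```

Building the environment in a separate function keeps the caller's own environment untouched, so the setting only applies to the WineTest run.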
So now things are better. We are still only just below the report size limit, and the Direct3D tests still crash nouveau sometimes, but we've been getting test results for the past two days. Yay!
Here are the top five report spammers on cw1 and cw2:
Test                 Average size
kernel32:virtual     88 KB
d3d11:d3d11          74 KB (low on HD6800, high on GTX560)
user32:msg           36 KB
ieframe:webbrowser   28 KB
gdi32:font           27 KB
The top 20 tests (3%) make up 30% of the log size (>450 KB). The reports are between 50 and 200 KB below the 1.5 MB limit.
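To put the table above in perspective, here is a small sketch that ranks tests by average size and works out what share of the 1.5 MB limit the biggest ones use up. Only the five sizes from the table are included, the function names are hypothetical, and 1 MB is assumed to be 1024 KB.

```python
from typing import Dict, List, Tuple

# Average sizes in KB, copied from the table above.
AVERAGE_SIZE_KB: Dict[str, int] = {
    "kernel32:virtual": 88,
    "d3d11:d3d11": 74,
    "user32:msg": 36,
    "ieframe:webbrowser": 28,
    "gdi32:font": 27,
}

REPORT_LIMIT_KB = 1.5 * 1024  # assumption: 1 MB = 1024 KB


def top_spammers(sizes_kb: Dict[str, int], count: int) -> List[Tuple[str, int]]:
    """Return the `count` largest tests as (name, size in KB), biggest first."""
    ranked = sorted(sizes_kb.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


def share_of_limit(sizes_kb: Dict[str, int], count: int,
                   limit_kb: float = REPORT_LIMIT_KB) -> float:
    """Return the fraction of the size limit used by the `count` largest tests."""
    top_total = sum(size for _, size in top_spammers(sizes_kb, count))
    return top_total / limit_kb


if __name__ == "__main__":
    for name, size in top_spammers(AVERAGE_SIZE_KB, 5):
        print(f"{name:20} {size} KB")
    print(f"share of limit: {share_of_limit(AVERAGE_SIZE_KB, 5):.1%}")
```

The five tests in the table add up to 253 KB on average, which already exceeds the 50 to 200 KB of headroom the reports currently have.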
The next step will be figuring out what's wrong on the TestBot debian9 VM, since the 64-bit d3d10core:d3d10core and d3d11:d3d11 tests fail with a bunch of "Failed to create device" errors (plus 700 KB worth of other errors), and yet the 32-bit tests are ok.