On 5/5/22 17:27, Francois Gouget wrote:
On Wed, 4 May 2022, Rémi Bernon wrote: [...]
When a patch is submitted that is detected as potentially touching more than a single test, all the tests for the module are queued for testing. However this isn't done through WineTest, and instead they are all queued and tested separately, at least on the Windows VMs.
Wouldn't it be better to always run the tests through WineTest, and make it run all the tests that need checking at once?
That would certainly be more efficient time-wise.
Network-wise there's the issue that WineTest.exe is big because it always contains all the tests so it would cause more network traffic. But the traffic issue is probably minor and it is probably possible to tweak the builds to reduce the size.
Yes, winetest.exe is large (77 MB here), but I think it compresses well. A zstd version is ~12 MB, xz is ~9 MB. That is still 5-10x larger than individual test executables, but if you count the overhead of copying these tests for every subtest to run, it may not be so much of a difference anymore.
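As a rough sketch of that trade-off, in Python: the compressed sizes are the ones quoted above, but the 2 MB per-test size is only a guess consistent with the "5-10x" figure, not a measurement, and the function names are made up for the illustration.

```python
# Compare sending one compressed winetest.exe against copying each
# test executable separately. All sizes are in MB.

ZSTD_BUNDLE_MB = 12   # zstd-compressed winetest.exe, figure quoted above
XZ_BUNDLE_MB = 9      # xz-compressed winetest.exe, figure quoted above
PER_TEST_MB = 2       # ASSUMPTION: hypothetical size of one test executable


def separate_copy_traffic(n_tests, per_test_mb):
    """Traffic when every test executable is copied on its own."""
    return n_tests * per_test_mb


def bundle_is_cheaper(bundle_mb, n_tests, per_test_mb):
    """True when one bundle transfer costs less than n separate copies."""
    return bundle_mb < separate_copy_traffic(n_tests, per_test_mb)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        separate = separate_copy_traffic(n, PER_TEST_MB)
        print(f"{n:2d} tests: separate={separate} MB, "
              f"zstd bundle cheaper={bundle_is_cheaper(ZSTD_BUNDLE_MB, n, PER_TEST_MB)}, "
              f"xz bundle cheaper={bundle_is_cheaper(XZ_BUNDLE_MB, n, PER_TEST_MB)}")
```

Under that assumed size, the bundle only wins once a patch queues more than a handful of tests, so the real answer depends on how large the individual executables actually are.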
But as you mentioned, the main issue is that the tests could interfere with each other, which so far has been regarded as "polluting" the results. But we could see things differently.
I agree that it may be problematic, but it also means that we would perhaps have fewer of these failures in the nightly builds if they get caught early.
It would also be easier to debug, as sending a patch touching the tests of two modules would be enough to run the two tests together and debug combined issues. Whereas right now I think you have to upload winetest yourself and run the right command line.
It's also only going to cause problems if you run some combinations of tests, and most of the time only one test is run at once, or all the tests for a single module, which should have fewer weird interactions.
Another question, unrelated to the performance problems: could we consider adding more desktop/WM environments to the Debian VMs? I think it would be interesting to have them to help track down winex11 bugs, though it's likely that several tests would be broken.
So far the main goal has been to avoid failures so the desktop environment has been optimized with that in mind (so fvwm with a carefully crafted configuration).
But again things have changed since then. Most importantly the TestBot can now distinguish old failures from new ones and I'm still working towards having a way to prevent the "always new" failures from causing false positives.
With those two in place running the tests in configurations known to cause failures is less of an issue.
One way to support multiple desktop environments in the current framework would be to have one Linux VM per desktop environment. However that means compiling once per test environment, which has an impact on performance. With a fast new server (or servers) that could work though.
The alternative would be to install multiple desktop environments in the same Debian VM (easy) and have the client-side TestBot script switch from one desktop environment to another based on the configuration to test. I'm not sure how that would work though.
I think it's safer to use multiple VMs. And it would let us test desktop environments in their vanilla flavor, which is imho what we should try to make work best. I think mixing or switching desktop environments often ends up with undesired side effects.
I also intend at some point, once the win32u conversion is more settled, to finish sending my nulldrv patches, and I think it'd be nice to have a testbot flavor that could be configured to use it instead of the default graphics driver.
It sounds like that's just a matter of configuring the test environment to use nulldrv instead of the regular graphics driver (including possibly unsetting $DISPLAY). So that would be a bit like setting the locale and could probably be done through the missions mechanism without requiring a separate test environment.
Yeah, I don't know how the prefix preparation is done. Right now there's no environment variable to control the driver, and unsetting DISPLAY was considered not great as it could hide a genuine user mistake without complaining.