So I managed to reproduce this failure: https://testbot.winehq.org/JobDetails.pl?Key=65628
First ignore the w7pro64 failures: it seems one cannot run the job tests twice in a row on Windows 7, presumably because of some missing cleanup. So these are not interesting here.
Also I get 3 failures out of 9 Windows 10 runs, where each goes through the job tests 10 times. So that's a failure rate of 3.3%. That seems inconsistent with the test.winehq.org results: it has between 40 times 10 (newtb*) and 40 times 16 (newtb*+cw*) WineTest runs and none shows this failure, putting the failure rate below 0.25%.
This job is also not sufficient to prove that the issue is specific to Windows 10: it does not have enough Windows 8 runs to get a statistically significant result. test.winehq.org has between 40*2 (newtb*) and 40*5 (newtb*+cw*) WineTest runs which still seems inconsistent with a 3% failure rate.
So although there's not really definitive proof, it looks like this may be specific to standalone kernel32:process runs on Windows 10.
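To put rough numbers on "inconsistent", here is a minimal Python sketch. It assumes that every run is independent and that the 3/90 rate observed in this job would apply to the test.winehq.org runs too; both are assumptions, not something the data proves. The function name is made up for the illustration.

```python
def prob_no_failures(failure_rate: float, runs: int) -> float:
    """Probability of seeing zero failures in `runs` independent runs."""
    return (1.0 - failure_rate) ** runs


# Rate observed in this job: 3 failures in 9 runs x 10 iterations.
observed_rate = 3 / 90

# Run counts quoted above for test.winehq.org.
samples = [
    ("Windows 10, newtb*", 40 * 10),
    ("Windows 10, newtb*+cw*", 40 * 16),
    ("Windows 8, newtb*", 40 * 2),
    ("Windows 8, newtb*+cw*", 40 * 5),
]

for label, runs in samples:
    p = prob_no_failures(observed_rate, runs)
    print(f"{label}: {runs} runs, P(no failure) = {p:.2e}")
```

Under these assumptions the smallest Windows 8 sample (40*2 runs) leaves a much higher chance of seeing no failure than the Windows 10 samples do, so it is the weakest part of the comparison.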
Timeout
-------
The timeouts happened on the 1st or 2nd round while waiting for 'kernel32_test.exe process exit':
* 1x WaitForSingleObject(1s) of test_jobInheritance()
* 2x WaitForSingleObject(1s) of test_QueryInformationJobObject()
'process exit' has a 0.1s sleep which may not be strictly necessary. Still there is no clear reason for it to not complete within the allotted 1s.
So I suspect something delayed 'process exit' either within the VM or outside it.
* An out-of-VM troublemaker should be decorrelated from the in-VM activity and thus hit WineTest and any pass through the kernel32:process job tests with equal frequency... except if it's something related to the VM revert / startup (e.g. SSD garbage collecting after the revert I/O peak).
* Normally all in-VM troublemakers such as Windows Update, Defender, Search are disabled [1]. Maybe there's still something running shortly after the VM's clock gets changed that causes trouble.
Options:
* The simplest would be to increase the timeout, to 2s for instance? This should have essentially no impact on run time since we should not hit the timeout in most cases.
* Automatically rerun any wine-dev task that has failures, hoping that it will not fail on the second run.
  This would not be specific to this kernel32:process issue. The drawbacks are that:
  - This risks letting in any test that fails less than 50% of the time.
  - This would delay emails notifying that a patch causes new failures.
  - This would increase the TestBot load somewhat but that would likely be manageable.

  I would also argue that this is not really necessary:
  - Now that intermittently failing tests are properly accounted for this case should be quite rare.
  - When this happens one can simply analyze the test (and fix it?) and rerun the patch manually or resubmit it through wine-devel to prove the failure was an unrelated fluke.
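As a rough illustration of the first drawback, the sketch below computes how often a task would end up accepted if failing tasks were rerun automatically. It assumes the attempts are independent and that a task is accepted as soon as one attempt passes; the function name and the example rates are made up for the illustration, this is not TestBot code.

```python
def accept_probability(failure_rate: float, reruns: int = 1) -> float:
    """Chance that at least one of (1 + reruns) independent attempts passes."""
    return 1.0 - failure_rate ** (reruns + 1)


# Example per-run failure rates (hypothetical, apart from the 3/90 seen here).
for rate in (3 / 90, 0.25, 0.5, 0.9):
    print(f"fails {rate:.1%} of runs -> accepted {accept_probability(rate):.1%} "
          "of the time with one rerun")
```

With a single rerun, even a test that fails on half of its runs would still get through most of the time under these assumptions.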
Conversely some timeouts are pretty high: 30s and 60s. Presumably we hope they will never be hit. There is also a Sleep(1000) in test_SuspendFlag() whose purpose I don't really see.
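The point that a wait timeout is only an upper bound can be sketched as follows. This is Python, not the C test code, and it only mimics the pattern: the child is a stand-in that sleeps 0.1s like 'process exit', and the helper name is hypothetical.

```python
import subprocess
import sys
import time


def wait_for_child(timeout_s: float) -> float:
    """Start a child that sleeps 0.1s, wait for it for at most timeout_s
    seconds, and return how long the wait really took."""
    start = time.monotonic()
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.1)"])
    try:
        child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Only reached if the child is delayed beyond the timeout.
        child.kill()
        child.wait()
        raise
    return time.monotonic() - start


if __name__ == "__main__":
    for timeout_s in (2.0, 30.0):
        elapsed = wait_for_child(timeout_s)
        print(f"timeout={timeout_s}s, waited {elapsed:.2f}s")
```

When the child exits promptly the elapsed time does not depend on the timeout value, so a larger bound only costs time in the runs that would otherwise have failed.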
Trace mangling
--------------
Two of the timeouts happened in the absence of trace mangling. So the two issues seem unrelated.
The processes involved in the trace mangling are:
* w1064v1709 x3+, w1064v1607 x2, w1064v1809_fr x1
  the last 'process exit' started by test_WaitForJobObject() polluting the 'not waited for' parent trace
  -> This proves test_WaitForJobObject() is buggy.
* w1064v1709 x1
  'process exit' started by test_jobInheritance() polluting the WaitForSingleObject() timeout failure for that same process in the parent.
  -> The only way to avoid that is to avoid the timeout?
[1] Here are the full notes for the base 1507 snapshot:
Windows 10 1507 64-bit Home Edition. Uses a SCSI disk (0.1.164), e1000 network card, ich6 sound card, VGA graphics card. Disabled the screensaver, disk and computer suspend, Windows update, Windows defender, Windows search, restoration points, defragmentation, telemetry and the CEIP, hibernation, swap, time of last access (fsutil behavior set disablelastaccess 1). Added optional DirectX components. Autologin and TestAgent 1.7 autostart.
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU]
"NoAutoUpdate"=DWORD:1
"AUOptions"=DWORD:2