Here's an update on the false positive rates of the merge request and nightly Wine test runs.
Reminder: A false positive (FP) is when the TestBot or the GitLab CI says a failure is new when it is not.
* TestBot
The FP rate is still between 5 and 10% (see attached graphs). Now that we have more historical data, we can see that the FP rate went steadily down from 20% to 5% in December, i.e. during the freeze and when I was first populating the TestBot's list of known failures:
https://testbot.winehq.org/FailuresList.pl
Then in January the average rate gradually went back up to about 10%. I attribute that to riskier commits being allowed in again. It would be nice for the FP rate to go back down to 5%, but it's not clear that will happen.
* GitLab CI
The GitLab CI's FP rate also went down in December, hitting a low of 10% around the new year. But in January it immediately went back up. Combined with the high November FP rate, this means the December dip is barely visible in the 5-week average.
As I said, the FP rate has been going up since the new year. Again, I think that's the effect of riskier commits going in. This shows in the 5-week average, which is now between 25% and 30%, higher than ever before :-(
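To illustrate why a one-month dip can vanish from the smoothed curve, here is a minimal sketch of a 5-week trailing average. It assumes the average is a plain unweighted mean of the last five weekly FP rates; the actual graphs may be computed differently, and the input values below are toy numbers, not real data.

```python
from collections import deque
from typing import Deque, Iterable, List


def trailing_average(weekly_rates: Iterable[float], window: int = 5) -> List[float]:
    """Return the trailing moving average of a series of weekly FP rates.

    Assumption: each output point is the unweighted mean of the current
    week and up to `window - 1` preceding weeks (fewer at the start).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: Deque[float] = deque(maxlen=window)  # keeps only the last `window` rates
    averages: List[float] = []
    for rate in weekly_rates:
        recent.append(rate)
        averages.append(sum(recent) / len(recent))
    return averages


# Toy weekly rates in percent (not real data): one good week among bad ones.
print(trailing_average([30, 30, 10, 30, 30]))
```

With these toy numbers the last 5-week average is still 26%, so a single good week barely moves the smoothed curve; only a sustained change shows up.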
Unlike the TestBot, the GitLab CI has no way to ignore known false positives. So if you don't want the GitLab CI claiming that your merge requests introduce new failures, the only way is to fix the tests. And I guess that's not a bug, it's a feature [1].
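The difference between the two systems amounts to one extra filtering step. Below is a rough sketch under stated assumptions: a failure is treated as "new" if it shows up in the merge request run but not in a reference run, and known failures are described by a test unit plus a regular expression. The names (`KnownFailure`, `is_known`, `new_failures`) and the matching rule are hypothetical; this is not the TestBot's actual code nor the format used by FailuresList.pl.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# A failure is identified here by (test unit, failure message); hypothetical format.
Failure = Tuple[str, str]


@dataclass
class KnownFailure:
    """Hypothetical known-failure entry: a test unit plus a regex on the message."""
    test_unit: str              # e.g. "user32:msg"
    pattern: str                # regular expression searched in the failure message
    bug: Optional[int] = None   # related bug number, if any


def is_known(failure: Failure, known: Iterable[KnownFailure]) -> bool:
    """Return True if the failure matches one of the known-failure entries."""
    unit, message = failure
    return any(k.test_unit == unit and re.search(k.pattern, message) is not None
               for k in known)


def new_failures(candidate: Iterable[Failure], reference: Set[Failure],
                 known: Iterable[KnownFailure] = ()) -> List[Failure]:
    """Return the failures from `candidate` that would be reported as new.

    candidate: failures seen in the merge request's test run.
    reference: failures seen in a reference run without the change.
    known:     known-failure entries; leaving it empty models a CI that
               cannot ignore known false positives.
    """
    known = list(known)  # iterated once per candidate failure
    return [f for f in candidate
            if f not in reference and not is_known(f, known)]
```

With an empty `known` list, every intermittent failure that happens to be missing from the reference run gets reported as new, which roughly models the GitLab CI situation; fixing the test removes the failure from both runs instead.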
Where to start, you may ask?
A good place would be the test units that cause the most false positives:
  22 dinput:device8
  17 ntdll:threadpool
  16 user32:msg
   9 d3d11:d3d11
   7 ws2_32:afd
   6 ws2_32:sock
   6 user32:win
   6 ole32:clipboard
And among those, some failure modes are particularly troublesome:
  17 dinput:device8   -> bug 54594
  16 ntdll:threadpool -> bug 54064
   9 d3d11:d3d11      -> bug 54510
   8 user32:msg       -> bug 54037
   7 ws2_32:afd       -> bug 54113
   6 ole32:clipboard  -> bug 54005
For instance, user32:msg can fail in many different ways, but of the 16 times it caused a false positive (first list), 8 were due to the specific failure described in bug 54037 (second list).
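Both lists are simple tallies. A minimal sketch of how they and the per-bug share can be computed, assuming each false positive is recorded as a hypothetical (test unit, bug number) pair, with the bug set to None when the failure mode is not tracked by a specific bug:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical record: (test unit, bug number) for one false positive;
# bug is None when the failure mode is not tracked by a specific bug.
FpRecord = Tuple[str, Optional[int]]


def tally(records: Iterable[FpRecord]) -> Tuple[Counter, Counter]:
    """Count false positives per test unit and per (test unit, bug) pair."""
    records = list(records)  # traversed twice below
    per_unit = Counter(unit for unit, _ in records)
    per_bug = Counter((unit, bug) for unit, bug in records if bug is not None)
    return per_unit, per_bug


def bug_share(per_unit: Counter, per_bug: Counter, unit: str, bug: int) -> float:
    """Fraction of a test unit's false positives caused by one specific bug."""
    total = per_unit[unit]
    return per_bug[(unit, bug)] / total if total else 0.0


def print_ranking(per_unit: Counter) -> None:
    """Print the test units with the most false positives first."""
    for unit, count in per_unit.most_common():
        print(f"{count:3} {unit}")
```

With the user32:msg figures above (16 false positives, 8 of them from bug 54037), `bug_share` returns 0.5, i.e. fixing that single bug would presumably remove half of that test unit's false positives.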
[1] Not that it ever worked for the TestBot.