On Fri, 27 Mar 2020, Henri Verbeet wrote: [...]
If the main goal is to stop the testbot from being ignored, and to limit the number of new failures sneaking in, would it make sense to start with something fairly blunt, like ignoring failures for tests on unreliable configurations? E.g., suppose ddraw:ddraw7 reliably passed on w1064v1507, but not w1064v1809, you'd then blacklist all of ddraw:ddraw7 on w1064v1809. That means you potentially ignore some ddraw:ddraw7 tests that are reliable, but it would still be an improvement over effectively ignoring everything.
So that would mean maintaining a set of (test:unit, testbot-vm) tuples where the TestBot should ignore new failures.
I'm not very fond of the blacklist approach. Once it's in place it may be very tempting to just put every flaky test into it rather than fixing it. This will lead to a long list of exceptions which will have to be maintained. In particular knowing when to remove an entry will be very important.
I also worry that once the test failures are papered over there won't be much incentive to fix them. To be fair that risk is not really different from what could happen with my patch but the scale would be larger.
But it could work with the rare intermittent failures too which would be valuable. And it could be useful when introducing new test configurations that have new intermittent / variable issues. So there could be value in doing this anyway.
Maybe with some safeguards it can be made to work.
* I think I'd want a Wine bug describing the issue to be associated with each blacklist entry. That bug should provide some minimal diagnosis: whether it's a new Windows behavior, a race condition or some issue that was reported to QEMU. That would ensure we know why the blacklist entry was added. One could also check the status of the bug when reviewing the blacklist entries. A closed bug would be a strong hint that the blacklist entry is no longer needed.
* And I think it would be better to have a regexp that matches only the troublesome failures rather than to blacklist the whole test unit. Besides being finer grained this would be useful for cases like user32:win which has different issues depending on the locale, each of which should be associated with a different bug (bugs 48815, 48819 and 48820).
* I think I'd also want to record the time when the blacklist entry was last used. This relies on having the above regular expression since without it the TestBot would not know anything beyond 'the test unit was run and had failures'. Also the regular expression would only be used against *new* failures. So this would really record the last time the blacklist entry was actually useful.
An entry that was unused for a long time would be a prime candidate for reviewing the corresponding bug and for removal. (Note: The blacklist would also be used on WineTest reports so it would get a chance of matching its target at least 5 days / week).
* I'd want a page listing the blacklisted entries so developers have a good starting point to work on them.
* Ideally the blacklist page would also point to the tasks where the blacklist was last used. I think this would also be useful for developers trying to fix the issues, particularly for the rare intermittent kind.
Note that Wine VMs often test in multiple configurations per task (e.g. wow32 and wow64, different locales), each producing its own test report. So pointing at just the task would leave the developer guessing which report should be looked at. But that's probably ok.
More importantly, (test:unit, testbot-vm) tuples make it impossible to blacklist a specific Wine test configuration such as a specific locale since they all run on the same VM. Similarly it would make blacklisting bitness-blind on Windows VMs.
If necessary the tuple could maybe be extended with the specific mission the blacklist applies to. But I'm not sure of the specific impact and it may not be worth it.
* Pseudo database schema and sample use:
  FailureBlacklists
  -----------------
  PK Bug            48815
  PK TestModule     user32
  PK TestUnit       win
     Name           0x738 message
     FailureRegExp  Test failed: hwnd [0-9A-F]{8,16} message 0738
     LastUse        2020-03-27

  FailureBlacklistVMs
  -------------------
  PK Bug            48815
  PK TestModule     user32
  PK TestUnit       win
  PK VMName         Entries for w1064v1709 w1064v1809 etc.

  (48815, user32, win, w1064v1709)
  (48815, user32, win, w1064v1809)
  (48815, user32, win, w1064v1809_2scr)
  ...

  FailureBlacklistUses (optionally)
  ---------------------------------
  PK Bug
  PK TestModule
  PK TestUnit
  PK JobId
  PK StepNo
  PK TaskNo

  (48815, user32, win, 68507, 1, 7)
  (48815, user32, win, 68508, 1, 7)
  ...
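
To make the intended matching a bit more concrete, here is a rough Python sketch of how the tables above could be used. It is only an illustration and not TestBot code: the class and function names (BlacklistEntry, filter_new_failures, stale_entries), the in-memory representation and the 90-day staleness threshold are all assumptions made up for this example. Only the field names and the sample values come from the schema above.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BlacklistEntry:
    """One FailureBlacklists row plus its FailureBlacklistVMs rows (hypothetical)."""
    bug: int
    test_module: str
    test_unit: str
    name: str
    failure_regexp: str
    vms: frozenset                     # VM names the entry applies to
    last_use: Optional[date] = None    # last time the entry matched a new failure

    def matches(self, module: str, unit: str, vm: str, line: str) -> bool:
        # The entry only applies to its own test unit and to its listed VMs.
        if (module, unit) != (self.test_module, self.test_unit):
            return False
        if vm not in self.vms:
            return False
        return re.search(self.failure_regexp, line) is not None


def filter_new_failures(entries, module, unit, vm, new_failures, today):
    """Return the new failures that no blacklist entry accounts for.

    Only *new* failures are passed in, so updating last_use here records
    the last time an entry was actually useful.
    """
    remaining = []
    for line in new_failures:
        hit = next((e for e in entries if e.matches(module, unit, vm, line)), None)
        if hit is None:
            remaining.append(line)
        else:
            hit.last_use = today
    return remaining


def stale_entries(entries, today, max_age_days=90):
    """Entries unused for a long time: candidates for review and removal."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries if e.last_use is None or e.last_use < cutoff]
```

With the bug 48815 entry from the sample data, a new failure line matching the regexp on w1064v1809 would be dropped from the report and would refresh LastUse, while the same line on a VM that is not listed (say w1064v1507) would still be reported as a new failure. A separate pass over stale_entries() would then feed the review page.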