https://bugs.winehq.org/show_bug.cgi?id=48912
Bug ID: 48912 Summary: Allow blacklisting unreliable and always new failures Product: Wine-Testbot Version: unspecified Hardware: x86 OS: Linux Status: NEW Severity: normal Priority: P2 Component: unknown Assignee: wine-bugs@winehq.org Reporter: fgouget@codeweavers.com Distribution: ---
A number of tests produce false positives because: * Either the failure message contains a value that changes with each run (e.g. an HWND). In some cases these values are really necessary for diagnosis and moving them to a separate line would make the code quite a bit more complex. * Or the test fails intermittently and is hard to fix. Sometimes the cause of the failures may also come from external factors such as bugs in QEmu (for instance long execution delays causing timeouts). * Sometimes the intermittent failure is the product of a new Windows version or configuration, which requires investigating before a fix is found.
In all cases the tests should be fixed eventually, but a solution is needed for the interim so these tests to not make the TestBot so unreliable that its results are ignored.
So the goal is to provide a way to blacklist some test failures so they do not cause a patch to be rejected. Some safeguards are needed to ensure that: * Failures are not blacklisted lightly. The preference should always be to fix the test. * The blacklist does not get so large that it becomes hard to maintain. * Once a test is fixed the corresponding blacklist entries are removed in a timely fashion. * There is still an incentive to fix the tests.
So here is a proposal for implementing this blacklist:
* Each blacklist entry should be associated with a Wine bug describing the issue. That bug should provide some minimal diagnosis: whether it's a new Windows behavior, a race condition or some issue that was reported to QEmu. That would ensure we know why the blacklist entry was added. One could also check the status of the bug when reviewing the blacklist entries. A closed bug would be a strong hint that the blacklist entry is no longer needed.
* Rather than blacklisting whole test units, blacklist entries should target specific test failures via a regular expression. Besides being finer grained this would be useful for cases like user32:win which has different issues depending on the locale and where each should be linked to a different bug (bugs 48815, 48819 and 48820).
* The TestBot should record when each blacklist entry was last used. This relies on having the above regular expression since without it the TestBot would not know anything beyond 'the test unit was run and had failures'. Also the regular expression would only be used against *new* failures. So this would really record the last time the blacklist entry was actually useful.
An entry that was unused for a long time would be a prime candidate for reviewing the corresponding bug and for removal. (Note: The blacklist would also be used on WineTest reports so it would get a chance of matching its target at least 5 days / week).
* The blacklist entries would only be needed for 'base' test configurations since they are the only ones wine-devel patches run on.
* There should be a page listing the blacklist entries so developers have a good starting point to work on them.
* Ideally the blacklist page would also point to the tasks where the blacklist was last used. Since most of the blacklisted failures are either intermittent or specific to a given test configuration this would make it easier for developers to find reports where the failure did happen.
Note that Wine VMs often test multiple configurations per task (e.g. wow32 and wow64, different locales), each producing its own test report. So pointing at just the task would leave the developer guessing which report should be looked at. But that may be sufficient guidance.
* (test:unit, testbot-vm) tuples make it impossible to target a specific Wine test configuration such as a specific locale since they all run on the same VM. Similarly it would make blacklisting bitness-blind on Windows VMs. If necessary the tuple could be extended either with the specific mission the blacklist applies to, or with the basename of the corresponding report (the latter being easier to use in comparisons). Whether that's practical and worth the effort remains to be determined. One complication for instance is that this would lead to more blacklist entries: many 64 bit VMs would need two entries, one for 32 bit tests and one for 64 bit tests.
* Pseudo database schema and sample use: FailureBlacklists -----------------
PK Bug 48815 PK TestModule user32 PK TestUnit win Name 0x738 message FailureRegExp Test failed: hwnd [0-9A-F]{8,16} message 0738 LastUse 2020-03-27
FailureBlacklistVMs -------------------
PK Bug 48815 PK TestModule user32 PK TestUnit win PK VMName Entries for w1064v1709, w1064v1809, etc.
(48815, user32, win, w1064v1709) (48815, user32, win, w1064v1809) (48815, user32, win, w1064v1809_2scr) ...
FailureBlacklistUses (optionally) ---------------------------------
PK Bug PK TestModule PK TestUnit PK JobId PK StepNo PK TaskNo
(48815, user32, win, 68507, 1, 7) (48815, user32, win, 68508, 1, 7) ...