https://bugs.winehq.org/show_bug.cgi?id=48912
Bug ID: 48912 Summary: Allow blacklisting unreliable and always new failures Product: Wine-Testbot Version: unspecified Hardware: x86 OS: Linux Status: NEW Severity: normal Priority: P2 Component: unknown Assignee: wine-bugs@winehq.org Reporter: fgouget@codeweavers.com Distribution: ---
A number of tests produce false positives because: * Either the failure message contains a value that changes with each run (e.g. an HWND). In some cases these values are really necessary for diagnosis and moving them to a separate line would make the code quite a bit more complex. * Or the test fails intermittently and is hard to fix. Sometimes the cause of the failures may also come from external factors such as bugs in QEmu (for instance long execution delays causing timeouts). * Sometimes the intermittent failure is the product of a new Windows version or configuration, which requires investigating before a fix is found.
In all cases the tests should be fixed eventually, but a solution is needed for the interim so these tests to not make the TestBot so unreliable that its results are ignored.
So the goal is to provide a way to blacklist some test failures so they do not cause a patch to be rejected. Some safeguards are needed to ensure that: * Failures are not blacklisted lightly. The preference should always be to fix the test. * The blacklist does not get so large that it becomes hard to maintain. * Once a test is fixed the corresponding blacklist entries are removed in a timely fashion. * There is still an incentive to fix the tests.
So here is a proposal for implementing this blacklist:
* Each blacklist entry should be associated with a Wine bug describing the issue. That bug should provide some minimal diagnosis: whether it's a new Windows behavior, a race condition or some issue that was reported to QEmu. That would ensure we know why the blacklist entry was added. One could also check the status of the bug when reviewing the blacklist entries. A closed bug would be a strong hint that the blacklist entry is no longer needed.
* Rather than blacklisting whole test units, blacklist entries should target specific test failures via a regular expression. Besides being finer grained this would be useful for cases like user32:win which has different issues depending on the locale and where each should be linked to a different bug (bugs 48815, 48819 and 48820).
* The TestBot should record when each blacklist entry was last used. This relies on having the above regular expression since without it the TestBot would not know anything beyond 'the test unit was run and had failures'. Also the regular expression would only be used against *new* failures. So this would really record the last time the blacklist entry was actually useful.
An entry that was unused for a long time would be a prime candidate for reviewing the corresponding bug and for removal. (Note: The blacklist would also be used on WineTest reports so it would get a chance of matching its target at least 5 days / week).
* The blacklist entries would only be needed for 'base' test configurations since they are the only ones wine-devel patches run on.
* There should be a page listing the blacklist entries so developers have a good starting point to work on them.
* Ideally the blacklist page would also point to the tasks where the blacklist was last used. Since most of the blacklisted failures are either intermittent or specific to a given test configuration this would make it easier for developers to find reports where the failure did happen.
Note that Wine VMs often test multiple configurations per task (e.g. wow32 and wow64, different locales), each producing its own test report. So pointing at just the task would leave the developer guessing which report should be looked at. But that may be sufficient guidance.
* (test:unit, testbot-vm) tuples make it impossible to target a specific Wine test configuration such as a specific locale since they all run on the same VM. Similarly it would make blacklisting bitness-blind on Windows VMs. If necessary the tuple could be extended either with the specific mission the blacklist applies to, or with the basename of the corresponding report (the latter being easier to use in comparisons). Whether that's practical and worth the effort remains to be determined. One complication for instance is that this would lead to more blacklist entries: many 64 bit VMs would need two entries, one for 32 bit tests and one for 64 bit tests.
* Pseudo database schema and sample use: FailureBlacklists -----------------
PK Bug 48815 PK TestModule user32 PK TestUnit win Name 0x738 message FailureRegExp Test failed: hwnd [0-9A-F]{8,16} message 0738 LastUse 2020-03-27
FailureBlacklistVMs -------------------
PK Bug 48815 PK TestModule user32 PK TestUnit win PK VMName Entries for w1064v1709, w1064v1809, etc.
(48815, user32, win, w1064v1709) (48815, user32, win, w1064v1809) (48815, user32, win, w1064v1809_2scr) ...
FailureBlacklistUses (optionally) ---------------------------------
PK Bug PK TestModule PK TestUnit PK JobId PK StepNo PK TaskNo
(48815, user32, win, 68507, 1, 7) (48815, user32, win, 68508, 1, 7) ...
https://bugs.winehq.org/show_bug.cgi?id=48912
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Depends on| |48914
https://bugs.winehq.org/show_bug.cgi?id=48912
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Priority|P2 |P4 Severity|normal |critical
https://bugs.winehq.org/show_bug.cgi?id=48912
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Blocks| |48209
--- Comment #1 from François Gouget fgouget@codeweavers.com --- *** Bug 48209 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=48912
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|wine-bugs@winehq.org |fgouget@codeweavers.com
https://bugs.winehq.org/show_bug.cgi?id=48912
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|NEW |RESOLVED
--- Comment #2 from François Gouget fgouget@codeweavers.com --- This is done and an initial list of known bugs has been populated. From the patchset cover: https://www.winehq.org/mailman3/hyperkitty/list/wine-devel@winehq.org/thread...
This patchset provides support for tracking known failures in the TestBot results. The main goal is to avoid false positives but there are some side benefits as well: * The job reports sent to developers link to the relevant WineHQ bugs which gives them more exposure and may help get them fixed faster. * The failure details pages will eventually link to all the tasks where the failure happened, which could be valuable to find patterns. * The failure page tracks when it was last seen. This simplifies spotting bugs which are fixed and could be closed. (Or failures where the regular expression needs to be updated / fixed)
The patchset mostly follows the scheme described in bug 48912: https://bugs.winehq.org/show_bug.cgi?id=48912
There are some differences though: * The bug envisionned this scheme purely as a blacklist to avoid false positives. The implementation can certainly be used in this way but it is not limited to failures that cause false positives. Tracking other failures can provide the side benefits mentionned above. The only limit is the maintenance work required to track more failures.
* As a consequence the FailureBlacklist class was renamed to Failure, and the FailureBlacklistUses was renamed to TaskFailures since it links Tasks with the related Failures.
* The FailureBlacklistVMs purpose is not really described in the bug but I think the goal was to provide a way to deckare that the failure is expected to only happen on specific test configurations. Instead the Failures class has a ConfigRegExp regular expression which specifies which test configurations the failure applies to. The test 'configuration name' is of the form 'VM:logfile' which allows matching specific Windows versions (thanks to the TestBot VM naming scheme), specific Windows locales (part of the VM name), specific bitnesses (part of the log filename), or specific Wine locales (part of the log filename). For instance: ^w8 -> all Windows 8 VMs :exe -> all Windows tests (exe32.report and exe64.report) ^w10.*_ja.*: -> all Windows 10 Japanese test configurations win|wow -> all Wine test configurations (win32.report, ...) (win|wow).*_fr -> French Wine test configurations
* Changes in FailureBlacklists -> Failures - Primary key: (Bug, TestModule, TestUnit) -> Id Some issues generate quite a few failures. So in case trying to match them all results in a regular expression that's too long (either because of the database limit or for readability), this change makes it possible to add multiple Failure objects to track them all.
- TestModule -> ErrorGroup The .errors files split the errors in groups. Each group corresponds to a test module, but extra errors (too much output, missing summary lines, etc) are in a separate group. So I renamed TestModule to ErrorGroup as a reminder to the coder that this is meant to match the .errors group names, not just test modules.
- Name -> (BugStatus, BugDescription) Instead of having a name describing the failure, this patchset uses the corresponding WineHQ bug description. The TestBot also retrieves the bug status which allows checking that closed bugs dont happen anymore, or open bugs that should be closed.
- LastUsed -> (LastNew, LastOld) The LastUse field was split into LastNew and LastOld to track whether the failure would cause false positives without the known failures support.
* Changes in FailureBlacklistUses -> TaskFailures - (Bug, TestModule, TestUnit) -> FailureId This is a consequence of the Failure primary key change described above.
- () -> (TaskLog) This allows tracking which task log the failure was found in. This is quite important for Wine tasks since they run all the test configurations in just one task.
- () -> (NewCount, OldCount) This provides a count of matching failures and whether they would have been tagged as old or new without this patchset. This provides a quick way to know whether there is the same number of failures every time or whether the number of failures changes from one run to the next. It also provides a quick way to know how many failures to look for when jumping to the log.