https://bugs.winehq.org/show_bug.cgi?id=48166
Bug ID: 48166
Summary: test.winehq.org Provide a way to track individual failures
Product: WineHQ.org
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: www-unknown
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
A test unit may have multiple unrelated test failures: some may fail on recent Windows 10 machines while others may only happen in certain locales or on specific graphics cards. Untangling these can actually be automated.
The basic idea is to merge the failures of two reports together while associating each failure with the tag of the report(s) it originates from. To do so, diff the two failure lists:
* Failures that are present in both reports get both tags.
* Failures that are only present in the first report get only that tag.
* Same thing for failures that are only present in the second report.
All failures are then integrated in the merged list in the order they are returned by Algorithm::Diff (see the sketch below).
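For instance, here is a minimal Perl sketch of the merge step using Algorithm::Diff's sdiff(). All the names are illustrative: each report is assumed to be a hash with a Tag field and a Failures arrayref, and the merged list pairs each failure line with the tags it was seen in.

use strict;
use warnings;
use Algorithm::Diff qw(sdiff);

# $Merged is a list of [$FailureLine, \@Tags] pairs and $Report is
# { Tag => "...", Failures => [...] }. Returns the new merged list.
sub MergeFailures($$)
{
    my ($Merged, $Report) = @_;
    my @OldLines = map { $_->[0] } @$Merged;
    my @NewMerged;
    my $Index = 0;
    foreach my $Hunk (sdiff(\@OldLines, $Report->{Failures}))
    {
        my ($Op, $Old, $New) = @$Hunk;
        if ($Op eq "u")
        {
            # Present in both: the failure gets the new report's tag too
            my $Entry = $Merged->[$Index++];
            push @{$Entry->[1]}, $Report->{Tag};
            push @NewMerged, $Entry;
        }
        elsif ($Op eq "-")
        {
            # Only in the accumulated list: keep its existing tags
            push @NewMerged, $Merged->[$Index++];
        }
        elsif ($Op eq "+")
        {
            # Only in the new report: it gets only that report's tag
            push @NewMerged, [$New, [$Report->{Tag}]];
        }
        else # 'c': treat a changed pair as a removal plus an addition
        {
            push @NewMerged, $Merged->[$Index++], [$New, [$Report->{Tag}]];
        }
    }
    return \@NewMerged;
}

Note that starting from an empty merged list works too since sdiff() then returns only additions.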
Once two failure lists have been merged together, it's possible to keep merging more failure lists into the result. This makes it possible to get a unified list of the failures for a given commitid, and appending that commitid to the tags then allows building a complete list spanning all the available history.
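For instance, building the unified list for the whole history could look like the fragment below, where %ReportsByCommit is a hypothetical map from commitid to the corresponding reports and MergeFailures() is the sketch above:

my $All = [];
foreach my $CommitId (@CommitIds) # ordered from oldest to newest
{
    foreach my $Report (@{$ReportsByCommit{$CommitId}})
    {
        # Append the commitid so identical tags differ across commits
        $All = MergeFailures($All, {Tag => "$Report->{Tag}:$CommitId",
                                    Failures => $Report->{Failures}});
    }
}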
One can then group together all failures that have the exact same set of tag+commitid combinations. Since the failures in different groups don't always happen together, they must depend on different factors.
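The grouping step then boils down to using the sorted set of tags as a hash key. A sketch (refined in the next section to deal with crashes and timeouts):

my %Groups;
foreach my $Entry (@$All)
{
    my ($Failure, $Tags) = @$Entry;
    push @{$Groups{join("\n", sort @$Tags)}}, $Failure;
}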
Intermittent failures, timeouts, crashes
----------------------------------------
If two failures are related but a random timeout or crash sometimes occurs between them, they might end up being incorrectly split into two separate groups.
So if a crash or a timeout occurs, any other failure in that run should be ignored when grouping failures together.
This can be achieved by prefixing the tag with a '*' if a crash or timeout occurred. Then, when grouping failures together, ignore the entries where the tag starts with a '*'. But when building the occurrence pattern, do use the entries starting with a '*' to show all the test runs where at least one of the failures in the group occurred.
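Marking the reports could be as simple as the line below, assuming a hypothetical Crashed field is set when parsing the report:

$Report->{Tag} = "*$Report->{Tag}" if $Report->{Crashed};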
Because entries starting with a '*' are ignored when building failure groups, the failures that cause a crash/timeout, as well as the crash/timeout line itself, will not be part of any failure group. So do a second pass over the unassigned failures, this time without ignoring the entries with a '*'. This will create groups composed of the crash/timeout and any related failures.
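Putting the two passes together, the grouping sketch above becomes (names still illustrative):

# Returns the group key, skipping the '*' entries unless told otherwise
sub GroupKey($$)
{
    my ($Tags, $KeepCrashed) = @_;
    my @Kept = $KeepCrashed ? @$Tags : grep { !/^\*/ } @$Tags;
    return join("\n", sort @Kept);
}

my (%Groups, @Unassigned);
foreach my $Entry (@$All)
{
    my $Key = GroupKey($Entry->[1], 0);
    if ($Key eq "") { push @Unassigned, $Entry; }
    else { push @{$Groups{$Key}}, $Entry->[0]; }
}
# Second pass: group the crashes/timeouts with their related failures
push @{$Groups{GroupKey($_->[1], 1)}}, $_->[0] for @Unassigned;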
New failures
------------
This analysis indicates where and when a given failure group happened, which means it can also be used to detect new failures.
It would not be useful to define a new failure as one that never happened before the latest commit: blink and you might miss it. Instead the definition should be expanded to cover all failures that only happened in the most recent half of the available history. This may sometimes falsely flag rare intermittent failures as new, but such cases should be rare enough.
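A sketch of that check, assuming @CommitIds is ordered from oldest to newest and that the group's commitids were extracted from its tags beforehand (names illustrative):

sub IsNewFailure($$)
{
    my ($GroupCommitIds, $CommitIds) = @_;
    my %Rank = map { ($CommitIds->[$_], $_) } 0..$#$CommitIds;
    my ($Oldest) = sort { $Rank{$a} <=> $Rank{$b} } @$GroupCommitIds;
    # New if the group's oldest occurrence is in the most recent half
    return $Rank{$Oldest} >= @$CommitIds / 2;
}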
Presentation of the results
---------------------------
The results can be presented on a page with one box per test unit, like the 'Full Report' pages. A 'details' link under the test unit name on the test failures pattern page would link to the relevant section of the full page.
Inside each box there would be a sequence of lines showing the failures in a group, followed by the usual pattern showing which machines the failure happens on; then the next failure group, etc. For instance:
console.c:270: Test failed: got 16, expected 6
console.c:275: Test failed: got 16, expected 6
.....F..F...F..F.mmm Win8 vm1
......FF.e...FF.e..F Win8 vm1-ja
096c:console: unhandled exception c0000005 at 6F384E33
.....CC Win8 vm2-new
As usual the items in the pattern would link to the relevant pages, making it possible to dig deeper into the issue. The same color coding would be used for the pattern, but since failure groups always have the same number of failures, only one color would be used for the F code.
The failure line numbers will change from one run to the next, so zero them out or only retain one value picked at random. Similarly, if the failure message contains timing information (see bug 48094), remove it.
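For instance something like the sketch below, where the 'file.c:line:' prefix matches the standard test messages but the timing regexp is only a guess at the format discussed in bug 48094:

sub NormalizeFailure($)
{
    my ($Line) = @_;
    # The line numbers shift as the test source evolves: zero them out
    $Line =~ s/^([^:]+\.c):\d+:/$1:0:/;
    # Strip the timing information if present (exact format assumed)
    $Line =~ s/ in [0-9.]+s\b//;
    return $Line;
}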
If a failure is identified as new, put its lines in bold orange, like on the TestBot. This will allow quick identification of the new failures on the page.