https://bugs.winehq.org/show_bug.cgi?id=48164
Bug ID: 48164
Summary: test.winehq.org should provide an efficient way to detect new failures
Product: WineHQ.org
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: www-unknown
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
Problem
-------
test.winehq.org does not allow performing the following tasks efficiently:

1. Detecting when a new failure slips past the TestBot. One can detect new failures on the per-test-unit page when specific columns turn red. But quite often the test unit already has failures, so one has to look at the exact number of failures. Furthermore, many test units currently have failures, so this requires checking 80+ pages individually.
2. Detecting when the results on a VM degrade. After upgrading a machine it's useful to compare it to its previous results. But the results for each date are on separate pages. So again it's necessary to check the per-test-unit result pages.
3. Comparing the results of two machines of different platforms. For instance comparing the results of running Windows 8 to those of Windows 10 on the same hardware.
Other requests:

4. Sometimes it would be nice to see only the failures, without all the lines for skipped tests and todos.
5. In some cases it would also be nice to have pages with only the failures that happen on TestBot VMs, since these are usually easier to reproduce.
Jeremy's page
-------------
Jeremy's test summary page can help with some of that: https://www.winehq.org/~jwhite/latest.html
But:
* It's not integrated with test.winehq.org, which makes it hard to find.
* There are only two states, Success and Failed, so it does not help when a test goes from having 2 failures to 4, or when it has both a set of systematic failures and a set of intermittent ones.
* The Failed / Success pattern is not per VM, which masks some patterns and does not help with point 2.
Proposal
--------
A modified version of Jeremy's page could be integrated with test.winehq.org:
* It would actually be a pair of 'Failures' pages, one for TestBot VMs and one for all test results. Both would be linked to from the top of the main index page, for instance using the same type of 'prev | next' text links used on the other pages.
* Jeremy's result matrix would be extended from three to four dimensions: test units, test results, time, and number/type of failures.
* As before, the results would be grouped per test unit in alphabetical order. Only the test units having at least one error, recent or not, would be shown. This could again take the form of a table (like the 'full report' pages on test.winehq.org) or simply of test unit titles (TestBot jobDetails page style) with the information about each test unit inside. Clicking on the test unit name would link to its 'test runs' page on test.winehq.org.
* For each test unit there would be one line per test result having errors. The first part of the line would have one character per commit for the whole history available on test.winehq.org; that character would indicate whether the test failed, and more. The second part of the line would be the test result platform and tag. The lines would be sorted by platform, then alphabetically.
* Each test result would get a one-character code (see the sketch after this list):
  .   Success
  F   Failure
  C   Crash
  T   Timeout
  m   Missing dll (foo=missing or other error code)
  e   Other dll error (foo=load error 1359 and equivalent)
  _   No test (the test did not exist)
  ' ' No result (the machine did not run the tests that day)
* These codes would be shown using a monospace font so they would form a pattern across time and test results:

  .....F..F...F..F.mmm Win8  vm1
  .....FFFFeFFFFFFeFFF Win8  vm1-ja
  ...TTCC              Win8  vm2-new
  ......eF...F...F..F. Win10 vm3
* Each character would have a tooltip containing details like the meaning of the letter, the number of failures, or the dll error message. They would also link to the corresponding section of the test report.
* In addition to the character the background would be color coded to make patterns more visible:
  .   Green
  F   Green to yellow to red gradient
  C   Dark red
  T   Purple/pink
  m   Cyan
  e   Dark blue
  _   Light gray
  ' ' White
* The green-yellow-red gradient would be what allows detecting changes in the number of test failures. That gradient must be consistent across all lines of a given test unit's pattern. Furthermore, the gradient must not be computed from the test result's raw failure count: if a test unit has either 100 or 101 failures, those must not get nearly indistinguishable colors. Instead, the set of all distinct failure counts for the test unit should be collected, zero should be added to that set, the values should be sorted, and a color attributed to each *index*. The background color is then selected based on the index of that result's failure count. Each such set is expected to be relatively small, so the colors will be reasonably far apart, making it easy to distinguish a shift from 4 to 6 failures even if there are 100 failures from time to time. Also note that adding zero to the set essentially reserves green for successful results.
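For illustration, here is a minimal Python sketch of the one-character codes and of the index-based color selection described above. The status labels and the GRADIENT palette are placeholders, not values taken from summary.txt or from any actual implementation:

# Hypothetical mapping from a test result to its one-character code.
# The status labels are invented; they are not real summary.txt fields.
RESULT_CODES = {
    "not run": ' ',      # the machine did not run the tests that day
    "no test": '_',      # the test did not exist for that commit
    "dll missing": 'm',  # foo=missing or other error code
    "dll error": 'e',    # foo=load error 1359 and equivalent
    "crash": 'C',
    "timeout": 'T',
}

def result_code(status, failure_count):
    if status in RESULT_CODES:
        return RESULT_CODES[status]
    return 'F' if failure_count else '.'

# Placeholder palette going green -> yellow -> red.
GRADIENT = ["#00b000", "#80c000", "#c0c000", "#e0a000", "#e06000", "#d00000"]

def build_color_map(failure_counts):
    """Map each distinct failure count of a test unit to a background color.

    Colors are attributed by *index* among the sorted distinct counts, not by
    value, so a shift from 4 to 6 failures stays visible even when some
    results have 100 failures. Zero is always added, reserving green for
    successful results.
    """
    nonzero = sorted({count for count in failure_counts if count})
    colors = {0: GRADIENT[0]}                  # index 0 -> green (success)
    for index, count in enumerate(nonzero):
        # Spread the remaining indices evenly over the yellow-to-red part.
        pos = 1 + index * (len(GRADIENT) - 2) // max(len(nonzero) - 1, 1)
        colors[count] = GRADIENT[pos]
    return colors

# With failure counts {4, 6, 100, 101} this yields four clearly distinct
# colors, whereas a value-based gradient would make 100 and 101 (and 4 and 6)
# nearly indistinguishable.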
Implementation feasibility
--------------------------
* No changes in dissect.
* In gather, generate a new testunits.txt file containing one line per test unit (see the sketch after this list):
  - The data source would be the per-report summary.txt files.
    -> These don't indicate when a timeout has occurred, so timeouts will appear as F instead, which is acceptable for a first implementation.
  - The first line would contain a star followed by the tags of all the test runs used to build the file.
  - The other lines would contain the name of the test unit followed by space-separated pairs of result code/failure count and result tag (including the platform).
  - A line would be put out even if the test unit had no failure.
For instance, the commit1 testunits.txt file could contain:

* win8_vm1 win8_vm1-ja win8_vm2-new win10_vm3
foo:bar 43 win8_vm1-ja C win8_vm2-new e win10_vm3
foo:bar2
  - In the example above win8_vm1 only appears on the first line. This means WineTest was run on that machine but had no failure at all.
  - If the results for commit2 refer to a win8_vm4 machine, we will know that the reason win8_vm4 does not appear in the commit1 file is not that all the tests succeeded, but that WineTest was not run on win8_vm4 for commit1. This means that the result code for win8_vm4 for commit1 should be ' ', not '.', for all test units.
  - If commit2 has results for the foo:bar3 test unit, then we will know the reason it is not present in the commit1 file is not that all the test runs were successful, but that foo:bar3 did not exist yet. So its result code would be '_', not '.'.
* Add a new build-failures script to generate both failures pages.
  - This script will need to read the testunits.txt files for all the commits. The simplest implementation will be to read all the data into memory before generating the page. This avoids having to keep the current test unit synchronized between all the testunits.txt files when a new test unit has been added.
  - The combined size of the testunits.txt files is expected to be reasonable, within a factor of 3 of the summary.txt files. For reference, here is some data about the sizes involved:

    $ du -sh data
    21G data
    $ ls data/*/*/report | wc -l
    2299
    $ cat data/*/*/report | wc
    34,087,987 231,694,407 2,104,860,095
    $ cat data/*/*/report | egrep '(: Test failed:|: Test succeeded inside todo block:|done [(]258)|Unhandled exception:)' | wc
    567,158 6,275,504 53,202,999
    $ cat data/*/summary.txt | wc
    186,219 3,046,363 30,596,901
  - Having a function to generate the page will allow calling it twice in a row to generate both pages without having to load and parse the testunits.txt files twice.
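To make the above more concrete, here is a rough Python sketch covering both the parsing of a testunits.txt file (following the commit1 example above) and the 'parse once, generate both pages' structure. The data/<commit>/testunits.txt location (mirroring data/*/summary.txt), the returned data structure, the output file names and the '-tb' TestBot tag convention are all assumptions, not decisions:

import glob, os

def parse_testunits(path):
    """Return (all_tags, {test_unit: {tag: count_or_code}}) for one commit."""
    results = {}
    with open(path) as testunits:
        header = testunits.readline().split()
        all_tags = header[1:]              # '*' followed by every reporting tag
        for line in testunits:
            fields = line.split()
            if not fields:
                continue
            unit, pairs = fields[0], fields[1:]
            # Pairs alternate: a failure count or a C/T/m/e code, then the tag.
            results[unit] = {tag: int(code) if code.isdigit() else code
                             for code, tag in zip(pairs[0::2], pairs[1::2])}
    return all_tags, results

def load_all_commits(datadir):
    """Read every per-commit testunits.txt file into memory, keyed by commit."""
    return {os.path.basename(os.path.dirname(path)): parse_testunits(path)
            for path in sorted(glob.glob(os.path.join(datadir, "*", "testunits.txt")))}

def write_failures_page(filename, commits, keep_tag):
    """Write a bare-bones page listing, per commit and test unit, the failing tags."""
    with open(filename, "w") as page:
        for commit, (_tags, units) in sorted(commits.items()):
            for unit, failures in sorted(units.items()):
                kept = [tag for tag in failures if keep_tag(tag)]
                if kept:
                    page.write("%s %s: %s\n" % (commit, unit, " ".join(kept)))

# The testunits.txt files are parsed only once even though two pages are built.
commits = load_all_commits("data")
write_failures_page("failures-tb.txt", commits, lambda tag: "-tb" in tag)
write_failures_page("failures-all.txt", commits, lambda tag: True)

Note how the distinctions from the list above fall out of this layout: a tag listed after the star but absent from a test unit's line means the test succeeded there ('.'), a tag absent from the star line means the machine did not run the tests that day (' '), and a test unit absent from a commit's file while present for later commits means the test did not exist yet ('_').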
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |austinenglish@gmail.com
François Gouget fgouget@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |critical
           Assignee|wine-bugs@winehq.org        |fgouget@codeweavers.com
François Gouget fgouget@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P4
François Gouget fgouget@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #1 from François Gouget fgouget@codeweavers.com ---
This is done.
Problem
-------
1. Detecting when a new failure slips past the TestBot.
https://test.winehq.org/data/patterns.html
-> The new page has one pattern per test with failures (including those with missing dlls, etc). As planned the failure count is color coded so changes are easily visible.
-> All the patterns are on a single page which is easier to review than visiting one page per test.
-> The tests are put in four categories, with the ones most likely to contain new failures coming first. This helps reduce the number of tests that need to be looked at to have a good chance of detecting the new failures.
2. Detecting when the results on a VM degrade.
-> At the top of the page a color coded pattern shows the number of failed test units for each VM. Changes in color make result degradation obvious. This also allows detecting when a VM stops reporting results (whether because it has too many failures, it times out, or for other reasons).
3. Comparing the results of two machines of different platforms.
-> The pattern at the top of the page contains results for all platforms. The results are sorted by platform to make it easy to compare them as a group, but the results of different platforms are still just a few lines apart.
4. Show only the failures, not all the lines with skips and todos.
-> This is the normal behavior of the new pattern page.
5. Have a page showing only the TestBot results.
-> Two pages show the results for the Windows and Wine TestBot VMs respectively.
https://test.winehq.org/data/patterns-tb-win.html
https://test.winehq.org/data/patterns-tb-wine.html
Implementation tweaks
---------------------
* Instead of a single color gradient there is a chain of color gradients to help with patterns that have many different failure counts (see the sketch after this list). This is configurable and currently goes like this: dark cyan -> green -> yellow -> red.
* For each test the page links to related commits. For each commit it shows whether it patched the test itself, a shared test resource, or the Wine module. This simplifies reviewing potential culprits when a test starts failing.
* For each test the page links to the related bugs. This simplifies checking if someone has already analyzed the bug or tried fixing it. If the bug has a regression commit id that commit is shown as such.
* The page also shows bugs for tests that have no failure. This simplifies verifying that the bugs have been updated when a test is fixed.
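To illustrate the gradient chain mentioned above: chaining gradients simply amounts to interpolating between successive color stops. The stop values in this Python sketch are placeholders, not the actual configuration used by build-patterns:

# Illustration of a chain of color gradients: interpolate between successive
# stops. The stop colors below are placeholders.
STOPS = [(0x00, 0x80, 0x80),   # dark cyan
         (0x00, 0xc0, 0x00),   # green
         (0xe0, 0xe0, 0x00),   # yellow
         (0xe0, 0x00, 0x00)]   # red

def chain_color(position):
    """Return the RGB color at a position in [0, 1] along the gradient chain."""
    position = min(max(position, 0.0), 1.0)
    scaled = position * (len(STOPS) - 1)
    i = min(int(scaled), len(STOPS) - 2)       # which segment of the chain
    t = scaled - i                             # position within that segment
    return tuple(round(a + (b - a) * t) for a, b in zip(STOPS[i], STOPS[i + 1]))

# chain_color(0.0) -> dark cyan, chain_color(1.0) -> red,
# chain_color(0.5) -> halfway between green and yellow.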
Commits
-------
There are about 27 commits in total, spanning the range between the two shown below:
commit bce49667e6418c326ade5704ee65c5bf8c6930d6
Author: Francois Gouget fgouget@codeweavers.com
Date:   Tue Apr 27 12:45:07 2021 +0200
winetest/build-patterns: Add a page showing failure patterns.
The new page shows the failure patterns across time and test configurations to help detect Wine changes that cause new failures, and identify when failures happen in specific configurations.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=48164
Signed-off-by: Francois Gouget fgouget@codeweavers.com
Signed-off-by: Alexandre Julliard julliard@winehq.org
commit 0ef48b83ac84f3a59ed5521b24d5a7a8d1dcde17
Author: Francois Gouget fgouget@codeweavers.com
Date:   Wed May 19 03:15:13 2021 +0200
winetest/build-patterns: Show Potentially obsolete bugs.
Bugs related to a test that has no failure are likely to be out of date, except if they are related to memory issues.
Signed-off-by: Francois Gouget fgouget@codeweavers.com
Signed-off-by: Alexandre Julliard julliard@winehq.org
Rosanne DiMesio dimesio@earthlink.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
--- Comment #2 from Rosanne DiMesio dimesio@earthlink.net ---
Closing fixed website bug.