On 5/2/21 7:07 PM, Francois Gouget wrote:
On Sat, 1 May 2021, Zebediah Figura (she/her) wrote: [...]
Looks like a more sophisticated version of https://www.winehq.org/~jwhite/2deb8c2825af.html, which is definitely a nice resource when I'm trying to put effort into fixing test failures.
Right. I should probably have mentioned this bug, which says Jer's page was part of the inspiration. But that page did not do quite what I needed, so I tweaked it.
https://bugs.winehq.org/show_bug.cgi?id=48164
Oh. And now the official pages are online and getting more feature complete.
https://test.winehq.org/data/patterns-tb-win.html https://test.winehq.org/data/patterns-tb-wine.html
I guess the tests are color-coded by number of failures, modulo some constant?
Right. Each failure type (timeout, crash, etc) has its own color. And then I use a gradient to attribute a color to each 'vanilla' failure count.
Note that what counts for allocating the colors is not the actual failure counts, but the number of distinct failure counts. That is, a test with 4, 5 or 6 failures will get the same colors as one with 1, 2 or 100 failures, because in both cases there are only 3 distinct values.
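To make the allocation rule concrete, here is a small Python sketch (the actual build-patterns script is Perl; this is just an illustration of the rank-based scheme described above):

```python
def color_indices(counts):
    """Map each failure count to a color index based on the rank of
    that count among the distinct counts seen in this test unit.
    Zero failures gets no color (None)."""
    distinct = sorted(set(c for c in counts if c > 0))
    rank = {c: i for i, c in enumerate(distinct)}
    return [rank[c] if c > 0 else None for c in counts]

# A unit with counts 4/5/6 and one with 1/2/100 both need 3 colors:
print(color_indices([4, 5, 6, 4]))    # [0, 1, 2, 0]
print(color_indices([1, 2, 100, 2]))  # [0, 1, 2, 1]
```

Both units end up with the same three color indices even though the raw counts are very different.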
I'll add a description of the patterns on the pages at some point.
I like the idea. I will note though that some of those colours seem hard to tell apart, e.g. the shades of green in wine d3d9:device.
Yes. When a test unit has 30 distinct failure counts it's hard to find enough easily distinguishable colors. It's probably possible to do better by tweaking the colors the gradient goes through.
https://source.winehq.org/git/tools.git/blob/HEAD:/winetest/build-patterns#l...
The cyan-green-yellow part of the gradient produces colors that are not very easy to distinguish. The colors in the yellow-red segment seem easier to tell apart, but that segment is given the same weight as the other two. I've experimented a bit with a darker cyan, but going too dark does not look very nice.
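For reference, here is a minimal sketch of a piecewise-linear gradient through cyan, green, yellow and red stops. The stop colors and the even weighting are assumptions based on the description above, not taken from the actual script:

```python
def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

# Assumed anchor colors: cyan -> green -> yellow -> red.
STOPS = [(0, 255, 255), (0, 255, 0), (255, 255, 0), (255, 0, 0)]

def gradient_color(i, n):
    """Pick the i-th of n evenly spaced colors along the gradient."""
    if n == 1:
        return STOPS[0]
    t = i / (n - 1) * (len(STOPS) - 1)  # position along the stop list
    seg = min(int(t), len(STOPS) - 2)   # which segment we fall in
    return lerp(STOPS[seg], STOPS[seg + 1], t - seg)

# With 30 distinct failure counts, neighboring colors differ by only a
# few units per channel, which is why the shades blur together:
print(gradient_color(0, 30), gradient_color(1, 30))
```

With three segments of equal weight, a third of the colors always land in the hard-to-read cyan-green range, which matches the problem described above.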
Also I guess they aren't consistent across tests for some reason?
The goal is to maximize the contrast in the colors used by each pattern. But if I used a single 'color map' for all test units, I would need to allocate a hundred different colors. Then many test units with just a few failures would end up only using very similar colors.
Allocating one color map per test unit limits this issue to just a few patterns. And the best fix would be to reduce the number of failures in these tests ;-)
I'll admit I don't fully follow your logic.
I guess if it were me, I'd use a fixed colormap with a small number of colors (16? I'm guessing there) that are easy to distinguish, and then universally assign colors by (n % 16). I'd also pick out those colors manually instead of trying to generate them. Yeah, you won't be able to distinguish between 1 failure and 17 failures, but hopefully that case won't come up very much. Plus, that way, you could even learn a mental association, I guess, for whatever that's worth.
Or you could assign 1-16 to individual colors and anything greater than 16 to another color. Of course, many tests have very large numbers of failures (usually the same failure repeated).
That's kind of splitting hairs of course.
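The two fixed-palette schemes above could look something like this sketch (the 16-color palette and the grey overflow color are made up for illustration):

```python
# Hand-picked, easy-to-distinguish colors (hypothetical palette).
PALETTE = [
    "#e6194b", "#3cb44b", "#ffe119", "#4363d8", "#f58231", "#911eb4",
    "#46f0f0", "#f032e6", "#bcf60c", "#fabebe", "#008080", "#e6beff",
    "#9a6324", "#fffac8", "#800000", "#aaffc3",
]
OVERFLOW = "#808080"  # grey for counts past the palette (assumed)

def color_mod(count):
    """Scheme 1: wrap around, so 1 and 17 failures share a color."""
    return PALETTE[(count - 1) % len(PALETTE)]

def color_capped(count):
    """Scheme 2: counts above 16 all collapse into one overflow color."""
    return PALETTE[count - 1] if count <= len(PALETTE) else OVERFLOW

print(color_mod(17) == color_mod(1))        # the wrap-around collision
print(color_capped(17), color_capped(100))  # both overflow to grey
```

The second scheme trades the 1-vs-17 ambiguity for lumping all large counts together, which is the splitting of hairs in question.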
Now, another thing that occurs to me that would be very useful, and which doesn't necessarily preclude any of the above but does sort of obviate its usefulness, is to generate a list of failures by line, or even by line + failure message. I'd envision this as one failure per row, with an "X" flag on each machine + day that displays it. Of course I'm sure you already have plenty of ideas on expanding the page; I'm just throwing out one of my own here.
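Just to make the proposal concrete, here is a rough sketch of that per-failure-message view: one row per distinct failure message, one column per (machine, day), with an "X" where that failure appeared. The input format is invented for illustration; the real data would come from the winetest reports:

```python
from collections import defaultdict

def failure_table(reports):
    """reports: iterable of (machine, day, failure_message) tuples.
    Returns a tab-separated table, one row per distinct message."""
    columns = sorted({(m, d) for m, d, _ in reports})
    rows = defaultdict(set)
    for machine, day, message in reports:
        rows[message].add((machine, day))
    lines = ["\t".join(["failure"] + [f"{m}/{d}" for m, d in columns])]
    for message in sorted(rows):
        flags = ["X" if col in rows[message] else "" for col in columns]
        lines.append("\t".join([message] + flags))
    return "\n".join(lines)

print(failure_table([
    ("vm1", "2021-05-01", "d3d9:device line 123: got 0xdeadbeef"),
    ("vm2", "2021-05-01", "d3d9:device line 123: got 0xdeadbeef"),
    ("vm1", "2021-05-02", "d3d9:device line 456: test failed"),
]))
```

Grouping by message rather than by count would also make it obvious at a glance whether two machines are hitting the same failure or merely the same number of failures.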