On Sun, 2 May 2021, Zebediah Figura (she/her) wrote: [...]
I guess if it were me, I'd use a fixed colormap of a small fixed number (16? I'm guessing there) of colors that are easy to distinguish, and then universally assign colors by (n % 16).
16 colors is not enough, particularly not if using a single palette for all the tests.
For instance the record holder is user32:clipboard with 81 different failure counts: https://test.winehq.org/data/patterns.html#user32:clipboard
So with a 16 color palette there would be a lot of wrapping and that would likely make the pattern unreadable.
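To make the wrapping concrete, here is a rough sketch of the (n % 16) assignment applied to 81 distinct failure counts. The actual counts on the user32:clipboard page are not listed here, so the sketch simply assumes they are 1 through 81; the function names are made up for illustration.

```python
from collections import defaultdict

PALETTE_SIZE = 16  # palette size from the quoted proposal


def color_index(failure_count, palette_size=PALETTE_SIZE):
    """Map a failure count to a palette slot by simple wrapping."""
    return failure_count % palette_size


def collisions(failure_counts, palette_size=PALETTE_SIZE):
    """Return the palette slots shared by more than one distinct count."""
    slots = defaultdict(list)
    for count in sorted(set(failure_counts)):
        slots[color_index(count, palette_size)].append(count)
    return {slot: counts for slot, counts in slots.items() if len(counts) > 1}


# Assumption: the 81 distinct failure counts are simply 1..81.
counts = range(1, 82)
shared = collisions(counts)

for slot in sorted(shared):
    print(slot, shared[slot])
```

Under that assumption every one of the 16 slots is shared by at least five different failure counts, and 1 and 17 land on the same color.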
Also note that even with the current scheme one can clearly see that the cw-rx460 machine has more failures than the other test configurations. Partly because it's almost the only machine present in that pattern, and partly because it has more yellow/red which are the colors of higher failure counts.
In contrast the non-English w10pro64 VMs have fewer failures (blue) and all the same color (and hence count). This suggests they have a different cause. (for cw-rx460 it's the Radeon driver, I have not looked at w10pro64 yet)
user32:input is another case where the current color scheme works pretty well despite the high number of different failure counts (31).
https://test.winehq.org/data/patterns.html#user32:input
(And it shows something pretty bad happened on cw-gtx560-1909 around Apr 2nd. Now I just have to figure out what)
For reference, here are the 'high scores':
  81 user32:clipboard
  41 user32:win
  31 user32:input
  27 ole32:clipboard
  26 d3d11:d3d11
  25 user32:msg
  21 user32:sysparams
  20 d3d10core:d3d10core
I'd also pick out those colors manually instead of trying to generate them.
I'm fine with someone picking the colors manually but I'm not an artist and agonising over each color is not going to be a time saver for me.
Yeah, you won't be able to distinguish between 1 failure and 17 failures, but hopefully that contrast won't come up very much.
Distinguishing between 1 and 17 failures is super important: it's the difference between catching a commit that introduces 16 new failures in the days after it's committed, and letting it slip through the cracks, only to be rediscovered months later when the author has vanished.
Plus, that way, you could even learn a mental association, I guess, for whatever that's worth.
Precisely: what is it worth? What advantage does being able to identify at a glance that two test units have the same number of failures gain us?
[...]
Now, another thing that occurs to me that would be very useful, and which doesn't necessarily preclude any of the above but does sort of obviate its usefulness, is to generate a list of failures by line, or even by line + failure message.
Line numbers are useless for tracking failures: they change almost every time a test is modified. Matching on the message may work better, though some messages have 'random' content (pointers, etc.). Fortunately those are relatively rare.
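As an illustration of matching on the message, here is a minimal sketch that turns a failure line into a key that ignores the line number and masks pointer-like values. It assumes failure lines look like "file.c:123: Test failed: ..." and that the 'random' content is limited to 0x-prefixed hex values; the failure_key name and the sample lines are made up.

```python
import re

# Assumption: failure lines start with "file.c:<line>:".
_LINE_NUMBER = re.compile(r"^(\S+?):\d+:")
# Assumption: volatile content is limited to 0x-prefixed hex values.
_HEX_VALUE = re.compile(r"\b0x[0-9a-fA-F]+\b")


def failure_key(line):
    """Build a key that stays stable across line shifts and pointer values."""
    line = _LINE_NUMBER.sub(r"\1:", line.strip())
    return _HEX_VALUE.sub("0xPTR", line)


# Made-up sample lines: same failure, different line number and pointer.
a = "clipboard.c:120: Test failed: got handle 0x00a3f2c0"
b = "clipboard.c:134: Test failed: got handle 0x7ffe1234"
print(failure_key(a) == failure_key(b))
```

Other kinds of volatile content (timestamps, process ids, etc.) would need their own patterns, so this only covers the pointer case mentioned above.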
I'd envision this as one per row, with "X" flags on each machine + day that displays it.
Web pages are two-dimensional. So if rows are failure messages that only leaves columns to show both the reports and builds. That feels like one too many.
Or maybe instead of one box per test unit you meant to have one per failure message? That's likely going to be many boxes (there are already 327 test units that had failures in the past 2 months!!!).
I had a possibly related idea for tracking individual failures but I'm not entirely sure it would work in practice: https://bugs.winehq.org/show_bug.cgi?id=48166
Of course I'm sure you already have plenty of ideas on expanding the page; I'm just throwing out one of my own here.
Not that many actually.
* Adding some sort of documentation.
* Adding links to potentially related Git commits.
* Adding a global pattern based on the number of failed test units. (that one also highlights the cw-rx460 issues pretty well)
* Adding links to related bugs. But ideally that would use Bugzilla's REST API, which is only available in Bugzilla >= 5.0 (WineHQ still runs 4.4.13, and I don't know if it's worth upgrading Bugzilla just for this).
That's about it.