https://bugs.winehq.org/show_bug.cgi?id=48166
--- Comment #1 from François Gouget <fgouget@codeweavers.com> ---
The proposed merge algorithm does not quite work.
The problem is that when looking at the diff we don't know if the '-' lines come before or after the '+' ones, or if they are interleaved.
To simplify things, assume each failure message is a single digit. Then if we have rep1 = [1 2 3] and rep2 = [5], we don't know whether the merge should be rep1+2 = [1 2 3 5], or [1 2 5 3], or [1 5 2 3], or [5 1 2 3].
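The ambiguity can be made concrete with a small Python sketch. It takes rep1 = [1, 2, 3], as the merge candidates listed above imply, and enumerates every merge that preserves the order within each report. The helper name interleavings() is made up for this illustration and is not part of the proposed algorithm.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a or not b:
        # One side is exhausted: only one way to finish.
        yield list(a) + list(b)
        return
    # Either the next item comes from a ...
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    # ... or it comes from b.
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


rep1 = [1, 2, 3]
rep2 = [5]
candidates = list(interleavings(rep1, rep2))
for merged in candidates:
    print(merged)
```

The diff alone says nothing about which of these four candidates reflects the real order of the failures.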
The impact is on later merges: if rep1+2 = [1 2 3 5] and rep3 = [5 3], then the diff gives us: -1 -2 -3 5 +3 (5 being the only unchanged line), and the merge would be something like [1 2 3 5 3].
So now failure 3 is in two places, making it appear as if those were two separate failures. Future merges will match either one or the other, so that the analysis will get an incomplete picture of when and where the failure happened.
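The duplication can be reproduced with a rough Python sketch. It is only an illustration under assumptions: lcs_diff() and naive_merge() are hypothetical names, the diff is a textbook longest-common-subsequence diff whose tie-breaking was picked to reproduce the diff quoted above (a real diff tool may break ties differently), and the merge simply keeps every '-', ' ' and '+' line in diff order, which may not be exactly what the proposed algorithm does.

```python
def lcs_diff(old, new):
    """Return (tag, item) pairs where tag is '-', '+' or ' ' (unchanged)."""
    n, m = len(old), len(new)
    # dp[i][j] = length of the longest common subsequence of old[:i], new[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if old[i - 1] == new[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the end; on ties, take the addition first so that
    # additions end up after removals in the final (reversed) output.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and old[i - 1] == new[j - 1]:
            ops.append((' ', old[i - 1]))
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or dp[i][j - 1] >= dp[i - 1][j]):
            ops.append(('+', new[j - 1]))
            j -= 1
        else:
            ops.append(('-', old[i - 1]))
            i -= 1
    ops.reverse()
    return ops


def naive_merge(merged, report):
    """Keep every line of the diff, whatever its tag, in diff order."""
    return [item for _tag, item in lcs_diff(merged, report)]


merged12 = naive_merge([1, 2, 3], [5])     # [1, 2, 3, 5]
merged123 = naive_merge(merged12, [5, 3])  # [1, 2, 3, 5, 3]
print(lcs_diff(merged12, [5, 3]))
print(merged123, "failure 3 appears", merged123.count(3), "times")
```

Because 5 is the only line the diff can match, the 3 from rep3 is treated as a new line and lands after it, even though the merged list already contains a 3.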
Having more context could help avoid these issues, but only as long as the extra context does not itself change. So that may not work well.
Another approach would be to assume that failure messages are unique. But that assumption does not really hold. The following command strips the line numbers and lists the messages that appear more than once in a report:
  egrep '(Test failed|Test succeeded)' report | sed -e 's/^\([a-z0-9.]*\):[0-9]*:/\1:/' | sort | uniq -c | sort -n | egrep -v '^ *1 '
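For anyone who wants to run the same uniqueness check from a script, here is a rough Python equivalent of that pipeline. The function name repeated_messages() and the sample lines are made up for illustration; they are not taken from a real report.

```python
import re
from collections import Counter

# Same pattern as the sed expression: drop the line number after the file name.
LINE_NUMBER = re.compile(r'^([a-z0-9.]*):[0-9]*:')


def repeated_messages(lines):
    """Return {message: count} for messages seen more than once."""
    counts = Counter(
        LINE_NUMBER.sub(r'\1:', line.rstrip('\n'))
        for line in lines
        if 'Test failed' in line or 'Test succeeded' in line
    )
    return {msg: n for msg, n in counts.items() if n > 1}


# Hypothetical report lines, for illustration only.
sample = [
    "d3d9.c:120: Test failed: unexpected hr 0x1",
    "d3d9.c:245: Test failed: unexpected hr 0x1",
    "d3d9.c:300: Test succeeded inside todo block: got 1",
    "msg.c:10: Test failed: timeout",
    "msg.c:11: some trace output",
]
duplicates = repeated_messages(sample)
print(duplicates)
```

Any entry in the result is a message that cannot serve as a unique key on its own once its line number is dropped.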