I would suggest looking at the testbot output of debian11 here: https://gitlab.winehq.org/wine/wine/-/merge_requests/3492#note_41119
This isn't perfect (the child WM_SETFOCUS was incorrectly matched with a parent WM_SETFOCUS), but it still makes it very easy to see the list of "extra" messages in the sequence. This is a vast improvement over the current system, which is practically guaranteed to desync after a single mismatch, and it would save me a lot of time analyzing test failures. (I will also point out that I had no way of knowing the test would fail in this way, and the heuristic still worked even though I didn't get to choose the example.)
I don't think a minimal diff is worth the complexity it would add to the code, because in most cases the actual sequence should differ from the expected one only by the addition or removal of a single contiguous block of messages. Anyway, I'm not willing to do that work. If a heuristic is not acceptable, then we should stop checking after the first mismatch and just print the actual sequence (without a side-by-side comparison, I don't see the value in printing the expected sequence).
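
To make the trade-off concrete, here is a rough sketch of the kind of greedy resync heuristic I mean. It is not the code from the merge request and not Wine's test code: it is Python for brevity, messages are reduced to plain name strings, and `compare_sequences` and its `lookahead` parameter are hypothetical names invented for this illustration.

```python
def compare_sequences(expected, actual, lookahead=8):
    """Greedy comparison that tries to resynchronize after a mismatch.

    Illustrative only: messages are plain strings and `lookahead` is an
    arbitrary assumed window size. Returns a list of (tag, message) pairs
    where tag is "ok", "extra" (received but not expected) or "missing"
    (expected but not received).
    """
    report = []
    i = j = 0
    while i < len(expected) and j < len(actual):
        if expected[i] == actual[j]:
            report.append(("ok", actual[j]))
            i += 1
            j += 1
            continue
        # Mismatch: look a few messages ahead in the actual sequence for
        # the message we expected. Everything before it is reported as extra.
        window = actual[j:j + lookahead]
        if expected[i] in window:
            skip = window.index(expected[i])
            report.extend(("extra", m) for m in actual[j:j + skip])
            j += skip
            continue
        # Not found nearby: treat the expected message as missing and move on.
        report.append(("missing", expected[i]))
        i += 1
    # Whatever is left on either side after one sequence runs out.
    report.extend(("missing", m) for m in expected[i:])
    report.extend(("extra", m) for m in actual[j:])
    return report


if __name__ == "__main__":
    expected = ["WM_ACTIVATE", "WM_SETFOCUS", "WM_PAINT"]
    actual = ["WM_ACTIVATE", "WM_NCPAINT", "WM_ERASEBKGND", "WM_SETFOCUS", "WM_PAINT"]
    for tag, msg in compare_sequences(expected, actual):
        print(f"{tag:8} {msg}")
```

A matcher like this handles a single contiguous block of extra or missing messages well, which is the common case. Because it pairs on the first nearby match, it can also pair a message with the wrong occurrence of the same name, so the output is a reading aid rather than a minimal diff.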