Looking over the results on test.winehq.org, the user32 menu tests are the worst in terms of failure count (though I suppose crashing tests are worse): they fail over 1000 times on NT4, over 600 times on Win9x, and over 300 times on Vista. The only operating systems on which they seem to pass are Windows 2000 and XP, and they have never passed for me on Windows XP, so I'm not sure the passing results are even representative.
Basically, they're the worst offender when it comes to unreliable results. Does anyone who knows these tests have time to look at the failures?
--Juan
P.S. The crypt32 encode tests are the next-worst offender. I'm working on 'em.