So, one of the things one learns when writing a patch robot is that flaky tests are very annoying.
Each time it gets a new git tree, the robot does five baseline "make -k test" runs, remembers the tests that fail, and doesn't penalize patches for failing any of those tests. See http://code.google.com/p/winezeug/source/browse/trunk/patchwatcher/patchwatc...
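To make the baseline idea concrete, here is a minimal sketch in Python. It is not the patchwatcher code itself; the function names and the test names in the usage example are hypothetical, and it assumes each "make -k test" run has already been reduced to a list of failed test names.

```python
def baseline_failures(runs):
    """Return every test name that failed in at least one baseline run.

    `runs` is an iterable of per-run failure lists (an assumed input
    format, not necessarily what patchwatcher stores).
    """
    known = set()
    for failed in runs:
        known |= set(failed)
    return known


def new_failures(patch_failed, known):
    """Return the failures a patch is actually blamed for, sorted."""
    return sorted(set(patch_failed) - known)


if __name__ == "__main__":
    # Five baseline runs; the test names are made up for illustration.
    runs = [["example:flaky_a.c"], [], ["example:flaky_b.c"], [], []]
    known = baseline_failures(runs)
    print(new_failures(["example:flaky_a.c", "example:real_bug.c"], known))
```

Taking the union over all five runs means a test only has to fail once in the baseline to be forgiven later.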
Annoyingly, that's not enough. Some flaky tests stubbornly refuse to fail during the baseline test runs and only fail later, when a patch is being tested. So I added a second, manual blacklist for those tests; see http://code.google.com/p/winezeug/source/browse/trunk/patchwatcher/patchwatc... The list is currently user32:msg.c, user32:input.c, d3d9:visual.c, ddraw:visual.c, urlmon:protocol.c, and kernel32:thread.c, and it will continue growing as I keep plugging away at getting the patch robot happy.
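A rough sketch of how a manual blacklist can be combined with automatically detected baseline failures, in Python. This is an illustration only, not the patchwatcher implementation; the function names are hypothetical, and the blacklist entries are the ones listed in this post.

```python
# Tests that are flaky but tend to pass during baseline runs
# (entries taken from the list in this post).
MANUAL_BLACKLIST = frozenset({
    "user32:msg.c",
    "user32:input.c",
    "d3d9:visual.c",
    "ddraw:visual.c",
    "urlmon:protocol.c",
    "kernel32:thread.c",
})


def ignored_tests(baseline_known, manual=MANUAL_BLACKLIST):
    """Tests whose failure is never held against a patch."""
    return set(baseline_known) | set(manual)


def patch_is_penalized(patch_failed, baseline_known):
    """True if the patch fails a test outside both ignore lists."""
    return bool(set(patch_failed) - ignored_tests(baseline_known))
```

For example, a patch run that fails only user32:msg.c is not penalized even if the baseline runs all happened to pass that test.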
Is anybody else seeing this kind of flakiness? (If you're not, try running patchwatcher for a while :-)
FWIW, I'm running the tests on Ubuntu Hardy with a fresh metacity (as described in http://wiki.winehq.org/MakeTestFailures ).