broken() would be used if a given Windows version was systematically returning a different value from what we are expecting. But most of the time Windows returns the value we expect. So from that point of view flaky() would be appropriate.
However, the test does not in fact fail "randomly". Instead it systematically fails on Wednesdays. So flagging it as "flaky" may be going beyond the intended scope of flaky(). But then that scope was never really defined. So this is the perfect opportunity for defining those boundaries...
Your other options are:
* Fixing the test.
* Putting it in an if (0).
* Removing it.
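To make the broken() versus flaky() distinction above a bit more concrete, here is a rough standalone sketch. It does not use the real Wine test header: platform, broken_sketch(), check() and value_under_test() are hypothetical stand-ins, and the values 42 and 7 are made up. It only assumes that broken() accepts a known-wrong result on Windows, and that a flaky failure is reported separately from a hard failure.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the platform the tests run on. */
static const char *platform = "windows";

/* Sketch of the broken() idea: tolerate a known-wrong result, but only on Windows. */
static bool broken_sketch(bool condition)
{
    return strcmp(platform, "windows") == 0 && condition;
}

enum outcome { PASS, FAIL, FLAKY_FAIL };

/* One check: a failing condition marked flaky is reported as FLAKY_FAIL instead of FAIL. */
static enum outcome check(bool condition, bool is_flaky)
{
    if (condition) return PASS;
    return is_flaky ? FLAKY_FAIL : FAIL;
}

/* Made-up API under test: returns 42, except on Wednesdays (tm_wday == 3) where it returns 7. */
static int value_under_test(int wday)
{
    return wday == 3 ? 7 : 42;
}

/* Unmarked test: a hard failure every Wednesday. */
static enum outcome test_plain(int wday)
{
    return check(value_under_test(wday) == 42, false);
}

/* broken(): also accepts 7 on every other day, which hides more than intended. */
static enum outcome test_with_broken(int wday)
{
    int v = value_under_test(wday);
    return check(v == 42 || broken_sketch(v == 7), false);
}

/* flaky: the Wednesday failure is still visible, but not counted as a hard failure. */
static enum outcome test_with_flaky(int wday)
{
    return check(value_under_test(wday) == 42, true);
}
```

With these stand-ins, test_plain(3) is a hard failure, test_with_broken(3) passes silently, and test_with_flaky(3) is reported as a flaky failure. Neither marker says "only on Wednesdays", which is really the scope question raised above.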