"Flaky" is effectively equivalent to "skip" except that the test itself is always run.
Ideally, ~~testbot~~ CI would detect noticeable changes in a test's success/failure frequency over the last _N_ runs and flag them as unexpected successes or failures. Otherwise, tests marked flaky risk going stale, and the same applies to skipped tests.
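As a rough sketch of both ideas, the following assumes a hypothetical per-test marker (`"none"`, `"flaky"`, `"skip"`) and a recorded pass/fail history per test. `run_marked_test` shows "flaky" behaving like "skip" except that the test is executed and its outcome recorded. `detect_drift` is one simple way to flag a change in frequency: it compares the pass rate of the last _N_ runs against the _N_ runs before them. The function names, the window comparison, and the `threshold` default are illustrative assumptions, not an existing CI feature; a real implementation might prefer a proper statistical test.

```python
from typing import Callable, List, Optional


def run_marked_test(test: Callable[[], None], marker: str, history: List[bool]) -> bool:
    """Return True if the suite should treat this test as passing.

    `marker` is a hypothetical annotation: "none", "flaky", or "skip".
    """
    if marker == "skip":
        return True  # never executed, so nothing is recorded in history
    try:
        test()
        passed = True
    except Exception:
        passed = False
    history.append(passed)  # flaky tests still contribute to the history
    # Like "skip", a flaky test can never fail the suite.
    return passed or marker == "flaky"


def pass_rate(results: List[bool]) -> float:
    return sum(results) / len(results)


def detect_drift(history: List[bool], n: int, threshold: float = 0.5) -> Optional[str]:
    """Compare the last n runs with the n runs before them (oldest first).

    Returns "unexpected success", "unexpected failure", or None.
    The threshold is an assumed tuning knob, not a recommended value.
    """
    if len(history) < 2 * n:
        return None  # not enough runs for both a baseline and a recent window
    baseline = history[-2 * n:-n]
    recent = history[-n:]
    delta = pass_rate(recent) - pass_rate(baseline)
    if delta >= threshold:
        return "unexpected success"
    if delta <= -threshold:
        return "unexpected failure"
    return None
```

Note that a skipped test produces no history under this sketch, so drift detection only covers tests that are actually run, such as those marked flaky.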