It might be better to use something like broken() to encourage more precision in marking the failures. For example, if something sometimes fails with ACCESS_DENIED, we'd still like to know if it starts failing with FILE_NOT_FOUND instead. That does make it trickier to add flaky tests to the summary line, though. I guess you could set it up so that if flaky() is called with a non-zero condition and the next ok() succeeds, that check is marked as flaky.
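
As a minimal sketch of how that condition-wrapper variant could look in a test; flaky() here is the proposed helper rather than an existing API, and the CreateFileA call and error codes are only an illustration:

#include <windows.h>
#include "wine/test.h"

/* Stand-in for the proposed flaky(): the real version would also flag the
 * enclosing ok() as flaky when the condition is non-zero. */
#define flaky(cond) (cond)

static void test_open(void)
{
    HANDLE file = CreateFileA("test.dat", GENERIC_READ, 0, NULL,
                              OPEN_EXISTING, 0, NULL);

    /* Accept the known intermittent ACCESS_DENIED explicitly, so a new
     * failure mode like FILE_NOT_FOUND still gets reported as a failure. */
    ok(file != INVALID_HANDLE_VALUE ||
       flaky(GetLastError() == ERROR_ACCESS_DENIED),
       "got error %lu\n", GetLastError());

    if (file != INVALID_HANDLE_VALUE) CloseHandle(file);
}

That way a flaky pass still shows up in the summary counts instead of silently succeeding.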
Having the flaky failures in the summary line also opens up the option of handling them by retrying, and the summary-line change can be implemented without deciding on that first. Of course, retrying leaves open the possibility that you will be unlucky and all the retries will fail too. Whether that's worth it would probably depend on how often a regression in a flaky test slips through and causes the tests to fail for Alexandre.
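
If retries did get implemented, the runner side could look roughly like this; everything here (the summary struct, run_test(), the attempt count, the test name) is hypothetical, just to show the shape of the retry decision:

#include <stdio.h>

#define MAX_ATTEMPTS 3

struct test_summary
{
    int failures;        /* ordinary failure count from the summary line */
    int flaky_failures;  /* failures that were marked flaky */
};

/* Placeholder: a real runner would execute the test and parse its summary
 * line here. */
static struct test_summary run_test(const char *name)
{
    struct test_summary s = { 0, 0 };
    printf("running %s\n", name);
    return s;
}

static int run_with_retries(const char *name)
{
    int attempt;

    for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++)
    {
        struct test_summary s = run_test(name);
        if (s.failures) return 1;        /* a real regression; retrying won't help */
        if (!s.flaky_failures) return 0; /* clean pass */
    }
    return 1; /* unlucky: the flaky failure reproduced on every attempt */
}

int main(void)
{
    return run_with_retries("kernel32:file");
}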