The bigger problem is that the existing tests are already unstable. For one, running offline on Windows breaks test_scheme_resolvers(), but that one is easy to skip. Beyond that, I get sporadic failures in test_media_session_source_shutdown() where the test waits forever. Even when the tests don't hang or crash, I get a different number of executed tests each time I run them. I can understand some asynchronous behavior occasionally differing between runs, but the number of executed tests fluctuates on the order of thousands.
I think we need to make these tests stable first, before adding more. The latest Windows 11 release gives even worse results.