Trying to do this based on time turns out to be annoyingly hard:
* Testing the lower 32 bits of the interrupt time / tick count is fast enough, but the resolution isn't good enough (see the sketch after this list). Increasing the resolution to 1 ms (i.e. reducing the timeout in the server) in the event that an application calls timeBeginPeriod() is feasible, but we don't want to do that unless we need to, and in d3d we don't really know that ahead of time (e.g. we don't know whether we're in a game that's trying its best to hit 60 fps, or a productivity application that shouldn't consume any more CPU than necessary).
* Using a separate thread requires a fair amount of code to begin with, which isn't great. More concerning, that thread needs to synchronize with the CS thread, and even the overhead of a mutex noticeably hurts my artificial benchmark.
* I briefly examined the idea of using a separate thread but submitting *to* the CS thread, basically as a client thread. The problem is that we can be way ahead of the CS thread, and we have no idea how long it's actually been since the last submission.
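For reference, here's a minimal sketch of the time-based check from the first bullet. The helper name and interval are hypothetical, not wined3d code; the point is that the cheap 32-bit tick read is fast, but its default resolution (typically 10–16 ms) is far too coarse:

```c
/* Minimal sketch of the time-based check; should_submit_by_time() and
 * SUBMIT_INTERVAL_MS are hypothetical names, not the actual patch. */
#include <windows.h>

#define SUBMIT_INTERVAL_MS 4 /* what we'd want; below the tick resolution */

static DWORD last_submit_time;

static BOOL should_submit_by_time(void)
{
    DWORD now = GetTickCount(); /* cheap to read, but ~10-16 ms resolution */

    /* Unsigned subtraction handles the 32-bit wraparound (~49.7 days). */
    if (now - last_submit_time < SUBMIT_INTERVAL_MS)
        return FALSE;
    last_submit_time = now;
    return TRUE;
}
```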
So I'm giving up on that approach for now and just counting draw and dispatch calls instead. This is of course cheap to measure, and should ultimately work just as well. It may result in submitting *too* often if the application makes a *lot* of draw calls, but as stated in the commit message, we should prevent that by limiting the number of in-flight command buffers. And if vkQueueSubmit() itself clearly does become a bottleneck, we can offload it to a separate thread. (Synchronization there is a lot easier if that thread doesn't also have to decide to *end* a command buffer that the CS thread is currently using.)
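A rough sketch of what the counting approach looks like; the names, the threshold, and the submit helper here are all hypothetical, not the actual patch:

```c
/* Rough sketch of the draw/dispatch counting approach; names, the
 * threshold, and the submit helper are hypothetical. */
#define SUBMIT_THRESHOLD 512 /* hypothetical; would be tuned empirically */

struct cs_submit_state
{
    unsigned int draw_count; /* draws + dispatches since the last submit */
};

static void end_and_submit_command_buffer(struct cs_submit_state *state)
{
    /* In the real code: end the current VkCommandBuffer, vkQueueSubmit()
     * it, and begin a new one; the in-flight buffer limit mentioned above
     * would apply here. This stub only marks where that happens. */
    (void)state;
}

/* Called from the CS thread for each draw or dispatch. */
static void record_draw_or_dispatch(struct cs_submit_state *state)
{
    if (++state->draw_count < SUBMIT_THRESHOLD)
        return;

    end_and_submit_command_buffer(state);
    state->draw_count = 0;
}
```

Since everything happens on the CS thread, this needs no locking at all, which is exactly what made it attractive over the timer-thread variants above.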