Yeah, in general I'd prefer creating fewer threads rather than more, unless it either can't be avoided or there's a clear advantage to creating more threads. In fact, I wonder how hard it would be to use vkd3d_fence_worker_main() for this. Waiting for fences is a blocking operation, but it may not have to be, and in principle these waits are expected to complete quickly. That also depends on which issue we're trying to address here, of course...
As was probably evident from my comment, I'm also in favor of using as few threads as possible. Still, I'm not sure that reusing the fence worker would be a good idea, since the fence worker has to wait on either a timeline semaphore or a fence.
Sure, but those waits don't necessarily need to be blocking waits. It may still be a bad idea, of course, depending on which issue we're trying to address, but I'd like for some serious thought to be given to whether we can make it work.
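To make the non-blocking idea concrete, here is a minimal, hypothetical sketch. It is not vkd3d code, and none of the names below (struct wait_item, worker_pass(), worker_run(), simulated_poll()) exist in vkd3d. It assumes each pending wait can be checked without blocking. In Vulkan that check might be vkGetFenceStatus() for a fence or vkGetSemaphoreCounterValue() for a timeline semaphore, but here it is simulated so the example runs standalone. The point is only that a single worker can service several kinds of waits when no individual check blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wait item, not a vkd3d structure. poll() must never block;
 * it stands in for something like a fence-status or semaphore-counter query. */
struct wait_item
{
    bool (*poll)(struct wait_item *item);     /* true once signalled */
    void (*complete)(struct wait_item *item); /* run once, after signalling */
    bool done;
    /* Simulation only: number of polls before the item reports signalled. */
    unsigned int remaining_polls;
    unsigned int completions;
};

/* Simulated non-blocking check. */
static bool simulated_poll(struct wait_item *item)
{
    if (item->remaining_polls == 0)
        return true;
    --item->remaining_polls;
    return false;
}

/* Simulated completion handler: just counts how often it ran. */
static void count_completion(struct wait_item *item)
{
    ++item->completions;
}

/* One pass of a single worker over all pending waits. Because poll() never
 * blocks, a slow item cannot stall the others. Returns how many are still pending. */
static size_t worker_pass(struct wait_item *items, size_t count)
{
    size_t pending = 0, i;

    for (i = 0; i < count; ++i)
    {
        struct wait_item *item = &items[i];

        assert(item->poll && item->complete);
        if (item->done)
            continue;
        if (item->poll(item))
        {
            item->complete(item);
            item->done = true;
        }
        else
        {
            ++pending;
        }
    }
    return pending;
}

/* Repeat passes until nothing is pending; returns the number of passes.
 * This spins; a real worker would need some way to sleep between passes. */
static unsigned int worker_run(struct wait_item *items, size_t count)
{
    unsigned int passes = 0;
    size_t pending;

    do
    {
        pending = worker_pass(items, count);
        ++passes;
    } while (pending);
    return passes;
}
```

The busy loop in worker_run() is exactly the cost such a design would have to avoid. Whether the worker can sleep efficiently between passes (and how that compares with simply blocking in a dedicated thread) is the part that would need the serious thought mentioned above.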