The issue is that, to prevent concurrent writes to the same Vulkan descriptor, we must delay writing descriptors until command list submission. So instead of descriptors being written on the fly, often by multiple threads, we write them all at the last moment from a single thread. In my testing of HZD, the worker thread always handled all, or very nearly all, of the writes by the time the list was submitted, so it removes this bottleneck.
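A minimal sketch of the scheme described above, under stated assumptions: writes are queued in FIFO order, a worker thread applies them in the background, and the submitting thread applies whatever is left just before submission. All names (`write_queue`, `write_queue_flush_for_submit`, etc.) are hypothetical, and a plain `unsigned int` array stands in for the descriptor heap and for `VkWriteDescriptorSet`. Writes are applied while holding the queue lock so only one thread ever touches a descriptor and later writes win; a real implementation would likely avoid holding a lock across the Vulkan call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 256
#define DESCRIPTOR_COUNT 64

/* Hypothetical stand-in for one deferred VkWriteDescriptorSet. */
struct deferred_write {
    size_t slot;
    unsigned int value;
};

struct write_queue {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    pthread_t worker;
    struct deferred_write items[QUEUE_CAPACITY]; /* FIFO ring buffer */
    size_t head, count;
    bool shutdown;
    unsigned int descriptors[DESCRIPTOR_COUNT];  /* stand-in for the descriptor heap */
};

/* Apply the oldest queued write. Caller holds q->mutex, so writes to a
 * descriptor never overlap and land in the order they were queued. */
static void apply_one_locked(struct write_queue *q)
{
    struct deferred_write w = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    q->descriptors[w.slot] = w.value;
}

static void drain_locked(struct write_queue *q)
{
    while (q->count)
        apply_one_locked(q);
}

/* Background worker: applies writes as they arrive, ahead of submission. */
static void *worker_main(void *arg)
{
    struct write_queue *q = arg;
    pthread_mutex_lock(&q->mutex);
    for (;;) {
        while (!q->count && !q->shutdown)
            pthread_cond_wait(&q->cond, &q->mutex);
        if (!q->count) /* shutdown requested and nothing left to write */
            break;
        apply_one_locked(q);
        /* Drop the lock briefly so producers and the submitter can get in. */
        pthread_mutex_unlock(&q->mutex);
        pthread_mutex_lock(&q->mutex);
    }
    pthread_mutex_unlock(&q->mutex);
    return NULL;
}

static void write_queue_push(struct write_queue *q, size_t slot, unsigned int value)
{
    assert(slot < DESCRIPTOR_COUNT);
    pthread_mutex_lock(&q->mutex);
    if (q->count == QUEUE_CAPACITY)
        drain_locked(q); /* queue full: the producer applies the backlog itself */
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = (struct deferred_write){ slot, value };
    q->count++;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Called on the submitting thread: anything the worker has not reached yet
 * is written here, just before the command list is submitted. */
static void write_queue_flush_for_submit(struct write_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    drain_locked(q);
    pthread_mutex_unlock(&q->mutex);
}

static int write_queue_start(struct write_queue *q)
{
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->count = 0;
    q->shutdown = false;
    for (size_t i = 0; i < DESCRIPTOR_COUNT; i++)
        q->descriptors[i] = 0;
    return pthread_create(&q->worker, NULL, worker_main, q);
}

static void write_queue_stop(struct write_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    q->shutdown = true;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->cond);
    pthread_mutex_destroy(&q->mutex);
}
```

If the worker keeps up, `write_queue_flush_for_submit()` finds the queue empty and costs only a lock round trip, which matches the observation that the worker had handled nearly all writes by submission time.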
I think the fence worker is unsuitable for this. If it has fences to wait for, it must poll `vkWaitSemaphoresKHR()` with a zero or very short timeout so that it can pick up any descriptor writes that arrive in the meantime. A zero timeout means spinning, so we would need a short non-zero timeout, which delays descriptor handling by up to that amount. Conversely, when writes do arrive, processing them may delay fence handling.
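A rough cost model makes the tradeoff concrete. It assumes a single hypothetical thread that alternates between a fence wait with timeout `wait_timeout_ns` (the `timeout` argument of `vkWaitSemaphoresKHR()` is in nanoseconds) and draining queued descriptor writes. The model and its inputs are illustrative assumptions, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost model of a combined fence + descriptor-write worker. */
struct combined_worker_cost {
    uint64_t worst_write_delay_ns; /* a write is queued just after a wait began */
    uint64_t idle_wakeups_per_sec; /* fences pending but not yet signalled */
    uint64_t worst_fence_delay_ns; /* a fence signals just after a drain began */
};

static struct combined_worker_cost model_combined_worker(uint64_t wait_timeout_ns,
                                                         uint64_t writes_per_drain,
                                                         uint64_t ns_per_write)
{
    struct combined_worker_cost c;
    /* A write can sit in the queue for a whole wait before the thread sees it. */
    c.worst_write_delay_ns = wait_timeout_ns;
    /* A zero timeout never sleeps, i.e. the thread spins; report that as UINT64_MAX. */
    c.idle_wakeups_per_sec = wait_timeout_ns ? UINT64_C(1000000000) / wait_timeout_ns : UINT64_MAX;
    /* While draining writes, the thread is not checking fences. */
    c.worst_fence_delay_ns = writes_per_drain * ns_per_write;
    return c;
}
```

For example, a 100 µs timeout adds up to 100 µs of latency to a descriptor write and costs 10,000 wakeups per second while fences are pending; shrinking the timeout trades one cost for the other, and neither goes to zero.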
Two or four descriptor worker threads in `struct d3d12_device` look to me like the best option. Alternatively, we could make it dynamic, creating another thread when too many writes remain at command list submission, at the cost of greater complexity.
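With more than one worker, two threads must still never write the same descriptor at once. One possible approach, which is an assumption and not something proposed above, is to route each write by descriptor index so a given descriptor always goes to the same thread. The sketch below pairs that with the dynamic variant's growth rule. `worker_for_descriptor`, `next_worker_count`, `MAX_DESCRIPTOR_WORKERS`, and the threshold are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DESCRIPTOR_WORKERS 4

/* Hypothetical routing: every write to a given descriptor index goes to the
 * same worker, so two workers never write one descriptor concurrently and
 * writes to it keep their queue order. */
static unsigned int worker_for_descriptor(size_t descriptor_index, unsigned int worker_count)
{
    return (unsigned int)(descriptor_index % worker_count);
}

/* Hypothetical growth policy, evaluated at command list submission: if more
 * than `threshold` writes were still queued, add one worker (up to the cap).
 * Assumes it is only applied while every worker queue is empty, because
 * changing worker_count changes the routing above. */
static unsigned int next_worker_count(unsigned int worker_count,
                                      size_t writes_left_at_submit,
                                      size_t threshold)
{
    if (writes_left_at_submit > threshold && worker_count < MAX_DESCRIPTOR_WORKERS)
        return worker_count + 1;
    return worker_count;
}
```

The extra complexity shows up in that assumption: when the thread count changes, the routing changes too, so growth has to be synchronized with all pending writes being flushed.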