I don't 100% understand why we need the per-swapchain threads. Is it because Vulkan has no asynchronous APIs for all the needed waits that we could use from a single thread (whether a vkd3d background thread, if there is one, or a dedicated thread)? That was my takeaway from our conversation, but I'm not certain I got it right.
Yeah, that's the reason. Vulkan has `VkFence` and `vkWaitForFences()` to wait for whichever of several fences becomes signaled first, but unfortunately not all Vulkan operations that can wait expose the end of the wait through a `VkFence`. Specifically, `vkAcquireNextImageKHR()` doesn't (it *also* takes a fence to signal, but the call itself might wait). So you can't share one thread between several swapchains, because you don't know which one will be the first to have an available image.
`vkWaitForPresentKHR()` has a similar problem. We don't use it yet, but we'll probably want to in the future.
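To illustrate why each swapchain gets its own thread, here is a minimal sketch that makes no Vulkan calls at all. `fake_acquire_next_image()` is a hypothetical stand-in that simply sleeps, mimicking a call which may block inside the call itself; `struct fake_swapchain` and `run_swapchain_threads()` are likewise made up for the example and are not the names used in the patch. It uses POSIX threads and `nanosleep()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* For nanosleep(). */
#endif
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a swapchain; not the structure from the patch. */
struct fake_swapchain
{
    pthread_t thread;
    long acquire_delay_ms;    /* How long the simulated acquire blocks (< 1000). */
    unsigned int frame_count; /* How many images the thread should acquire. */
    unsigned int acquired;    /* Images acquired so far, written by the thread only. */
};

/* Simulates a call that may block inside the call itself, with no handle
 * another thread could wait on together with other swapchains. */
static void fake_acquire_next_image(struct fake_swapchain *swapchain)
{
    struct timespec delay = {.tv_sec = 0, .tv_nsec = swapchain->acquire_delay_ms * 1000000L};

    nanosleep(&delay, NULL);
    ++swapchain->acquired;
}

/* One thread per swapchain: a slow acquire only stalls its own swapchain. */
static void *fake_swapchain_thread(void *arg)
{
    struct fake_swapchain *swapchain = arg;
    unsigned int i;

    for (i = 0; i < swapchain->frame_count; ++i)
        fake_acquire_next_image(swapchain);
    return NULL;
}

/* Starts one thread per swapchain and waits for all of them to finish.
 * Returns 0 on success, or the pthread_create() error code. */
static int run_swapchain_threads(struct fake_swapchain *swapchains, size_t count)
{
    size_t i, started;
    int ret = 0;

    for (started = 0; started < count; ++started)
    {
        if ((ret = pthread_create(&swapchains[started].thread, NULL,
                fake_swapchain_thread, &swapchains[started])))
            break;
    }
    for (i = 0; i < started; ++i)
        pthread_join(swapchains[i].thread, NULL);
    return ret;
}
```

With a single shared thread, the loop would have to pick one swapchain to block on, and a swapchain whose image is already available could end up waiting behind a slower one.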
This looks like an oversight in the Vulkan design. Extension `VK_KHR_present_wait` [offers some criteria for that choice in its "Issues" section](https://registry.khronos.org/vulkan/specs/1.3-extensions/html/chap54.html#_i...). At any rate, we're bound by what the spec allows.
Is allocating the ops on the heap a potential performance concern? I recall removing allocations in wined3d and using separate heaps because the global heap lock was getting pretty contended with some games. Ideally it would be just a flat array queue, but if there's only a couple ops per frame and we don't care that much, a low effort thing to do would be to keep a free list.
Yeah, I expect that in normal circumstances this only requires one allocation per frame, which I guess should be relatively easy to handle (I expect wined3d to be potentially much more stressed). My feeling is that this shouldn't be a problem, and that it can be addressed later if it shows up.
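If it does show up, the free list suggested above could look roughly like the following. This is only a sketch: `struct swapchain_op`, `struct op_free_list` and the helper names are hypothetical and do not come from the patch, and the code assumes the caller already holds whatever lock protects the op queue, so it does no locking of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical op structure; the real one in the patch is not shown here. */
struct swapchain_op
{
    struct swapchain_op *next; /* Links released ops in the free list. */
    int type;                  /* Placeholder for the op payload. */
};

/* Released ops are kept here instead of being returned to the heap.
 * Not thread-safe: the caller is assumed to hold the queue lock. */
struct op_free_list
{
    struct swapchain_op *head;
    size_t count;
};

/* Reuses a released op if one is available, otherwise allocates a new one.
 * Returns NULL if the allocation fails. */
static struct swapchain_op *op_alloc(struct op_free_list *list)
{
    struct swapchain_op *op;

    assert(list);
    if ((op = list->head))
    {
        list->head = op->next;
        --list->count;
    }
    else if (!(op = malloc(sizeof(*op))))
    {
        return NULL;
    }
    op->next = NULL;
    op->type = 0;
    return op;
}

/* Puts an op back on the free list for later reuse. */
static void op_release(struct op_free_list *list, struct swapchain_op *op)
{
    assert(list && op);
    op->next = list->head;
    list->head = op;
    ++list->count;
}

/* Returns all cached ops to the heap, e.g. when the swapchain is destroyed. */
static void op_free_list_cleanup(struct op_free_list *list)
{
    struct swapchain_op *op, *next;

    assert(list);
    for (op = list->head; op; op = next)
    {
        next = op->next;
        free(op);
    }
    list->head = NULL;
    list->count = 0;
}
```

After the first few frames the list holds enough ops that `op_alloc()` no longer touches the heap in the steady state. The list is unbounded here; a cap on `count` would be easy to add if memory use mattered.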