I don't 100% understand why we need the per-swapchain threads. Is it because Vulkan has no asynchronous APIs for all the needed waits that we could use from a single thread (whether a vkd3d background thread, if there is one, or a dedicated thread)? That was my takeaway from our conversation, but I'm not certain I got it right.
Is allocating the ops on the heap a potential performance concern? I recall removing allocations in wined3d and using separate heaps because the global heap lock was getting pretty contended in some games. Ideally it would just be a flat array queue, but if there are only a couple of ops per frame and we don't care that much, a low-effort thing to do would be to keep a free list.
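To make the free list suggestion concrete, here is a rough sketch of what I have in mind. All the names (`struct swapchain_op`, `struct op_pool`, `op_pool_get()`, `op_pool_put()`, `op_pool_cleanup()`) are made up for illustration and are not taken from the patch. It also assumes the caller already holds whatever lock protects the op queue, so the pool does no locking of its own.

```c
#include <stdlib.h>

/* Hypothetical op record; the real struct in the patch will differ. */
struct swapchain_op
{
    struct swapchain_op *next; /* links the op into the free list while idle */
    int type;                  /* placeholder for the actual op payload */
};

/* Hypothetical per-swapchain pool of recycled ops. */
struct op_pool
{
    struct swapchain_op *free_list;
};

/* Reuse a recycled op if one is available; only hit the heap when the
 * free list is empty. Returns NULL if the allocation fails. */
static struct swapchain_op *op_pool_get(struct op_pool *pool)
{
    struct swapchain_op *op = pool->free_list;

    if (op)
        pool->free_list = op->next;
    else if (!(op = malloc(sizeof(*op))))
        return NULL;

    op->next = NULL;
    op->type = 0;
    return op;
}

/* Return a completed op to the pool instead of freeing it. */
static void op_pool_put(struct op_pool *pool, struct swapchain_op *op)
{
    op->next = pool->free_list;
    pool->free_list = op;
}

/* Release everything still held by the pool, e.g. on swapchain destruction. */
static void op_pool_cleanup(struct op_pool *pool)
{
    struct swapchain_op *op, *next;

    for (op = pool->free_list; op; op = next)
    {
        next = op->next;
        free(op);
    }
    pool->free_list = NULL;
}
```

With something like this, the steady state does no heap allocation at all: the pool grows to the maximum number of ops in flight and then just recycles them.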