The current implementation enqueues a fence at the point where vkQueuePresentKHR() should go, waits on that fence, and only then enqueues vkQueuePresentKHR(). The wait happens in a worker thread to avoid deadlock, so an arbitrary number of additional commands may be enqueued before vkQueuePresentKHR(). I haven't found a specific scenario where this causes a problem, but it is undesirable if we can avoid it.
This new implementation is just an RFC; it won't build because the vkd3d function it depends on is missing. That function can be found [here](https://gitlab.winehq.org/cmccarthy/vkd3d/-/blob/present/libs/vkd3d/command....), but I haven't raised an MR for it. The DXGI side meets all requirements: vkAcquireNextImageKHR() blocks there, but the driver unblocks it, so deadlock is not possible. The vkd3d function waits only for the command queue op mutex, so it cannot deadlock either. The overall implementation should be less fragile, and it doesn't need a worker thread.
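To illustrate the ordering concern, here is a toy model in plain C. It is not Wine or vkd3d code and makes no Vulkan calls: `struct op_queue`, the op names and both helper functions are hypothetical, and the sketch assumes the new path places the present at its intended position in the queue. It only shows how commands submitted while the worker waits on the fence end up ahead of the present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 16

/* Hypothetical stand-in for a command queue: an ordered list of op names. */
struct op_queue
{
    const char *ops[MAX_OPS];
    size_t count;
};

static void queue_push(struct op_queue *q, const char *op)
{
    assert(q->count < MAX_OPS);
    q->ops[q->count++] = op;
}

/* Models the current approach: a fence marks where the present belongs,
 * and the present is only enqueued after the fence wait completes. Any
 * commands submitted in the meantime ("late_cmd") land ahead of it. */
static void present_via_worker(struct op_queue *q, unsigned int late_cmds)
{
    unsigned int i;

    queue_push(q, "draw");
    queue_push(q, "fence");
    for (i = 0; i < late_cmds; ++i)
        queue_push(q, "late_cmd");
    queue_push(q, "present");
}

/* Models the assumed new approach: the present is enqueued directly at
 * its intended position, so later commands always follow it. */
static void present_in_order(struct op_queue *q, unsigned int late_cmds)
{
    unsigned int i;

    queue_push(q, "draw");
    queue_push(q, "present");
    for (i = 0; i < late_cmds; ++i)
        queue_push(q, "late_cmd");
}

/* Counts the "late_cmd" entries that precede the first "present".
 * Returns -1 if the queue contains no present. */
static int late_commands_before_present(const struct op_queue *q)
{
    int count = 0;
    size_t i;

    for (i = 0; i < q->count; ++i)
    {
        if (!strcmp(q->ops[i], "present"))
            return count;
        if (!strcmp(q->ops[i], "late_cmd"))
            ++count;
    }
    return -1;
}
```

With three late commands, `late_commands_before_present()` reports 3 for the worker model and 0 for the in-order model. In the real code the number of late commands depends on timing, which is why I describe it as arbitrary.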
A few days ago I saw a performance increase in HZD after reverting DXGI to the state before the worker thread was added, but I have not seen it again since. Performance differences between the old implementation, the current one and this one are minimal.