On Mon Jun 17 11:12:09 2024 +0000, Giovanni Mascellani wrote:
I might be misunderstanding what this is supposed to do, but it doesn't look particularly solid. It seems that you want to somehow delay the return of some `Present()` call to the caller, so that the caller itself doesn't assume that some buffer is valid, but if that's the problem simply blocking while acquiring a Vulkan image is not going to help. There is no other synchronization between the queue producer and consumer, so as soon as the caller calls `Present()` a little bit in advance (while the queue is still processing the earlier present request) it can enqueue all the `Present()` calls it wants. Therefore whether `Present()` blocks or not just depends on some timing detail, and that doesn't look like a solution to anything. More generally, my understanding of the philosophy behind the D3D12 swapchain is that we want to keep a fair amount of separation between the D3D12 and Vulkan swapchain images, because the rules governing them make it hard to do otherwise. That's unfortunate performance-wise, because it requires an additional blit for each presented frame, but it enables us to (potentially, there are surely bugs around) offer the same behavior that is seen on Windows. For example AFAICT it's totally fine, even if unusual, to request a swapchain with 10 buffers (I seem to recall that Windows will start complaining at 16, but don't quote me on that). Even if our Vulkan swapchain has 3 images, we can still emulate the 10 D3D12 buffers, because we keep no relationship between the two sets. The application will happily paint to the D3D12 buffers; only when one of them has to be presented do we select a Vulkan image, blit the D3D12 buffer to it and present it. So there must be no link between the availability of Vulkan and D3D12 buffers/images. If we want to implement a certain behavioral element of the D3D12 swapchain we should do it without depending on how the Vulkan swapchain works.
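To illustrate the separation described above, here is a toy model in C. It is only a sketch and not the Wine implementation: every name in it is hypothetical, the blit is reduced to recording an index, and it assumes that Vulkan images are handed out round-robin (the real `vkAcquireNextImageKHR()` may return them in any order) and that the D3D12 back buffer index advances by one per present.

```c
#include <assert.h>

#define MAX_VK_IMAGES 8

/* Hypothetical model; none of these names come from the actual code. */
struct swapchain_model
{
    unsigned int d3d12_buffer_count; /* what the application asked for, e.g. 10 */
    unsigned int d3d12_current;      /* buffer the application presents next */
    unsigned int vk_image_count;     /* what the Vulkan swapchain has, e.g. 3 */
    unsigned int vk_next;            /* stand-in for vkAcquireNextImageKHR() */
    /* Which D3D12 buffer was last blitted to each Vulkan image. */
    unsigned int vk_contents[MAX_VK_IMAGES];
};

void swapchain_model_init(struct swapchain_model *s,
        unsigned int d3d12_count, unsigned int vk_count)
{
    unsigned int i;

    assert(d3d12_count > 0 && vk_count > 0 && vk_count <= MAX_VK_IMAGES);
    s->d3d12_buffer_count = d3d12_count;
    s->d3d12_current = 0;
    s->vk_image_count = vk_count;
    s->vk_next = 0;
    for (i = 0; i < MAX_VK_IMAGES; ++i)
        s->vk_contents[i] = 0;
}

/* Present the current D3D12 buffer and return the Vulkan image used.
 * The Vulkan image is chosen only here, at present time; the two index
 * spaces advance independently of each other. */
unsigned int swapchain_model_present(struct swapchain_model *s)
{
    unsigned int image = s->vk_next;

    s->vk_next = (s->vk_next + 1) % s->vk_image_count;
    /* The "blit": remember which D3D12 buffer this image now shows. */
    s->vk_contents[image] = s->d3d12_current;
    s->d3d12_current = (s->d3d12_current + 1) % s->d3d12_buffer_count;
    return image;
}
```

With 10 D3D12 buffers and 3 Vulkan images, the fourth present in this model reuses Vulkan image 0 for D3D12 buffer 3, and nothing ties a given D3D12 buffer to a given Vulkan image.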
I probably need to improve the comment. With vsync on, the driver delays present commands until the next vblank, which can easily cause presents to accumulate until all three images are pending, at which point `vkAcquireNextImageKHR()` will block. But because the block occurs in a worker, our `Present()` implementation will happily add more ops to the list without limit. `Present()` must block for the same reason `vkAcquireNextImageKHR()` blocks - to wait until an earlier present meets a vblank and completes. This does not solve the problem of frame latency, which should be patched separately.
If rendering is very fast compared to the sync rate, it's theoretically possible for any number of presents to be added to the list while the worker leaves the mutex unlocked. A frame latency implementation is probably the answer to that issue.
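As a rough idea of what a frame latency limit could look like, here is a minimal sketch using pthreads. It is not the patch and not existing Wine code: all names are hypothetical, and the choice of limit is left to the caller. The point is only that the producer counts outstanding presents itself and waits on that count, independent of when the worker happens to hold its mutex or block in `vkAcquireNextImageKHR()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper; none of these names come from the actual code. */
struct frame_latency_gate
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    unsigned int outstanding; /* presents queued but not yet completed */
    unsigned int max_latency; /* assumed limit chosen by the caller */
};

void frame_latency_gate_init(struct frame_latency_gate *g, unsigned int max_latency)
{
    assert(max_latency > 0);
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->outstanding = 0;
    g->max_latency = max_latency;
}

/* Producer side, to be called from the Present() path before queueing
 * an op. Blocks while max_latency presents are already outstanding. */
void frame_latency_gate_enter(struct frame_latency_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    while (g->outstanding >= g->max_latency)
        pthread_cond_wait(&g->cond, &g->mutex);
    ++g->outstanding;
    pthread_mutex_unlock(&g->mutex);
}

/* Non-blocking variant: returns false where frame_latency_gate_enter()
 * would have blocked. */
bool frame_latency_gate_try_enter(struct frame_latency_gate *g)
{
    bool entered;

    pthread_mutex_lock(&g->mutex);
    entered = g->outstanding < g->max_latency;
    if (entered)
        ++g->outstanding;
    pthread_mutex_unlock(&g->mutex);
    return entered;
}

/* Worker side, to be called once an earlier present has completed. */
void frame_latency_gate_complete(struct frame_latency_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    if (g->outstanding)
        --g->outstanding;
    pthread_cond_signal(&g->cond);
    pthread_mutex_unlock(&g->mutex);
}
```

Because the count is taken on the producer side, the number of queued presents in this sketch can never exceed `max_latency`, however fast the application renders.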