On Fri Jun 21 06:04:13 2024 +0000, Conor McCarthy wrote:
I replaced this with a wait for the blit before returning from `d3d12_swapchain_present()`. After a call to Present() returns, the client can render to the next buffer in sequence, so we must wait for completion of the blit read of that buffer. The old version using `vkAcquireNextImageKHR()` is not strict enough. HZD with vsync has frame pacing issues in some parts of the benchmark even on the old implementation, and this makes it a little bit worse. We can't really avoid that though, and the best fix is to improve frame times in vkd3d.
Sorry it took so much time to react, but I don't think that's correct, at least not in general.
I did some research on how maximum frame latency and `Present()` timing are supposed to work, and this is what I could gather so far:

* I'll assume that we're using a flip presentation model (the only one allowed for D3D12; the blt model is restricted to D3D11 and earlier) and that `Present()` is always called with sync interval 1.
* Partially retracting an earlier comment of mine, it doesn't seem that `Present()` cares at all about `BufferCount`. The swapchain images are basically treated like a ring buffer of `BufferCount` elements. Each time `Present()` is called, a presentation operation is queued; each time a presentation operation is dequeued (following the timing expressed by the sync interval), the next image is selected from the ring buffer (adding one to the read counter and wrapping it around) and presented. If you didn't synchronize correctly and write to an image before it is presented (or even while it is presented), too bad for you: the presentation engine doesn't care. You'll probably miss frames or get tearing.
* So your only hope of not stepping on the presentation engine's toes is to make judicious use of the frame latency commands. Here the swapchain behaves differently depending on whether it was created with `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT` or not.
  * The legacy behavior is without `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT`. In this case the swapchain automatically manages the frame latency waits, so the application can just rely on `Present()` waiting appropriately. The maximum latency has to be set with `IDXGIDevice1::SetMaximumFrameLatency()`, but `IDXGIDevice` is not available for D3D12 devices, as far as I can tell; so we can only keep the default value, which is 3 according to the docs (and that value aligns with my timing observations). In practice, not having `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT` isn't very different from having it: the only difference is that at the end of each `Present()` call a wait on the latency waitable is done by D3D12 on behalf of the client (other minor differences are that you can't retrieve the waitable itself or change the latency number).
  * If instead the swapchain is created with `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT`, then some cooperation from the application is expected: the application has to retrieve the waitable object (which in practice behaves like a semaphore, even if the docs never seem to mention that explicitly). This is already tested in `test_frame_latency_event()`, even if the waitable is not an event: the semaphore starts at the default maximum latency (which is 1) and is released each time a frame is presented (for real, not when `Present()` is called). When `IDXGISwapChain2::SetMaximumFrameLatency()` is called and the new value is larger than the previous one, the semaphore is released a number of times equal to the difference between the new and old values. If the new value is smaller, the semaphore is not touched (i.e., `SetMaximumFrameLatency()` never waits). The application is supposed to wait on the semaphore before starting to render; if it doesn't, `Present()` will happily keep queueing presentation operations even if the ring buffer overflows and/or the configured maximum latency is exceeded.
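
To make the semaphore semantics described above a bit more concrete, here is a minimal sketch in plain C of how I understand the latency waitable to behave when the swapchain has `DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT`. This is only a model of my observations, not vkd3d or DXGI code: `struct latency_model` and all the `latency_model_*()` names are hypothetical, the wait is a non-blocking stand-in for the real wait, and the model ignores any upper bound the real semaphore might have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the frame latency waitable; not real DXGI state. */
struct latency_model
{
    unsigned int max_latency; /* Value last given to SetMaximumFrameLatency(). */
    unsigned int count;       /* Current semaphore count. */
};

static void latency_model_init(struct latency_model *m)
{
    /* With the waitable flag, the default maximum latency is 1 and the
     * semaphore starts at that value. */
    m->max_latency = 1;
    m->count = 1;
}

/* Stand-in for the wait the application does before rendering a frame.
 * Returns false where a real wait would block. */
static bool latency_model_try_wait(struct latency_model *m)
{
    if (!m->count)
        return false;
    --m->count;
    return true;
}

/* Called when a frame is presented for real, not when Present() is called. */
static void latency_model_frame_presented(struct latency_model *m)
{
    ++m->count;
}

/* Models IDXGISwapChain2::SetMaximumFrameLatency(): raising the value
 * releases the difference, lowering it never touches the semaphore. */
static void latency_model_set_max_latency(struct latency_model *m, unsigned int latency)
{
    assert(latency > 0);
    if (latency > m->max_latency)
        m->count += latency - m->max_latency;
    m->max_latency = latency;
}
```

For example, after `latency_model_init()` a single wait succeeds and a second one would block; raising the latency from 1 to 3 releases the semaphore twice, while lowering it afterwards leaves the count unchanged.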
I wrote an alternative implementation which should fix the same problem. It's in !7152 (it replaces only the last commit here, the first three make sense on their own).