The current implementation enqueues the vkQueuePresentKHR() command after waiting on a fence which was enqueued where this command should go. However, this occurs in a worker thread to avoid deadlock, so an arbitrary number of additional commands may be enqueued before vkQueuePresentKHR(). I haven't found a specific scenario where this is a problem, but it is not desirable if we can avoid it.
This new implementation is just an RFC; it won't build because the vkd3d function is missing. That can be found [here](https://gitlab.winehq.org/cmccarthy/vkd3d/-/blob/present/libs/vkd3d/c… but I haven't raised an MR. The DXGI side meets all requirements: vkAcquireNextImageKHR() blocks there, but the driver unblocks it so deadlock is not possible. The vkd3d function waits only for the command queue op mutex and cannot deadlock either. The overall implementation should be less fragile, and doesn't need a worker thread.
A few days ago I saw a performance increase in HZD from reverting DXGI to before the worker was added, but have never seen this again. Performance differences are minimal between old, current and this one.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5851
On Fri Jun 21 05:51:22 2024 +0000, Conor McCarthy wrote:
> changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/5830/diffs?diff_id=118898&start_sha=0a098294aa1607350d8e382416546fe945fac98b#b8e9f5f67cc13143a0928d1e76d7519d0bab1bc3_1773_1764)
I removed this from the MR as I think it should be separate. We need one semaphore per Vulkan image. Indexing the semaphore array is an issue because we have no `vk_image_index`, but `frame_number % swapchain->buffer_count` may be ok.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830#note_73880
On Fri Jun 21 05:51:23 2024 +0000, Conor McCarthy wrote:
> changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/5830/diffs?diff_id=118898&start_sha=0a098294aa1607350d8e382416546fe945fac98b#b8e9f5f67cc13143a0928d1e76d7519d0bab1bc3_1282_1269)
I replaced this with a wait for the blit before returning from `d3d12_swapchain_present()`. After a call to Present() returns, the client can render to the next buffer in sequence, so we must wait for completion of the blit read of that buffer. The old version using `vkAcquireNextImageKHR()` is not strict enough. HZD with vsync has frame pacing issues in some parts of the benchmark even on the old implementation, and this makes it a little bit worse. We can't really avoid that though, and the best fix is to improve frame times in vkd3d.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830#note_73879
On Mon Jun 17 17:07:40 2024 +0000, Elizabeth Figura wrote:
> Without doing any research myself... the other side is going to try to
> enter the logic immediately [within SleepConditionVariableCS()] assuming
> it's waiting, and I think usually wakes are fast enough [at least on
> Linux] that the other thread will generally start executing before you
> even return from the wake function.
> Besides a poorly communicated warning that doesn't apply here, the first
> comment in [1] also claims that it defeats a "wait morphing"
> optimization, where instead of actually waking threads they're just
> moved from the CV waitqueue to the mutex wait queue [if their associated
> mutex is locked [by the calling thread?]]. That's not something we
> currently implement in Wine; it may be possible to implement later though...
> [1] https://stackoverflow.com/questions/52503361/unlock-the-mutex-after-conditi…
I do see a small but measurable performance gain in HZD. It's only 0.7%, but that shoud be weighed that against the very simple change needed to gain it.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830#note_73878
The last commit fixes glitches in Horizon Zero Dawn with vsync on, for reasons stated in the comment there.
The wined3d implementation already blocks the client's call to `Present()` when `vkAcquireNextImageKHR()` blocks, so is roughly equivalent to this when frame latency and buffer count are equal. It has a frame latency implementation which unblocks after calling `vkQueuePresentKHR()`, rather than waiting for its execution to complete. I think the d3d12 implementation is supposed to do the latter, though using `KHR_present_wait` may be unnecessary overkill. Waiting on queue execution comes with a performance cost either way, so without a compelling reason for a strict implementation, something similar to wined3d makes sense.
Another potential issue is after the call to `ID3D12CommandQueue_Signal()` in `d3d12_swapchain_present()`, the client is free to enqueue more commands when `Present()` returns, so by the time the signal is executed and the swapchain worker blits the image and calls `vkQueuePresentKHR()`, these commands are not necessarily enqueued immediately after the `Signal()`. This may not always be harmless.
--
v2: dxgi: Wait for completion of a pending read of the next buffer before returning from d3d12_swapchain_Present().
dxgi: Wake the condition variable after leaving the critical section in d3d12_swapchain_resize_buffers().
dxgi: Wake the condition variable after leaving the critical section in d3d12_swapchain_present().
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830
Huh. Okay.
I've seen some of those discussions, and something in them did indeed feel strange. (I never understood the context well enough to determine exactly what, and it's not my place to question that, anyways.)
I'll miss you 💔
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5886#note_73870