The current implementation enqueues the vkQueuePresentKHR() command after waiting on a fence which was enqueued where this command should go. However, this occurs in a worker thread to avoid deadlock, so an arbitrary number of additional commands may be enqueued before vkQueuePresentKHR(). I haven't found a specific scenario where this is a problem, but it is not desirable if we can avoid it.
This new implementation is just an RFC; it won't build because the vkd3d function is missing. That can be found [here](https://gitlab.winehq.org/cmccarthy/vkd3d/-/blob/present/libs/vkd3d/c… but I haven't raised an MR. The DXGI side meets all requirements: vkAcquireNextImageKHR() blocks there, but the driver unblocks it so deadlock is not possible. The vkd3d function waits only for the command queue op mutex and cannot deadlock either. The overall implementation should be less fragile, and doesn't need a worker thread.
A few days ago I saw a performance increase in HZD from reverting DXGI to before the worker was added, but have never seen this again. Performance differences are minimal between old, current and this one.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5851
On Fri Jun 21 05:51:22 2024 +0000, Conor McCarthy wrote:
> changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/5830/diffs?diff_id=118898&start_sha=0a098294aa1607350d8e382416546fe945fac98b#b8e9f5f67cc13143a0928d1e76d7519d0bab1bc3_1773_1764)
I removed this from the MR as I think it should be separate. We need one semaphore per Vulkan image. Indexing the semaphore array is an issue because we have no `vk_image_index`, but `frame_number % swapchain->buffer_count` may be ok.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830#note_73880
On Fri Jun 21 05:51:23 2024 +0000, Conor McCarthy wrote:
> changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/5830/diffs?diff_id=118898&start_sha=0a098294aa1607350d8e382416546fe945fac98b#b8e9f5f67cc13143a0928d1e76d7519d0bab1bc3_1282_1269)
I replaced this with a wait for the blit before returning from `d3d12_swapchain_present()`. After a call to Present() returns, the client can render to the next buffer in sequence, so we must wait for completion of the blit read of that buffer. The old version using `vkAcquireNextImageKHR()` is not strict enough. HZD with vsync has frame pacing issues in some parts of the benchmark even on the old implementation, and this makes it a little bit worse. We can't really avoid that though, and the best fix is to improve frame times in vkd3d.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830#note_73879
On Mon Jun 17 17:07:40 2024 +0000, Elizabeth Figura wrote:
> Without doing any research myself... the other side is going to try to
> enter the logic immediately [within SleepConditionVariableCS()] assuming
> it's waiting, and I think usually wakes are fast enough [at least on
> Linux] that the other thread will generally start executing before you
> even return from the wake function.
> Besides a poorly communicated warning that doesn't apply here, the first
> comment in [1] also claims that it defeats a "wait morphing"
> optimization, where instead of actually waking threads they're just
> moved from the CV waitqueue to the mutex wait queue [if their associated
> mutex is locked [by the calling thread?]]. That's not something we
> currently implement in Wine; it may be possible to implement later though...
> [1] https://stackoverflow.com/questions/52503361/unlock-the-mutex-after-conditi…
I do see a small but measurable performance gain in HZD. It's only 0.7%, but that shoud be weighed that against the very simple change needed to gain it.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/5830#note_73878