This is an alternative proposal to !5830. It fixes the same glitches in Horizon Zero Dawn, but attempts to do that by implementing the proper behavior expected from the swapchain. More precisely, it fixes a few mistakes I had made the last time I touched this code, it implements the frame latency waitable with a semaphore instead of an event (as it should be) and at last it uses the frame latency waitable to implement an appropriate wait after presentation when the client didn't request to manage the waitable directly.
This implementation is not yet complete, because the semaphore should be released when the frame is presented. Knowing when this is the case would require using `VK_KHR_present_wait`. We currently approximate that with the moment in which the frame is submitted for presentation to Vulkan, which as far as I can tell is good enough. Eventually I'll try to write the proper thing.
This MR depends on https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/1343, which has been submitted in the meantime to vkd3d. The vkd3d change is included in the first commit, so this MR is already functional.
From: Giovanni Mascellani gmascellani@codeweavers.com
The new vkd3d API allows submitting presentation as soon as the internal vkd3d work is flushed to Vulkan. Now we're also waiting for it to be executed, which is useless. --- dlls/dxgi/swapchain.c | 2 +- dlls/wined3d/wined3d.spec | 1 + libs/vkd3d/include/vkd3d.h | 29 ++++++++++++++++++--- libs/vkd3d/libs/vkd3d/command.c | 37 +++++++++++++++++++++++++++ libs/vkd3d/libs/vkd3d/vkd3d_private.h | 1 + 5 files changed, 66 insertions(+), 4 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 4797fcecd8d..b1ce578e8be 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2363,7 +2363,7 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, } }
- if (FAILED(hr = ID3D12CommandQueue_Signal(swapchain->command_queue, + if (FAILED(hr = vkd3d_signal_fence_on_cpu(swapchain->command_queue, swapchain->present_fence, swapchain->frame_number))) { ERR("Failed to signal present fence, hf %#lx.\n", hr); diff --git a/dlls/wined3d/wined3d.spec b/dlls/wined3d/wined3d.spec index 151bcaf9751..eb352d144db 100644 --- a/dlls/wined3d/wined3d.spec +++ b/dlls/wined3d/wined3d.spec @@ -360,6 +360,7 @@ @ cdecl vkd3d_resource_incref(ptr) @ cdecl vkd3d_serialize_root_signature(ptr long ptr ptr) @ cdecl vkd3d_serialize_versioned_root_signature(ptr ptr ptr) +@ cdecl vkd3d_signal_fence_on_cpu(ptr ptr long)
@ cdecl vkd3d_shader_compile(ptr ptr ptr) @ cdecl vkd3d_shader_convert_root_signature(ptr long ptr) diff --git a/libs/vkd3d/include/vkd3d.h b/libs/vkd3d/include/vkd3d.h index b18fd14f4c3..59516f4af80 100644 --- a/libs/vkd3d/include/vkd3d.h +++ b/libs/vkd3d/include/vkd3d.h @@ -411,9 +411,13 @@ VKD3D_API uint32_t vkd3d_get_vk_queue_family_index(ID3D12CommandQueue *queue); * the Vulkan driver as being submitted before other work submitted * though the Direct3D 12 API. If this is not desired, it is * recommended to synchronize work submission using an ID3D12Fence - * object, by submitting to the queue a signal operation after all the - * Direct3D 12 work is submitted and waiting for it before calling - * vkd3d_acquire_vk_queue(). + * object: + * 1. submit work through the Direct3D 12 API; + * 2. call vkd3d_signal_fence_on_cpu(); + * 3. wait for the fence to be signalled; + * 4. call vkd3d_acquire_vk_queue(); it is guaranteed that all work submitted + * at point 1 has already been submitted to Vulkan (though not necessarily + * executed). * * \since 1.0 */ @@ -466,6 +470,21 @@ VKD3D_API HRESULT vkd3d_create_versioned_root_signature_deserializer(const void */ VKD3D_API void vkd3d_set_log_callback(PFN_vkd3d_log callback);
+/** + * Signal a fence on the CPU once all the currently outstanding queue work is + * submitted to Vulkan. + * + * The fence will be signalled on the CPU (as if ID3D12Fence_Signal() was + * called) once all the work submitted through the Direct3D 12 API before + * vkd3d_signal_fence_on_cpu() is called has left the internal queue and has + * been submitted to the underlying Vulkan queue. Read the documentation for + * vkd3d_acquire_vk_queue() for more details. + * + * \since 1.15 + */ +VKD3D_API enum vkd3d_result vkd3d_signal_fence_on_cpu(ID3D12CommandQueue *queue, + ID3D12Fence *fence, uint64_t value); + #endif /* VKD3D_NO_PROTOTYPES */
/* @@ -512,6 +531,10 @@ typedef HRESULT (*PFN_vkd3d_create_versioned_root_signature_deserializer)(const /** Type of vkd3d_set_log_callback(). \since 1.4 */ typedef void (*PFN_vkd3d_set_log_callback)(PFN_vkd3d_log callback);
+/** Type of vkd3d_signal_fence_on_cpu(). \since 1.15 */ +typedef enum vkd3d_result (*PFN_vkd3d_signal_fence_on_cpu)(ID3D12CommandQueue *queue, + ID3D12Fence *fence, uint64_t value); + #ifdef __cplusplus } #endif /* __cplusplus */ diff --git a/libs/vkd3d/libs/vkd3d/command.c b/libs/vkd3d/libs/vkd3d/command.c index a55a97f6f2f..ed7c2aa8677 100644 --- a/libs/vkd3d/libs/vkd3d/command.c +++ b/libs/vkd3d/libs/vkd3d/command.c @@ -6345,6 +6345,7 @@ static void d3d12_command_queue_destroy_op(struct vkd3d_cs_op_data *op) break;
case VKD3D_CS_OP_SIGNAL: + case VKD3D_CS_OP_SIGNAL_ON_CPU: d3d12_fence_decref(op->u.signal.fence); break;
@@ -7335,6 +7336,7 @@ static HRESULT d3d12_command_queue_flush_ops_locked(struct d3d12_command_queue * struct vkd3d_cs_op_data *op; struct d3d12_fence *fence; unsigned int i; + HRESULT hr;
queue->is_flushing = true;
@@ -7368,6 +7370,11 @@ static HRESULT d3d12_command_queue_flush_ops_locked(struct d3d12_command_queue * d3d12_command_queue_signal(queue, op->u.signal.fence, op->u.signal.value); break;
+ case VKD3D_CS_OP_SIGNAL_ON_CPU: + if (FAILED(hr = d3d12_fence_Signal(&op->u.signal.fence->ID3D12Fence1_iface, op->u.signal.value))) + ERR("Failed to signal fence %p, hr %s.\n", op->u.signal.fence, debugstr_hresult(hr)); + break; + case VKD3D_CS_OP_EXECUTE: d3d12_command_queue_execute(queue, op->u.execute.buffers, op->u.execute.buffer_count); break; @@ -7510,6 +7517,36 @@ void vkd3d_release_vk_queue(ID3D12CommandQueue *queue) return vkd3d_queue_release(d3d12_queue->vkd3d_queue); }
+enum vkd3d_result vkd3d_signal_fence_on_cpu(ID3D12CommandQueue *iface, ID3D12Fence *fence_iface, uint64_t value) +{ + struct d3d12_command_queue *command_queue = impl_from_ID3D12CommandQueue(iface); + struct d3d12_fence *fence = unsafe_impl_from_ID3D12Fence(fence_iface); + struct vkd3d_cs_op_data *op; + HRESULT hr = S_OK; + + TRACE("iface %p, fence %p, value %#"PRIx64".\n", iface, fence_iface, value); + + vkd3d_mutex_lock(&command_queue->op_mutex); + + if (!(op = d3d12_command_queue_op_array_require_space(&command_queue->op_queue))) + { + ERR("Failed to add op.\n"); + hr = E_OUTOFMEMORY; + goto done; + } + op->opcode = VKD3D_CS_OP_SIGNAL_ON_CPU; + op->u.signal.fence = fence; + op->u.signal.value = value; + + d3d12_fence_incref(fence); + + d3d12_command_queue_submit_locked(command_queue); + +done: + vkd3d_mutex_unlock(&command_queue->op_mutex); + return hr; +} + /* ID3D12CommandSignature */ static inline struct d3d12_command_signature *impl_from_ID3D12CommandSignature(ID3D12CommandSignature *iface) { diff --git a/libs/vkd3d/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/libs/vkd3d/vkd3d_private.h index 97a99782d6a..fde6e9e9d4e 100644 --- a/libs/vkd3d/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/libs/vkd3d/vkd3d_private.h @@ -1324,6 +1324,7 @@ enum vkd3d_cs_op { VKD3D_CS_OP_WAIT, VKD3D_CS_OP_SIGNAL, + VKD3D_CS_OP_SIGNAL_ON_CPU, VKD3D_CS_OP_EXECUTE, VKD3D_CS_OP_UPDATE_MAPPINGS, VKD3D_CS_OP_COPY_MAPPINGS,
From: Giovanni Mascellani gmascellani@codeweavers.com
This is only partially correct, because the frame latency waitable should be a semaphore instead. However, it's a step in the right direction, in preparation to make the waitable an actual semaphore. --- dlls/dxgi/swapchain.c | 9 +++++++++ dlls/dxgi/tests/dxgi.c | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index b1ce578e8be..05ffe628d95 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2886,6 +2886,15 @@ static HRESULT STDMETHODCALLTYPE d3d12_swapchain_SetMaximumFrameLatency(IDXGISwa return DXGI_ERROR_INVALID_CALL; }
+ if (max_latency > swapchain->frame_latency) + { + if (!SetEvent(swapchain->frame_latency_event)) + { + ERR("Failed to set event, last error %lu.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); + } + } + swapchain->frame_latency = max_latency; return S_OK; } diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index 912bcd8fa61..7cf52d626da 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7143,7 +7143,7 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) ok(frame_latency == 3, "Got unexpected frame latency %#x.\n", frame_latency);
wait_result = WaitForSingleObject(semaphore, 0); - todo_wine ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); + ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 0); todo_wine ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 100);
From: Giovanni Mascellani gmascellani@codeweavers.com
The bias was a broken solution for handling the maximum frame latency changes. However we now already set the event (and, in the future, release the sempahore) when the maximum frame latency is increased, so there is no need for the bias anymore.
At the same time we also have to signal the fence directly instead of going through the command queue again. That's because at that point we just submitted the frame for presentation to Vulkan, so the best approximation we have for when the frame will be presented is now, not after the currently outstanding work in the command queue is processed. It's still a rather poor approximation, but for something better we need VK_KHR_present_wait. --- dlls/dxgi/swapchain.c | 13 ++----------- dlls/dxgi/tests/dxgi.c | 1 - 2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 05ffe628d95..38f0a6e6b07 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2296,12 +2296,7 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch
if (swapchain->frame_latency_fence) { - /* Use the same bias as d3d12_swapchain_present(). Add one to - * account for the "++swapchain->frame_number" there. */ - uint64_t number = op->present.frame_number + DXGI_MAX_SWAP_CHAIN_BUFFERS + 1; - - if (FAILED(hr = ID3D12CommandQueue_Signal(swapchain->command_queue, - swapchain->frame_latency_fence, number))) + if (FAILED(hr = ID3D12Fence_Signal(swapchain->frame_latency_fence, op->present.frame_number + 1))) { ERR("Failed to signal frame latency fence, hr %#lx.\n", hr); return hr; @@ -2351,12 +2346,8 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, ++swapchain->frame_number; if ((frame_latency_event = swapchain->frame_latency_event)) { - /* Bias the frame number to avoid underflowing in - * SetEventOnCompletion(). */ - uint64_t number = swapchain->frame_number + DXGI_MAX_SWAP_CHAIN_BUFFERS; - if (FAILED(hr = ID3D12Fence_SetEventOnCompletion(swapchain->frame_latency_fence, - number - swapchain->frame_latency, frame_latency_event))) + swapchain->frame_number, frame_latency_event))) { ERR("Failed to enqueue frame latency event, hr %#lx.\n", hr); return hr; diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index 7cf52d626da..b8cf3597483 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7222,7 +7222,6 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) for (i = 0; i < 3; i++) { wait_result = WaitForSingleObject(semaphore, 100); - todo_wine_if(i != 0) ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); }
From: Giovanni Mascellani gmascellani@codeweavers.com
It is now useless: we signal signal it directly and use it to set an event, without having anything to do with the CPU. We can set the event directly instead. --- dlls/dxgi/swapchain.c | 30 ++++-------------------------- 1 file changed, 4 insertions(+), 26 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 38f0a6e6b07..c5fc4fad337 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1191,7 +1191,6 @@ struct d3d12_swapchain DXGI_SWAP_CHAIN_FULLSCREEN_DESC fullscreen_desc; LONG in_set_fullscreen_state;
- ID3D12Fence *frame_latency_fence; HANDLE frame_latency_event;
uint64_t frame_number; @@ -2057,9 +2056,6 @@ static void d3d12_swapchain_destroy(struct d3d12_swapchain *swapchain) if (swapchain->frame_latency_event) CloseHandle(swapchain->frame_latency_event);
- if (swapchain->frame_latency_fence) - ID3D12Fence_Release(swapchain->frame_latency_fence); - if (swapchain->command_queue) ID3D12CommandQueue_Release(swapchain->command_queue);
@@ -2294,12 +2290,12 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch return hresult_from_vk_result(vr); }
- if (swapchain->frame_latency_fence) + if (swapchain->frame_latency_event) { - if (FAILED(hr = ID3D12Fence_Signal(swapchain->frame_latency_fence, op->present.frame_number + 1))) + if (!SetEvent(swapchain->frame_latency_event)) { - ERR("Failed to signal frame latency fence, hr %#lx.\n", hr); - return hr; + ERR("Failed to set frame latency event, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); } }
@@ -2310,7 +2306,6 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, unsigned int sync_interval, unsigned int flags) { struct d3d12_swapchain_op *op; - HANDLE frame_latency_event; HRESULT hr;
if (sync_interval > 4) @@ -2344,15 +2339,6 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, LeaveCriticalSection(&swapchain->worker_cs);
++swapchain->frame_number; - if ((frame_latency_event = swapchain->frame_latency_event)) - { - if (FAILED(hr = ID3D12Fence_SetEventOnCompletion(swapchain->frame_latency_fence, - swapchain->frame_number, frame_latency_event))) - { - ERR("Failed to enqueue frame latency event, hr %#lx.\n", hr); - return hr; - } - }
if (FAILED(hr = vkd3d_signal_fence_on_cpu(swapchain->command_queue, swapchain->present_fence, swapchain->frame_number))) @@ -3358,14 +3344,6 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI { swapchain->frame_latency = 1;
- if (FAILED(hr = ID3D12Device_CreateFence(device, DXGI_MAX_SWAP_CHAIN_BUFFERS, - 0, &IID_ID3D12Fence, (void **)&swapchain->frame_latency_fence))) - { - WARN("Failed to create frame latency fence, hr %#lx.\n", hr); - d3d12_swapchain_destroy(swapchain); - return hr; - } - if (!(swapchain->frame_latency_event = CreateEventW(NULL, FALSE, TRUE, NULL))) { hr = HRESULT_FROM_WIN32(GetLastError());
From: Giovanni Mascellani gmascellani@codeweavers.com
As it should be, given that it is supposed to keep the count of how many frames are currently in flight. --- dlls/dxgi/swapchain.c | 24 ++++++++++++------------ dlls/dxgi/tests/dxgi.c | 4 +--- 2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index c5fc4fad337..67b637fbeaf 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1191,7 +1191,7 @@ struct d3d12_swapchain DXGI_SWAP_CHAIN_FULLSCREEN_DESC fullscreen_desc; LONG in_set_fullscreen_state;
- HANDLE frame_latency_event; + HANDLE frame_latency_semaphore;
uint64_t frame_number; uint32_t frame_latency; @@ -2053,8 +2053,8 @@ static void d3d12_swapchain_destroy(struct d3d12_swapchain *swapchain) if (swapchain->present_fence) ID3D12Fence_Release(swapchain->present_fence);
- if (swapchain->frame_latency_event) - CloseHandle(swapchain->frame_latency_event); + if (swapchain->frame_latency_semaphore) + CloseHandle(swapchain->frame_latency_semaphore);
if (swapchain->command_queue) ID3D12CommandQueue_Release(swapchain->command_queue); @@ -2290,11 +2290,11 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch return hresult_from_vk_result(vr); }
- if (swapchain->frame_latency_event) + if (swapchain->frame_latency_semaphore) { - if (!SetEvent(swapchain->frame_latency_event)) + if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, 1, NULL)) { - ERR("Failed to set frame latency event, last error %ld.\n", GetLastError()); + ERR("Failed to release frame latency semaphore, last error %ld.\n", GetLastError()); return HRESULT_FROM_WIN32(GetLastError()); } } @@ -2865,9 +2865,9 @@ static HRESULT STDMETHODCALLTYPE d3d12_swapchain_SetMaximumFrameLatency(IDXGISwa
if (max_latency > swapchain->frame_latency) { - if (!SetEvent(swapchain->frame_latency_event)) + if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, max_latency - swapchain->frame_latency, NULL)) { - ERR("Failed to set event, last error %lu.\n", GetLastError()); + ERR("Failed to release frame latency semaphore, last error %lu.\n", GetLastError()); return HRESULT_FROM_WIN32(GetLastError()); } } @@ -2900,10 +2900,10 @@ static HANDLE STDMETHODCALLTYPE d3d12_swapchain_GetFrameLatencyWaitableObject(ID
TRACE("iface %p.\n", iface);
- if (!swapchain->frame_latency_event) + if (!swapchain->frame_latency_semaphore) return NULL;
- ret = DuplicateHandle(GetCurrentProcess(), swapchain->frame_latency_event, GetCurrentProcess(), + ret = DuplicateHandle(GetCurrentProcess(), swapchain->frame_latency_semaphore, GetCurrentProcess(), &dup, 0, FALSE, DUPLICATE_SAME_ACCESS);
if (!ret) @@ -3344,10 +3344,10 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI { swapchain->frame_latency = 1;
- if (!(swapchain->frame_latency_event = CreateEventW(NULL, FALSE, TRUE, NULL))) + if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL))) { hr = HRESULT_FROM_WIN32(GetLastError()); - WARN("Failed to create frame latency event, hr %#lx.\n", hr); + WARN("Failed to create frame latency semaphore, hr %#lx.\n", hr); d3d12_swapchain_destroy(swapchain); return hr; } diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index b8cf3597483..a5a76a6158a 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7145,7 +7145,7 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) wait_result = WaitForSingleObject(semaphore, 0); ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 0); - todo_wine ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); + ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 100); ok(wait_result == WAIT_TIMEOUT, "Got unexpected wait result %#lx.\n", wait_result);
@@ -7184,7 +7184,6 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) for (i = 0; i < 5; i++) { wait_result = WaitForSingleObject(semaphore, 100); - todo_wine_if(i != 0) ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); }
@@ -7236,7 +7235,6 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) for (i = 0; i < 4; i++) { wait_result = WaitForSingleObject(semaphore, 100); - todo_wine_if(i != 0) ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); }
From: Giovanni Mascellani gmascellani@codeweavers.com
The client is expected to wait on the frame latency semaphore when the swapchain is created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. But when it is not, we're supposed to do it ourselves. --- dlls/dxgi/swapchain.c | 38 +++++++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 15 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 67b637fbeaf..57783972a9c 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2290,13 +2290,10 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch return hresult_from_vk_result(vr); }
- if (swapchain->frame_latency_semaphore) + if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, 1, NULL)) { - if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, 1, NULL)) - { - ERR("Failed to release frame latency semaphore, last error %ld.\n", GetLastError()); - return HRESULT_FROM_WIN32(GetLastError()); - } + ERR("Failed to release frame latency semaphore, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); }
return S_OK; @@ -2348,6 +2345,14 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, }
swapchain->current_buffer_index = (swapchain->current_buffer_index + 1) % swapchain->desc.BufferCount; + + if (!(swapchain->desc.Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) + && WaitForSingleObject(swapchain->frame_latency_semaphore, INFINITE) != WAIT_OBJECT_0) + { + ERR("Failed to wait for frame latency semaphore, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); + } + return S_OK; }
@@ -2900,8 +2905,11 @@ static HANDLE STDMETHODCALLTYPE d3d12_swapchain_GetFrameLatencyWaitableObject(ID
TRACE("iface %p.\n", iface);
- if (!swapchain->frame_latency_semaphore) + if (!(swapchain->desc.Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT)) + { + WARN("DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT not set for swap chain %p.\n", iface); return NULL; + }
ret = DuplicateHandle(GetCurrentProcess(), swapchain->frame_latency_semaphore, GetCurrentProcess(), &dup, 0, FALSE, DUPLICATE_SAME_ACCESS); @@ -3341,16 +3349,16 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI swapchain->current_buffer_index = 0;
if (swapchain_desc->Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) - { swapchain->frame_latency = 1; + else + swapchain->frame_latency = 3;
- if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL))) - { - hr = HRESULT_FROM_WIN32(GetLastError()); - WARN("Failed to create frame latency semaphore, hr %#lx.\n", hr); - d3d12_swapchain_destroy(swapchain); - return hr; - } + if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL))) + { + hr = HRESULT_FROM_WIN32(GetLastError()); + WARN("Failed to create frame latency semaphore, hr %#lx.\n", hr); + d3d12_swapchain_destroy(swapchain); + return hr; }
if (FAILED(hr = ID3D12Device_CreateFence(device, 0, 0,
I think I understand this well enough, thank you for the detailed comments.
1/6 doesn't reflect the actual commit that was merged, so I guess it should be updated?
At the same time we also have to signal the fence directly instead of going through the command queue again. That's because at that point we just submitted the frame for presentation to Vulkan, so the best approximation we have for when the frame will be presented is now, not after the currently outstanding work in the command queue is processed. It's still a rather poor approximation, but for something better we need VK_KHR_present_wait.
Is this actually related to the bias removal? I don't immediately see why it's related, but I'm probably missing something.
Do you have a reference for where the frame latency event is actually supposed to be signaled? i.e. that KHR_present_wait is the correct place? I don't think I see a problem with moving it here either way, but it'd be nice to know.
``` @@ -2348,6 +2345,14 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, }
swapchain->current_buffer_index = (swapchain->current_buffer_index + 1) % swapchain->desc.BufferCount; + + if (!(swapchain->desc.Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) + && WaitForSingleObject(swapchain->frame_latency_semaphore, INFINITE) != WAIT_OBJECT_0) + { + ERR("Failed to wait for frame latency semaphore, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); + } + return S_OK; } ```
Should we wait *before* presenting instead? It's what the documentation seems to suggest native does [1], and it seems like it makes more sense—wait until there's "room" to present a frame before doing so? I don't know if it makes a difference in practice, though.
I do see that wined3d does wait after presenting, so maybe there's a good reason for that?
[1] https://learn.microsoft.com/en-us/windows/uwp/gaming/reduce-latency-with-dxg...
`+ swapchain->frame_latency = 3;`
This seems reasonable enough, but is it derived from anywhere specific?
I think I understand this well enough, thank you for the detailed comments.
Good, I'm glad they're helpful, given that I spent a good amount of time in making them as clear as I could.
1/6 doesn't reflect the actual commit that was merged, so I guess it should be updated?
Sure. Initially I would assume that this MR would wait for the vkd3d release anyway, but if it is fine to touch the vkd3d sources directly it's even better for me; I'll fix the MR.
Is this actually related to the bias removal? I don't immediately see why it's related, but I'm probably missing something.
I agree it's not very obvious, and surely there is room for interpretation about whether they're related or not. The immediate reason for putting these two changes together is that if I don't I temporarily have to add a `todo_wine`, because, even while broken, the bias covers up for a deficiency with signalling through the command line rather then signalling the fence directly. If that's ok for you I can first remove the bias and then change the `Signal()` in another commit.
Do you have a reference for where the frame latency event is actually supposed to be signaled? i.e. that KHR_present_wait is the correct place? I don't think I see a problem with moving it here either way, but it'd be nice to know.
[IDXGISwapChain2::GetFrameLatencyWaitableObject()](https://learn.microsoft.com/en-us/windows/win32/api/dxgi1_3/nf-dxgi1_3-idxgi...) says "a waitable handle that signals when the DXGI adapter has finished presenting a new frame", which I interpret as being the same thing that Vulkan specifies for VK_KHR_present_wait. AFAIU the idea is that you decide that you don't want to have more than, say, three frames in flight at any given time. So you start your semaphore at three, release each time a frame is presented (i.e., it's not in flight any more) and wait for it each time you want to start working on a new one.
Should we wait _before_ presenting instead? It's what the documentation seems to suggest native does [1], and it seems like it makes more sense—wait until there's "room" to present a frame before doing so? I don't know if it makes a difference in practice, though.
I'm not sure I read that in that page. Reading step 4, it seems that the suggested loop is wait-render-present, rather than render-wait-present. So, if you assume that the onus of waiting is on the implementation rather than the client, the wait should happen after presenting, not before. And TBH that's what sounds more sensible to me. The wait operation is not to ensure you have room: you're assuming you have room in the swapchain already when you're rendering, because you're rendering to the swapchain image. The wait operation is to avoid start doing computations for a following frame too early, in order to keep latency within the maximum you decided.
`+ swapchain->frame_latency = 3;`
This seems reasonable enough, but is it derived from anywhere specific?
I see it in [IDXGIDevice1::SetMaximumFrameLatency()](https://learn.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgidevice...), section "Parameters". It also aligns with my timing observations. Unfortunately it's very hard to write tests for that that are not very flaky, because timing is the only observable that I have and that's of course very noisy (when I have access to the frame latency waitable it's a bit easier, since checking the semaphore count is somewhat easier).
1/6 doesn't reflect the actual commit that was merged, so I guess it should be updated?
Sure. Initially I would assume that this MR would wait for the vkd3d release anyway, but if it is fine to touch the vkd3d sources directly it's even better for me; I'll fix the MR.
Ah, I think there's no reason not to wait for the release (and not waiting would tick off distributions, although we're already needlessly annoying them...)
Is this actually related to the bias removal? I don't immediately see why it's related, but I'm probably missing something.
I agree it's not very obvious, and surely there is room for interpretation about whether they're related or not. The immediate reason for putting these two changes together is that if I don't I temporarily have to add a `todo_wine`, because, even while broken, the bias covers up for a deficiency with signalling through the command line rather then signalling the fence directly. If that's ok for you I can first remove the bias and then change the `Signal()` in another commit.
Hmm, which test? I guess we do a Wait(0) and expect that to be signaled immediately, whereas waiting for the GPU to be done isn't fast enough?
I think that's likely enough to not *actually* break something, that having the temporary todo is fine. But it also calls to question whether Wait(0) is actually the right thing.
Or could we even just swap the order?
Do you have a reference for where the frame latency event is actually supposed to be signaled? i.e. that KHR_present_wait is the correct place? I don't think I see a problem with moving it here either way, but it'd be nice to know.
[IDXGISwapChain2::GetFrameLatencyWaitableObject()](https://learn.microsoft.com/en-us/windows/win32/api/dxgi1_3/nf-dxgi1_3-idxgi...) says "a waitable handle that signals when the DXGI adapter has finished presenting a new frame", which I interpret as being the same thing that Vulkan specifies for VK_KHR_present_wait. AFAIU the idea is that you decide that you don't want to have more than, say, three frames in flight at any given time. So you start your semaphore at three, release each time a frame is presented (i.e., it's not in flight any more) and wait for it each time you want to start working on a new one.
Yeah, but there's a lot of nuance there. You might expect this kind of thing to be signaled, say:
(1) when the present is submitted to the GPU
(2) when the present is visible to the underlying graphics library, so drawing commands from other APIs will work — this is what GLX_OML_sync_control does, or at least what we are trying to use it for. This may be before or after (1)
(3) when the first pixel is visible to the user — this is what KHR_present_wait is supposed to do
(4) when the last pixel is visible to the user
I could honestly see the language that Microsoft uses meaning any one of these. I might be more inclined to guess (4), honestly. I don't know which one makes sense if you want to avoid having more than N frames in flight, since I basically don't know what the purpose of that is. [I guess for something like a video game, where you want the user's reactions based on their last input, you'd want (4), but I don't think that quite works out wrt neurology anyway...]
This commit changes it to be (1). I don't see a reason for that to matter, but I don't have much knowledge of this in the first place.
If we want to emulate (4), I guess we'd want to do a CPU wait for a semaphore signaled by the present command we just queued.
Should we wait _before_ presenting instead? It's what the documentation seems to suggest native does [1], and it seems like it makes more sense—wait until there's "room" to present a frame before doing so? I don't know if it makes a difference in practice, though.
I'm not sure I read that in that page. Reading step 4, it seems that the suggested loop is wait-render-present, rather than render-wait-present. So, if you assume that the onus of waiting is on the implementation rather than the client, the wait should happen after presenting, not before. And TBH that's what sounds more sensible to me. The wait operation is not to ensure you have room: you're assuming you have room in the swapchain already when you're rendering, because you're rendering to the swapchain image. The wait operation is to avoid start doing computations for a following frame too early, in order to keep latency within the maximum you decided.
Sure, that's the intent of the latency event feature, but there's also this sentence:
"When the rendering loop calls Present(), the system blocks the thread until it is done presenting a prior frame, making room to queue up the new frame, before it actually presents."
Which seems to say pretty unequivocally that native does a wait before present.
I can't think of a way in which this difference would be actually observable, mind.
`+ swapchain->frame_latency = 3;`
This seems reasonable enough, but is it derived from anywhere specific?
I see it in [IDXGIDevice1::SetMaximumFrameLatency()](https://learn.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgidevice...), section "Parameters". It also aligns with my timing observations. Unfortunately it's very hard to write tests for that that are not very flaky, because timing is the only observable that I have and that's of course very noisy (when I have access to the frame latency waitable it's a bit easier, since checking the semaphore count is somewhat easier).
Ah, thanks!
EDIT: fight markdown