This is an alternative proposal to !5830. It fixes the same glitches in Horizon Zero Dawn, but attempts to do that by implementing the proper behavior expected from the swapchain. More precisely, it fixes a few mistakes I had made the last time I touched this code, it implements the frame latency waitable with a semaphore instead of an event (as it should be) and at last it uses the frame latency waitable to implement an appropriate wait after presentation when the client didn't request to manage the waitable directly.
This implementation is not yet complete, because the semaphore should be released when the frame is presented. Knowing when this is the case would require using `VK_KHR_present_wait`. We currently approximate that with the moment in which the frame is submitted for presentation to Vulkan, which as far as I can tell is good enough. Eventually I'll try to write the proper thing.
This MR depends on https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/1343, which has been submitted in the meantime to vkd3d. The vkd3d change is included in the first commit, so this MR is already functional.
-- v2: dxgi: Wait on the frame latency semaphore when the client doesn't do it.
From: Giovanni Mascellani gmascellani@codeweavers.com
The new vkd3d API allows submitting presentation as soon as the internal vkd3d work is flushed to Vulkan. Now we're also waiting for it to be executed, which is useless. --- dlls/dxgi/swapchain.c | 2 +- dlls/wined3d/wined3d.spec | 1 + libs/vkd3d/include/vkd3d.h | 29 ++++++++++++++++++--- libs/vkd3d/libs/vkd3d/command.c | 37 +++++++++++++++++++++++++++ libs/vkd3d/libs/vkd3d/vkd3d_private.h | 1 + 5 files changed, 66 insertions(+), 4 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 4797fcecd8d..b1ce578e8be 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2363,7 +2363,7 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, } }
- if (FAILED(hr = ID3D12CommandQueue_Signal(swapchain->command_queue, + if (FAILED(hr = vkd3d_signal_fence_on_cpu(swapchain->command_queue, swapchain->present_fence, swapchain->frame_number))) { ERR("Failed to signal present fence, hf %#lx.\n", hr); diff --git a/dlls/wined3d/wined3d.spec b/dlls/wined3d/wined3d.spec index 151bcaf9751..eb352d144db 100644 --- a/dlls/wined3d/wined3d.spec +++ b/dlls/wined3d/wined3d.spec @@ -360,6 +360,7 @@ @ cdecl vkd3d_resource_incref(ptr) @ cdecl vkd3d_serialize_root_signature(ptr long ptr ptr) @ cdecl vkd3d_serialize_versioned_root_signature(ptr ptr ptr) +@ cdecl vkd3d_signal_fence_on_cpu(ptr ptr long)
@ cdecl vkd3d_shader_compile(ptr ptr ptr) @ cdecl vkd3d_shader_convert_root_signature(ptr long ptr) diff --git a/libs/vkd3d/include/vkd3d.h b/libs/vkd3d/include/vkd3d.h index b18fd14f4c3..59516f4af80 100644 --- a/libs/vkd3d/include/vkd3d.h +++ b/libs/vkd3d/include/vkd3d.h @@ -411,9 +411,13 @@ VKD3D_API uint32_t vkd3d_get_vk_queue_family_index(ID3D12CommandQueue *queue); * the Vulkan driver as being submitted before other work submitted * though the Direct3D 12 API. If this is not desired, it is * recommended to synchronize work submission using an ID3D12Fence - * object, by submitting to the queue a signal operation after all the - * Direct3D 12 work is submitted and waiting for it before calling - * vkd3d_acquire_vk_queue(). + * object: + * 1. submit work through the Direct3D 12 API; + * 2. call vkd3d_signal_fence_on_cpu(); + * 3. wait for the fence to be signalled; + * 4. call vkd3d_acquire_vk_queue(); it is guaranteed that all work submitted + * at point 1 has already been submitted to Vulkan (though not necessarily + * executed). * * \since 1.0 */ @@ -466,6 +470,21 @@ VKD3D_API HRESULT vkd3d_create_versioned_root_signature_deserializer(const void */ VKD3D_API void vkd3d_set_log_callback(PFN_vkd3d_log callback);
+/** + * Signal a fence on the CPU once all the currently outstanding queue work is + * submitted to Vulkan. + * + * The fence will be signalled on the CPU (as if ID3D12Fence_Signal() was + * called) once all the work submitted through the Direct3D 12 API before + * vkd3d_signal_fence_on_cpu() is called has left the internal queue and has + * been submitted to the underlying Vulkan queue. Read the documentation for + * vkd3d_acquire_vk_queue() for more details. + * + * \since 1.15 + */ +VKD3D_API enum vkd3d_result vkd3d_signal_fence_on_cpu(ID3D12CommandQueue *queue, + ID3D12Fence *fence, uint64_t value); + #endif /* VKD3D_NO_PROTOTYPES */
/* @@ -512,6 +531,10 @@ typedef HRESULT (*PFN_vkd3d_create_versioned_root_signature_deserializer)(const /** Type of vkd3d_set_log_callback(). \since 1.4 */ typedef void (*PFN_vkd3d_set_log_callback)(PFN_vkd3d_log callback);
+/** Type of vkd3d_signal_fence_on_cpu(). \since 1.15 */ +typedef enum vkd3d_result (*PFN_vkd3d_signal_fence_on_cpu)(ID3D12CommandQueue *queue, + ID3D12Fence *fence, uint64_t value); + #ifdef __cplusplus } #endif /* __cplusplus */ diff --git a/libs/vkd3d/libs/vkd3d/command.c b/libs/vkd3d/libs/vkd3d/command.c index a55a97f6f2f..ed7c2aa8677 100644 --- a/libs/vkd3d/libs/vkd3d/command.c +++ b/libs/vkd3d/libs/vkd3d/command.c @@ -6345,6 +6345,7 @@ static void d3d12_command_queue_destroy_op(struct vkd3d_cs_op_data *op) break;
case VKD3D_CS_OP_SIGNAL: + case VKD3D_CS_OP_SIGNAL_ON_CPU: d3d12_fence_decref(op->u.signal.fence); break;
@@ -7335,6 +7336,7 @@ static HRESULT d3d12_command_queue_flush_ops_locked(struct d3d12_command_queue * struct vkd3d_cs_op_data *op; struct d3d12_fence *fence; unsigned int i; + HRESULT hr;
queue->is_flushing = true;
@@ -7368,6 +7370,11 @@ static HRESULT d3d12_command_queue_flush_ops_locked(struct d3d12_command_queue * d3d12_command_queue_signal(queue, op->u.signal.fence, op->u.signal.value); break;
+ case VKD3D_CS_OP_SIGNAL_ON_CPU: + if (FAILED(hr = d3d12_fence_Signal(&op->u.signal.fence->ID3D12Fence1_iface, op->u.signal.value))) + ERR("Failed to signal fence %p, hr %s.\n", op->u.signal.fence, debugstr_hresult(hr)); + break; + case VKD3D_CS_OP_EXECUTE: d3d12_command_queue_execute(queue, op->u.execute.buffers, op->u.execute.buffer_count); break; @@ -7510,6 +7517,36 @@ void vkd3d_release_vk_queue(ID3D12CommandQueue *queue) return vkd3d_queue_release(d3d12_queue->vkd3d_queue); }
+enum vkd3d_result vkd3d_signal_fence_on_cpu(ID3D12CommandQueue *iface, ID3D12Fence *fence_iface, uint64_t value) +{ + struct d3d12_command_queue *command_queue = impl_from_ID3D12CommandQueue(iface); + struct d3d12_fence *fence = unsafe_impl_from_ID3D12Fence(fence_iface); + struct vkd3d_cs_op_data *op; + HRESULT hr = S_OK; + + TRACE("iface %p, fence %p, value %#"PRIx64".\n", iface, fence_iface, value); + + vkd3d_mutex_lock(&command_queue->op_mutex); + + if (!(op = d3d12_command_queue_op_array_require_space(&command_queue->op_queue))) + { + ERR("Failed to add op.\n"); + hr = E_OUTOFMEMORY; + goto done; + } + op->opcode = VKD3D_CS_OP_SIGNAL_ON_CPU; + op->u.signal.fence = fence; + op->u.signal.value = value; + + d3d12_fence_incref(fence); + + d3d12_command_queue_submit_locked(command_queue); + +done: + vkd3d_mutex_unlock(&command_queue->op_mutex); + return hr; +} + /* ID3D12CommandSignature */ static inline struct d3d12_command_signature *impl_from_ID3D12CommandSignature(ID3D12CommandSignature *iface) { diff --git a/libs/vkd3d/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/libs/vkd3d/vkd3d_private.h index 97a99782d6a..fde6e9e9d4e 100644 --- a/libs/vkd3d/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/libs/vkd3d/vkd3d_private.h @@ -1324,6 +1324,7 @@ enum vkd3d_cs_op { VKD3D_CS_OP_WAIT, VKD3D_CS_OP_SIGNAL, + VKD3D_CS_OP_SIGNAL_ON_CPU, VKD3D_CS_OP_EXECUTE, VKD3D_CS_OP_UPDATE_MAPPINGS, VKD3D_CS_OP_COPY_MAPPINGS,
From: Giovanni Mascellani gmascellani@codeweavers.com
This is only partially correct, because the frame latency waitable should be a semaphore instead. However, it's a step in the right direction, in preparation to make the waitable an actual semaphore. --- dlls/dxgi/swapchain.c | 9 +++++++++ dlls/dxgi/tests/dxgi.c | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index b1ce578e8be..05ffe628d95 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2886,6 +2886,15 @@ static HRESULT STDMETHODCALLTYPE d3d12_swapchain_SetMaximumFrameLatency(IDXGISwa return DXGI_ERROR_INVALID_CALL; }
+ if (max_latency > swapchain->frame_latency) + { + if (!SetEvent(swapchain->frame_latency_event)) + { + ERR("Failed to set event, last error %lu.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); + } + } + swapchain->frame_latency = max_latency; return S_OK; } diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index 912bcd8fa61..7cf52d626da 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7143,7 +7143,7 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) ok(frame_latency == 3, "Got unexpected frame latency %#x.\n", frame_latency);
wait_result = WaitForSingleObject(semaphore, 0); - todo_wine ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); + ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 0); todo_wine ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 100);
From: Giovanni Mascellani gmascellani@codeweavers.com
The bias was a broken solution for handling the maximum frame latency changes. However we now already set the event (and, in the future, release the sempahore) when the maximum frame latency is increased, so there is no need for the bias anymore.
At the same time we also have to signal the fence directly instead of going through the command queue again. That's because at that point we just submitted the frame for presentation to Vulkan, so the best approximation we have for when the frame will be presented is now, not after the currently outstanding work in the command queue is processed. It's still a rather poor approximation, but for something better we need VK_KHR_present_wait. --- dlls/dxgi/swapchain.c | 13 ++----------- dlls/dxgi/tests/dxgi.c | 1 - 2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 05ffe628d95..38f0a6e6b07 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2296,12 +2296,7 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch
if (swapchain->frame_latency_fence) { - /* Use the same bias as d3d12_swapchain_present(). Add one to - * account for the "++swapchain->frame_number" there. */ - uint64_t number = op->present.frame_number + DXGI_MAX_SWAP_CHAIN_BUFFERS + 1; - - if (FAILED(hr = ID3D12CommandQueue_Signal(swapchain->command_queue, - swapchain->frame_latency_fence, number))) + if (FAILED(hr = ID3D12Fence_Signal(swapchain->frame_latency_fence, op->present.frame_number + 1))) { ERR("Failed to signal frame latency fence, hr %#lx.\n", hr); return hr; @@ -2351,12 +2346,8 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, ++swapchain->frame_number; if ((frame_latency_event = swapchain->frame_latency_event)) { - /* Bias the frame number to avoid underflowing in - * SetEventOnCompletion(). */ - uint64_t number = swapchain->frame_number + DXGI_MAX_SWAP_CHAIN_BUFFERS; - if (FAILED(hr = ID3D12Fence_SetEventOnCompletion(swapchain->frame_latency_fence, - number - swapchain->frame_latency, frame_latency_event))) + swapchain->frame_number, frame_latency_event))) { ERR("Failed to enqueue frame latency event, hr %#lx.\n", hr); return hr; diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index 7cf52d626da..b8cf3597483 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7222,7 +7222,6 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) for (i = 0; i < 3; i++) { wait_result = WaitForSingleObject(semaphore, 100); - todo_wine_if(i != 0) ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); }
From: Giovanni Mascellani gmascellani@codeweavers.com
It is now useless: we signal signal it directly and use it to set an event, without having anything to do with the CPU. We can set the event directly instead. --- dlls/dxgi/swapchain.c | 30 ++++-------------------------- 1 file changed, 4 insertions(+), 26 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 38f0a6e6b07..c5fc4fad337 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1191,7 +1191,6 @@ struct d3d12_swapchain DXGI_SWAP_CHAIN_FULLSCREEN_DESC fullscreen_desc; LONG in_set_fullscreen_state;
- ID3D12Fence *frame_latency_fence; HANDLE frame_latency_event;
uint64_t frame_number; @@ -2057,9 +2056,6 @@ static void d3d12_swapchain_destroy(struct d3d12_swapchain *swapchain) if (swapchain->frame_latency_event) CloseHandle(swapchain->frame_latency_event);
- if (swapchain->frame_latency_fence) - ID3D12Fence_Release(swapchain->frame_latency_fence); - if (swapchain->command_queue) ID3D12CommandQueue_Release(swapchain->command_queue);
@@ -2294,12 +2290,12 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch return hresult_from_vk_result(vr); }
- if (swapchain->frame_latency_fence) + if (swapchain->frame_latency_event) { - if (FAILED(hr = ID3D12Fence_Signal(swapchain->frame_latency_fence, op->present.frame_number + 1))) + if (!SetEvent(swapchain->frame_latency_event)) { - ERR("Failed to signal frame latency fence, hr %#lx.\n", hr); - return hr; + ERR("Failed to set frame latency event, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); } }
@@ -2310,7 +2306,6 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, unsigned int sync_interval, unsigned int flags) { struct d3d12_swapchain_op *op; - HANDLE frame_latency_event; HRESULT hr;
if (sync_interval > 4) @@ -2344,15 +2339,6 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, LeaveCriticalSection(&swapchain->worker_cs);
++swapchain->frame_number; - if ((frame_latency_event = swapchain->frame_latency_event)) - { - if (FAILED(hr = ID3D12Fence_SetEventOnCompletion(swapchain->frame_latency_fence, - swapchain->frame_number, frame_latency_event))) - { - ERR("Failed to enqueue frame latency event, hr %#lx.\n", hr); - return hr; - } - }
if (FAILED(hr = vkd3d_signal_fence_on_cpu(swapchain->command_queue, swapchain->present_fence, swapchain->frame_number))) @@ -3358,14 +3344,6 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI { swapchain->frame_latency = 1;
- if (FAILED(hr = ID3D12Device_CreateFence(device, DXGI_MAX_SWAP_CHAIN_BUFFERS, - 0, &IID_ID3D12Fence, (void **)&swapchain->frame_latency_fence))) - { - WARN("Failed to create frame latency fence, hr %#lx.\n", hr); - d3d12_swapchain_destroy(swapchain); - return hr; - } - if (!(swapchain->frame_latency_event = CreateEventW(NULL, FALSE, TRUE, NULL))) { hr = HRESULT_FROM_WIN32(GetLastError());
From: Giovanni Mascellani gmascellani@codeweavers.com
As it should be, given that it is supposed to keep the count of how many frames are currently in flight. --- dlls/dxgi/swapchain.c | 24 ++++++++++++------------ dlls/dxgi/tests/dxgi.c | 4 +--- 2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index c5fc4fad337..67b637fbeaf 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1191,7 +1191,7 @@ struct d3d12_swapchain DXGI_SWAP_CHAIN_FULLSCREEN_DESC fullscreen_desc; LONG in_set_fullscreen_state;
- HANDLE frame_latency_event; + HANDLE frame_latency_semaphore;
uint64_t frame_number; uint32_t frame_latency; @@ -2053,8 +2053,8 @@ static void d3d12_swapchain_destroy(struct d3d12_swapchain *swapchain) if (swapchain->present_fence) ID3D12Fence_Release(swapchain->present_fence);
- if (swapchain->frame_latency_event) - CloseHandle(swapchain->frame_latency_event); + if (swapchain->frame_latency_semaphore) + CloseHandle(swapchain->frame_latency_semaphore);
if (swapchain->command_queue) ID3D12CommandQueue_Release(swapchain->command_queue); @@ -2290,11 +2290,11 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch return hresult_from_vk_result(vr); }
- if (swapchain->frame_latency_event) + if (swapchain->frame_latency_semaphore) { - if (!SetEvent(swapchain->frame_latency_event)) + if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, 1, NULL)) { - ERR("Failed to set frame latency event, last error %ld.\n", GetLastError()); + ERR("Failed to release frame latency semaphore, last error %ld.\n", GetLastError()); return HRESULT_FROM_WIN32(GetLastError()); } } @@ -2865,9 +2865,9 @@ static HRESULT STDMETHODCALLTYPE d3d12_swapchain_SetMaximumFrameLatency(IDXGISwa
if (max_latency > swapchain->frame_latency) { - if (!SetEvent(swapchain->frame_latency_event)) + if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, max_latency - swapchain->frame_latency, NULL)) { - ERR("Failed to set event, last error %lu.\n", GetLastError()); + ERR("Failed to release frame latency semaphore, last error %lu.\n", GetLastError()); return HRESULT_FROM_WIN32(GetLastError()); } } @@ -2900,10 +2900,10 @@ static HANDLE STDMETHODCALLTYPE d3d12_swapchain_GetFrameLatencyWaitableObject(ID
TRACE("iface %p.\n", iface);
- if (!swapchain->frame_latency_event) + if (!swapchain->frame_latency_semaphore) return NULL;
- ret = DuplicateHandle(GetCurrentProcess(), swapchain->frame_latency_event, GetCurrentProcess(), + ret = DuplicateHandle(GetCurrentProcess(), swapchain->frame_latency_semaphore, GetCurrentProcess(), &dup, 0, FALSE, DUPLICATE_SAME_ACCESS);
if (!ret) @@ -3344,10 +3344,10 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI { swapchain->frame_latency = 1;
- if (!(swapchain->frame_latency_event = CreateEventW(NULL, FALSE, TRUE, NULL))) + if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL))) { hr = HRESULT_FROM_WIN32(GetLastError()); - WARN("Failed to create frame latency event, hr %#lx.\n", hr); + WARN("Failed to create frame latency semaphore, hr %#lx.\n", hr); d3d12_swapchain_destroy(swapchain); return hr; } diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index b8cf3597483..a5a76a6158a 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7145,7 +7145,7 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) wait_result = WaitForSingleObject(semaphore, 0); ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 0); - todo_wine ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); + ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); wait_result = WaitForSingleObject(semaphore, 100); ok(wait_result == WAIT_TIMEOUT, "Got unexpected wait result %#lx.\n", wait_result);
@@ -7184,7 +7184,6 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) for (i = 0; i < 5; i++) { wait_result = WaitForSingleObject(semaphore, 100); - todo_wine_if(i != 0) ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); }
@@ -7236,7 +7235,6 @@ static void test_frame_latency_event(IUnknown *device, BOOL is_d3d12) for (i = 0; i < 4; i++) { wait_result = WaitForSingleObject(semaphore, 100); - todo_wine_if(i != 0) ok(!wait_result, "Got unexpected wait result %#lx.\n", wait_result); }
From: Giovanni Mascellani gmascellani@codeweavers.com
The client is expected to wait on the frame latency semaphore when the swapchain is created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. But when it is not, we're supposed to do it ourselves. --- dlls/dxgi/swapchain.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 67b637fbeaf..833d1a12c49 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -2290,13 +2290,10 @@ static HRESULT d3d12_swapchain_op_present_execute(struct d3d12_swapchain *swapch return hresult_from_vk_result(vr); }
- if (swapchain->frame_latency_semaphore) + if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, 1, NULL)) { - if (!ReleaseSemaphore(swapchain->frame_latency_semaphore, 1, NULL)) - { - ERR("Failed to release frame latency semaphore, last error %ld.\n", GetLastError()); - return HRESULT_FROM_WIN32(GetLastError()); - } + ERR("Failed to release frame latency semaphore, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); }
return S_OK; @@ -2322,6 +2319,13 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, return S_OK; }
+ if (!(swapchain->desc.Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) + && WaitForSingleObject(swapchain->frame_latency_semaphore, INFINITE) != WAIT_OBJECT_0) + { + ERR("Failed to wait for frame latency semaphore, last error %ld.\n", GetLastError()); + return HRESULT_FROM_WIN32(GetLastError()); + } + if (!(op = calloc(1, sizeof(*op)))) { WARN("Cannot allocate memory.\n"); @@ -2900,8 +2904,11 @@ static HANDLE STDMETHODCALLTYPE d3d12_swapchain_GetFrameLatencyWaitableObject(ID
TRACE("iface %p.\n", iface);
- if (!swapchain->frame_latency_semaphore) + if (!(swapchain->desc.Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT)) + { + WARN("DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT not set for swap chain %p.\n", iface); return NULL; + }
ret = DuplicateHandle(GetCurrentProcess(), swapchain->frame_latency_semaphore, GetCurrentProcess(), &dup, 0, FALSE, DUPLICATE_SAME_ACCESS); @@ -3341,16 +3348,16 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI swapchain->current_buffer_index = 0;
if (swapchain_desc->Flags & DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) - { swapchain->frame_latency = 1; + else + swapchain->frame_latency = 3;
- if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL))) - { - hr = HRESULT_FROM_WIN32(GetLastError()); - WARN("Failed to create frame latency semaphore, hr %#lx.\n", hr); - d3d12_swapchain_destroy(swapchain); - return hr; - } + if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL))) + { + hr = HRESULT_FROM_WIN32(GetLastError()); + WARN("Failed to create frame latency semaphore, hr %#lx.\n", hr); + d3d12_swapchain_destroy(swapchain); + return hr; }
if (FAILED(hr = ID3D12Device_CreateFence(device, 0, 0,
Hmm, which test? I guess we do a Wait(0) and expect that to be signaled immediately, whereas waiting for the GPU to be done isn't fast enough?
I think that's likely enough to not _actually_ break something, that having the temporary todo is fine. But it also calls to question whether Wait(0) is actually the right thing.
Or could we even just swap the order?
I doubt any real program is going to be broken by splitting that commit, but in both orders you have a check that temporarily regresses. Partly because I couldn't choose between the orders, partly because the changes are somehow related, partly because they're both quite little, I ended up merging them. At any rate, I split that commit in both ways, you tell me which one you prefer: https://gitlab.winehq.org/giomasce/wine/-/commits/dxgi4 and https://gitlab.winehq.org/giomasce/wine/-/commits/dxgi5 (they both contain the same commit at the beginning to restrict testing to the test that actually matters and skip all the mode changing mess).
Yeah, but there's a lot of nuance there. You might expect this kind of thing to be signaled, say:
(1) when the present is submitted to the GPU
Well, submitted to the underlying API (Vulkan in this case). Whether that coincides with when the present is submitted to the GPU might depend on the driver, I think. At any rate, it's the last moment in which we know something about that frame's timing; once it's submitted to Vulkan we have no way to know any more, unless we use VK_KHR_present_wait.
(2) when the present is visible to the underlying graphics library, so drawing commands from other APIs will work — this is what GLX_OML_sync_control does, or at least what we are trying to use it for. This may be before or after (1)
(3) when the first pixel is visible to the user — this is what KHR_present_wait is supposed to do
(4) when the last pixel is visible to the user
I could honestly see the language that Microsoft uses meaning any one of these. I might be more inclined to guess (4), honestly. I don't know which one makes sense if you want to avoid having more than N frames in flight, since I basically don't know what the purpose of that is. [I guess for something like a video game, where you want the user's reactions based on their last input, you'd want (4), but I don't think that quite works out wrt neurology anyway...]
I would argue that (3) and (4) are basically the same thing: AFAIK on modern monoitors the image is displayed in one shot, there is no electronic cannon scanning the pixels one by one; as soon as the entire image is on the screen and the swap time comes, all the pixels switch to the new value. Even if I am mistaken, or if the user is using an older monitor, soon after the last pixel is made visible to the user a new frame will begin, and its first pixel will be visible to the user. So in practice (3) and (4) are either the same thing or very close to each other. In other words, I guess (3) and (4) are the best approximation one of the other if we want one of them and the other one is not directly available.
In my interpretation the expectation for the frame latency waitable is either (3) or (4). I don't know for sure which one, but as I said using VK_KHR_present_wait is the best strategy anyway.
This commit changes it to be (1). I don't see a reason for that to matter, but I don't have much knowledge of this in the first place.
Well, before that commit it is a random moment between when `Present()` is called and when the frame is actually presented. Which moment it is depends on factors that should be irrelevant, like how many other waits there are in the queue and how they interact to each other (i.e., when the internal vkd3d queue is flushed), potentially even for waits that happen after `Present()` is called.
If we want to emulate (4), I guess we'd want to do a CPU wait for a semaphore signaled by the present command we just queued.
I don't think the present command can signal anything (neither fences nor semaphores) on Vulkan. It is a one way command, with respect to timing. AFAICT once you've submitted something with `vkQueuePresentKHR()` the only way to know something about its timing is to use VK_KHR_present_wait (which is a can of worms on its own, since it requires a dedicated thread for doing that wait, but that's material for another MR). I think VK_KHR_present_wait was added precisely because of this `vkQueuePresentKHR()` deficiency.
So I am resorting to (1) for the moment because it's the only one that can be used in full generality, even if it's a rather poor approximation. Eventually VK_KHR_present_wait support should be added. Ideally my MR is designed so that shouldn't require many more changes (except for an additional thread per swapchain, unfortunately).
"When the rendering loop calls Present(), the system blocks the thread until it is done presenting a prior frame, making room to queue up the new frame, before it actually presents."
Which seems to say pretty unequivocally that native does a wait before present.
Hmm, ok. It can't understand the reason for that choice, but you're right that it's the most obvious interpretation of that sentence. I'll fix the MR.
I can't think of a way in which this difference would be actually observable, mind.
Well, I guess advanced users can see the slight latency difference between when events are processed and when the frame is presented. Or you can see that with advanced equipment, I think there are YouTube channels that care about that. It's also likely that contexts in which the developer means to care about latency do explicitly use the frame latency waitable.
Hmm, which test? I guess we do a Wait(0) and expect that to be signaled immediately, whereas waiting for the GPU to be done isn't fast enough?
I think that's likely enough to not _actually_ break something, that having the temporary todo is fine. But it also calls to question whether Wait(0) is actually the right thing.
Okay, so what I didn't notice is that the test has an additional fence wait in it, so that wait does deadlock if you use ID3D12CommandQueue::Signal() rather than ID3D12Fence::Signal().
Or could we even just swap the order?
I doubt any real program is going to be broken by splitting that commit, but in both orders you have a check that temporarily regresses. Partly because I couldn't choose between the orders, partly because the changes are somehow related, partly because they're both quite little, I ended up merging them. At any rate, I split that commit in both ways, you tell me which one you prefer: https://gitlab.winehq.org/giomasce/wine/-/commits/dxgi4 and https://gitlab.winehq.org/giomasce/wine/-/commits/dxgi5 (they both contain the same commit at the beginning to restrict testing to the test that actually matters and skip all the mode changing mess).
I don't think I have a strong preference. I guess dxgi4?
Well, before that commit it is a random moment between when `Present()` is called and when the frame is actually presented. Which moment it is depends on factors that should be irrelevant, like how many other waits there are in the queue and how they interact to each other (i.e., when the internal vkd3d queue is flushed), potentially even for waits that happen after `Present()` is called.
That's a good point, probably having it be consistent, even if slightly off, is better than not.
If we want to emulate (4), I guess we'd want to do a CPU wait for a semaphore signaled by the present command we just queued.
I don't think the present command can signal anything (neither fences nor semaphores) on Vulkan. It is a one way command, with respect to timing. AFAICT once you've submitted something with `vkQueuePresentKHR()` the only way to know something about its timing is to use VK_KHR_present_wait (which is a can of worms on its own, since it requires a dedicated thread for doing that wait, but that's material for another MR). I think VK_KHR_present_wait was added precisely because of this `vkQueuePresentKHR()` deficiency.
Bah, I forgot that :-/
Jan Sikorski (@jsikorski) commented about dlls/dxgi/swapchain.c:
if (FAILED(hr = ID3D12Device_CreateFence(device, DXGI_MAX_SWAP_CHAIN_BUFFERS,
0, &IID_ID3D12Fence, (void **)&swapchain->frame_latency_fence)))
{
WARN("Failed to create frame latency fence, hr %#lx.\n", hr);
d3d12_swapchain_destroy(swapchain);
return hr;
}
if (!(swapchain->frame_latency_event = CreateEventW(NULL, FALSE, TRUE, NULL)))
{
hr = HRESULT_FROM_WIN32(GetLastError());
WARN("Failed to create frame latency event, hr %#lx.\n", hr);
d3d12_swapchain_destroy(swapchain);
return hr;
}
- if (!(swapchain->frame_latency_semaphore = CreateSemaphoreW(NULL, 1, LONG_MAX, NULL)))
Should we create the semaphore with initial count of `swapchain->frame_latency`?