Later patches are on https://gitlab.winehq.org/giomasce/wine/-/commits/chianti, though they still require some cleanup and cosmetic changes.
Currently the D3D12 swapchain accesses the low level Vulkan queue backing a vkd3d `ID3D12CommandQueue` using the `vkd3d_acquire_vk_queue()` call in order to submit frames for presentation. This causes a number of bugs because `vkd3d_acquire_vk_queue()` lets the caller submit to the Vulkan queue even if some operations are enqueued in the vkd3d internal op queue, which can lead to a frame being presented even if drawing work on it isn't finished (and not even queued, from the viewpoint of Vulkan!).
In order to fix this, I propose to introduce to add a few calls to vkd3d (currently [in this branch](https://gitlab.winehq.org/giomasce/vkd3d/-/commits/mongibello)), to allow the caller to enqueue calls to `vkQueueSubmit()` and `vkQueuePresentKHR()` through the vkd3d internal queue. The D3D12 swapchain would then use this new API instead of directly calling `vkQueueSubmit()` and `vkQueuePresentKHR()` as it does now. Eventually a similar call for `vkQueueWaitIdle()` should be added too. [That's already implemented in vkd3d](https://gitlab.winehq.org/giomasce/vkd3d/-/commit/b5ce799dbaf8c0fc131ee0bc24...), but not used yet in DXGI.
These changes also require introducing a worker thread for each D3D12 swapchain. The worker thread owns the Vulkan (backend) resources associated to the swapchain. Swapchain methods `Present()` and `ResizeBuffers()` do little more than pushing operations to a queue consumed by the worker thread. In this way the swapchain can immediately return on the calling thread, and the worker thread can wait on the operations submitted to the vkd3d queue, ensuring they're executed in the right order without fearing deadlocks. Notice that both `Present()` and `ResizeBuffers()` currently wait on a number of conditions (the availability of a swapchain image, its associated fence, device idleness when destroying Vulkan resources), and even if they shouldn't deadlock there could be a performance impact on the thread calling them. Since this is not a specific design objective, I didn't do any measurement, but hopefully the effect is at least nonnegative.
I considered (and partially implement) some alternative designs, but found them ultimately less satisfying:
* Just implement a generic callback in vkd3d ([in this way](https://gitlab.winehq.org/giomasce/vkd3d/-/commit/ad51ee2ad9597f2756030a83ab...)), that is queued and called with a `VkQueue` argument when dequeued. This is somewhat nicer for the vkd3d public interface, because only one call must be exposed (the one that enqueues a callback); however, DXGI would have to perform waits in the callbacks, which is a great recipe for deadlocks and for making performance harder to measure and debug.
* Use an event-based synchronization, but without a worker thread. Unfortunately that means that the swapchain can only make progress when called by the client. If for some reason a call to `Present()` must wait in the vkd3d internal queue and the client doesn't call the swapchain for some time, the frame presentation could be excessively delayed.
Patches in this MR don't implement those changes yet, but prepare the ground with some tests and light refactoring.
-- v2: dxgi: Split D3D12/Vulkan resource creation and destruction. dxgi: Pass a VkImage to d3d12_swapchain_queue_present(). dxgi: Free the frontend images memory only once. dxgi: Consider vk_format a frontend field. dxgi/tests: Test that the present count is updated when Present() is called. dxgi/tests: Test that the back buffer index is updated when Present() is called.
From: Giovanni Mascellani gmascellani@codeweavers.com
--- dlls/dxgi/tests/dxgi.c | 49 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index d29c5577c14..f84f74398ec 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -4936,9 +4936,12 @@ static void test_swapchain_backbuffer_index(IUnknown *device, BOOL is_d3d12) DXGI_SWAP_CHAIN_DESC swapchain_desc; unsigned int index, expected_index; IDXGISwapChain3 *swapchain3; + ID3D12Device *d3d12_device; + ID3D12CommandQueue *queue; IDXGISwapChain *swapchain; HRESULT hr, expected_hr; IDXGIFactory *factory; + ID3D12Fence *fence; unsigned int i, j; ULONG refcount; RECT rect; @@ -5041,6 +5044,52 @@ static void test_swapchain_backbuffer_index(IUnknown *device, BOOL is_d3d12) ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); }
+ if (is_d3d12) + { + hr = IUnknown_QueryInterface(device, &IID_ID3D12CommandQueue, (void **)&queue); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = ID3D12CommandQueue_GetDevice(queue, &IID_ID3D12Device, (void **)&d3d12_device); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = ID3D12Device_CreateFence(d3d12_device, 0, 0, &IID_ID3D12Fence, (void **)&fence); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = ID3D12CommandQueue_Wait(queue, fence, 1); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = IDXGISwapChain_ResizeBuffers(swapchain, 0, + rect.right, rect.bottom, DXGI_FORMAT_UNKNOWN, swapchain_desc.Flags); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + /* The back buffer index is updated when Present() is + * called, not when frames are presented. */ + for (j = 0; j < swapchain_desc.BufferCount + 2; ++j) + { + index = IDXGISwapChain3_GetCurrentBackBufferIndex(swapchain3); + expected_index = j % swapchain_desc.BufferCount; + ok(index == expected_index, "Got back buffer index %u, expected %u.\n", index, expected_index); + hr = IDXGISwapChain3_Present(swapchain3, 0, 0); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + } + + index = IDXGISwapChain3_GetCurrentBackBufferIndex(swapchain3); + expected_index = j % swapchain_desc.BufferCount; + ok(index == expected_index, "Got back buffer index %u, expected %u.\n", index, expected_index); + + hr = ID3D12Fence_Signal(fence, 1); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + index = IDXGISwapChain3_GetCurrentBackBufferIndex(swapchain3); + expected_index = j % swapchain_desc.BufferCount; + ok(index == expected_index, "Got back buffer index %u, expected %u.\n", index, expected_index); + + refcount = ID3D12Fence_Release(fence); + ok(!refcount, "Fence has %lu references left.\n", refcount); + ID3D12Device_Release(d3d12_device); + ID3D12CommandQueue_Release(queue); + } + wait_device_idle(device);
IDXGISwapChain3_Release(swapchain3);
From: Giovanni Mascellani gmascellani@codeweavers.com
--- dlls/dxgi/tests/dxgi.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)
diff --git a/dlls/dxgi/tests/dxgi.c b/dlls/dxgi/tests/dxgi.c index f84f74398ec..91e78c5a56c 100644 --- a/dlls/dxgi/tests/dxgi.c +++ b/dlls/dxgi/tests/dxgi.c @@ -7727,7 +7727,10 @@ static void test_swapchain_present_count(IUnknown *device, BOOL is_d3d12) };
UINT present_count, expected; + ID3D12Device *d3d12_device; + ID3D12CommandQueue *queue; IDXGISwapChain *swapchain; + ID3D12Fence *fence; unsigned int i; HWND window; HRESULT hr; @@ -7785,6 +7788,42 @@ static void test_swapchain_present_count(IUnknown *device, BOOL is_d3d12) ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); ok(present_count == expected, "Got unexpected present count %u, expected %u.\n", present_count, expected);
+ if (is_d3d12) + { + hr = IUnknown_QueryInterface(device, &IID_ID3D12CommandQueue, (void **)&queue); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = ID3D12CommandQueue_GetDevice(queue, &IID_ID3D12Device, (void **)&d3d12_device); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = ID3D12Device_CreateFence(d3d12_device, 0, 0, &IID_ID3D12Fence, (void **)&fence); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + hr = ID3D12CommandQueue_Wait(queue, fence, 1); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + /* The present count is updated when Present() is called, + * not when frames are presented. */ + hr = IDXGISwapChain_Present(swapchain, 0, 0); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + expected = present_count + 1; + hr = IDXGISwapChain_GetLastPresentCount(swapchain, &present_count); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + ok(present_count == expected, "Got unexpected present count %u, expected %u.\n", present_count, expected); + + hr = ID3D12Fence_Signal(fence, 1); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + + expected = present_count; + hr = IDXGISwapChain_GetLastPresentCount(swapchain, &present_count); + ok(hr == S_OK, "Got unexpected hr %#lx.\n", hr); + ok(present_count == expected, "Got unexpected present count %u, expected %u.\n", present_count, expected); + + ID3D12Fence_Release(fence); + ID3D12Device_Release(d3d12_device); + ID3D12CommandQueue_Release(queue); + } + IDXGISwapChain_Release(swapchain); }
From: Giovanni Mascellani gmascellani@codeweavers.com
This is an oversight in eec9c3a2f63555604dd6e357bb3280e82e14ff5b: vk_format specifies the format used for the frontend resources. The backend images could use a different format, depending on what the swapchain supports.
Also, introduce the terms "frontend" and "backend" for the two sides of the swapchain, which are probably easier to understand. --- dlls/dxgi/swapchain.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index f760e931bb1..df6eb793bbe 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1050,21 +1050,22 @@ struct d3d12_swapchain VkDevice vk_device; VkPhysicalDevice vk_physical_device;
- /* D3D12 side of the swapchain: these objects are visible to the - * IDXGISwapChain client, so they must never be recreated, except - * when ResizeBuffers*() is called. */ + /* D3D12 side of the swapchain (frontend): these objects are + * visible to the IDXGISwapChain client, so they must never be + * recreated, except when ResizeBuffers*() is called. */ unsigned int buffer_count; VkDeviceMemory vk_memory; VkImage vk_images[DXGI_MAX_SWAP_CHAIN_BUFFERS]; ID3D12Resource *buffers[DXGI_MAX_SWAP_CHAIN_BUFFERS]; unsigned int current_buffer_index; + VkFormat vk_format;
- /* Vulkan side of the swapchain: these objects are also destroyed - * and recreated when the Vulkan swapchain becomes out of date or - * when the synchronization interval is changed; this operation - * should be transparent to the IDXGISwapChain client (except for - * timings: recreating the Vulkan swapchain creates a noticeable - * delay, unfortunately). */ + /* Vulkan side of the swapchain (backend): these objects are also + * destroyed and recreated when the Vulkan swapchain becomes out + * of date or when the synchronization interval is changed; this + * operation should be transparent to the IDXGISwapChain client + * (except for timings: recreating the Vulkan swapchain creates a + * noticeable delay, unfortunately). */ VkSwapchainKHR vk_swapchain; VkCommandPool vk_cmd_pool; VkImage vk_swapchain_images[DXGI_MAX_SWAP_CHAIN_BUFFERS]; @@ -1073,7 +1074,6 @@ struct d3d12_swapchain unsigned int vk_swapchain_width; unsigned int vk_swapchain_height; VkPresentModeKHR present_mode; - VkFormat vk_format;
uint32_t vk_image_index;
From: Giovanni Mascellani gmascellani@codeweavers.com
--- dlls/dxgi/swapchain.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index df6eb793bbe..6bb86f413de 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1630,11 +1630,13 @@ static void d3d12_swapchain_destroy_resources(struct d3d12_swapchain *swapchain) { vk_funcs->p_vkDestroyImage(swapchain->vk_device, swapchain->vk_images[i], NULL); swapchain->vk_images[i] = VK_NULL_HANDLE; - - vk_funcs->p_vkFreeMemory(swapchain->vk_device, swapchain->vk_memory, NULL); - swapchain->vk_memory = VK_NULL_HANDLE; } } + if (swapchain->vk_device) + { + vk_funcs->p_vkFreeMemory(swapchain->vk_device, swapchain->vk_memory, NULL); + swapchain->vk_memory = VK_NULL_HANDLE; + } }
static HRESULT d3d12_swapchain_create_vulkan_swapchain(struct d3d12_swapchain *swapchain)
From: Giovanni Mascellani gmascellani@codeweavers.com
This is to eventually allow d3d12_swapchain_queue_present() to operate on a certain VkImage even if current_buffer_index has already been modified since the corresponding Present() call. --- dlls/dxgi/swapchain.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 6bb86f413de..726d57334a1 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1980,14 +1980,13 @@ static HRESULT d3d12_swapchain_set_sync_interval(struct d3d12_swapchain *swapcha return d3d12_swapchain_create_vulkan_resources(swapchain); }
-static VkResult d3d12_swapchain_queue_present(struct d3d12_swapchain *swapchain, VkQueue vk_queue) +static VkResult d3d12_swapchain_queue_present(struct d3d12_swapchain *swapchain, VkQueue vk_queue, VkImage vk_src_image) { const struct dxgi_vk_funcs *vk_funcs = &swapchain->vk_funcs; VkPresentInfoKHR present_info; VkCommandBuffer vk_cmd_buffer; VkSubmitInfo submit_info; VkImage vk_dst_image; - VkImage vk_src_image; VkResult vr;
if (swapchain->vk_image_index == INVALID_VK_IMAGE_INDEX) @@ -2000,7 +1999,6 @@ static VkResult d3d12_swapchain_queue_present(struct d3d12_swapchain *swapchain,
vk_cmd_buffer = swapchain->vk_cmd_buffers[swapchain->vk_image_index]; vk_dst_image = swapchain->vk_swapchain_images[swapchain->vk_image_index]; - vk_src_image = swapchain->vk_images[swapchain->current_buffer_index];
if ((vr = vk_funcs->p_vkResetCommandBuffer(vk_cmd_buffer, 0)) < 0) { @@ -2076,7 +2074,8 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, return E_FAIL; }
- vr = d3d12_swapchain_queue_present(swapchain, vk_queue); + vr = d3d12_swapchain_queue_present(swapchain, vk_queue, + swapchain->vk_images[swapchain->current_buffer_index]); if (vr == VK_ERROR_OUT_OF_DATE_KHR) { vkd3d_release_vk_queue(swapchain->command_queue); @@ -2093,7 +2092,8 @@ static HRESULT d3d12_swapchain_present(struct d3d12_swapchain *swapchain, return E_FAIL; }
- if ((vr = d3d12_swapchain_queue_present(swapchain, vk_queue)) < 0) + if ((vr = d3d12_swapchain_queue_present(swapchain, vk_queue, + swapchain->vk_images[swapchain->current_buffer_index])) < 0) ERR("Failed to present after recreating swapchain, vr %d.\n", vr); }
From: Giovanni Mascellani gmascellani@codeweavers.com
This is to eventually hand frontend and backend resource management to different code pieces. --- dlls/dxgi/swapchain.c | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/dlls/dxgi/swapchain.c b/dlls/dxgi/swapchain.c index 726d57334a1..54cfa372cb2 100644 --- a/dlls/dxgi/swapchain.c +++ b/dlls/dxgi/swapchain.c @@ -1612,13 +1612,11 @@ static void d3d12_swapchain_destroy_vulkan_resources(struct d3d12_swapchain *swa } }
-static void d3d12_swapchain_destroy_resources(struct d3d12_swapchain *swapchain) +static void d3d12_swapchain_destroy_d3d12_resources(struct d3d12_swapchain *swapchain) { const struct dxgi_vk_funcs *vk_funcs = &swapchain->vk_funcs; unsigned int i;
- d3d12_swapchain_destroy_vulkan_resources(swapchain); - for (i = 0; i < swapchain->desc.BufferCount; ++i) { if (swapchain->buffers[i]) @@ -1775,7 +1773,7 @@ static HRESULT d3d12_swapchain_create_vulkan_resources(struct d3d12_swapchain *s return d3d12_swapchain_create_command_buffers(swapchain); }
-static HRESULT d3d12_swapchain_create_resources(struct d3d12_swapchain *swapchain) +static HRESULT d3d12_swapchain_create_d3d12_resources(struct d3d12_swapchain *swapchain) { HRESULT hr;
@@ -1788,10 +1786,7 @@ static HRESULT d3d12_swapchain_create_resources(struct d3d12_swapchain *swapchai if (FAILED(hr = d3d12_swapchain_create_user_buffers(swapchain))) return hr;
- if (FAILED(hr = d3d12_swapchain_create_image_resources(swapchain))) - return hr; - - return d3d12_swapchain_create_vulkan_resources(swapchain); + return d3d12_swapchain_create_image_resources(swapchain); }
static inline struct d3d12_swapchain *d3d12_swapchain_from_IDXGISwapChain4(IDXGISwapChain4 *iface) @@ -1840,7 +1835,8 @@ static void d3d12_swapchain_destroy(struct d3d12_swapchain *swapchain) const struct dxgi_vk_funcs *vk_funcs = &swapchain->vk_funcs; void *vulkan_module = vk_funcs->vulkan_module;
- d3d12_swapchain_destroy_resources(swapchain); + d3d12_swapchain_destroy_vulkan_resources(swapchain); + d3d12_swapchain_destroy_d3d12_resources(swapchain);
if (swapchain->frame_latency_event) CloseHandle(swapchain->frame_latency_event); @@ -2312,6 +2308,7 @@ static HRESULT d3d12_swapchain_resize_buffers(struct d3d12_swapchain *swapchain, DXGI_SWAP_CHAIN_DESC1 *desc, new_desc; unsigned int i; ULONG refcount; + HRESULT hr;
if (flags) FIXME("Ignoring flags %#x.\n", flags); @@ -2361,9 +2358,14 @@ static HRESULT d3d12_swapchain_resize_buffers(struct d3d12_swapchain *swapchain, && desc->Format == new_desc.Format && desc->BufferCount == new_desc.BufferCount) return S_OK;
- d3d12_swapchain_destroy_resources(swapchain); + d3d12_swapchain_destroy_vulkan_resources(swapchain); + d3d12_swapchain_destroy_d3d12_resources(swapchain); swapchain->desc = new_desc; - return d3d12_swapchain_create_resources(swapchain); + + if (FAILED(hr = d3d12_swapchain_create_d3d12_resources(swapchain))) + return hr; + + return d3d12_swapchain_create_vulkan_resources(swapchain); }
static HRESULT STDMETHODCALLTYPE d3d12_swapchain_ResizeBuffers(IDXGISwapChain4 *iface, @@ -3045,7 +3047,13 @@ static HRESULT d3d12_swapchain_init(struct d3d12_swapchain *swapchain, IWineDXGI ID3D12CommandQueue_AddRef(swapchain->command_queue = queue); ID3D12Device_AddRef(swapchain->device = device);
- if (FAILED(hr = d3d12_swapchain_create_resources(swapchain))) + if (FAILED(hr = d3d12_swapchain_create_d3d12_resources(swapchain))) + { + d3d12_swapchain_destroy(swapchain); + return hr; + } + + if (FAILED(hr = d3d12_swapchain_create_vulkan_resources(swapchain))) { d3d12_swapchain_destroy(swapchain); return hr;
I'm a bit fuzzy on the details, but wasn't there an idea to use event-based synchronization *with* a worker thread? Something like, dxgi spawns a worker thread which waits (on a GPU-signaled fence?) for the queue to be safe to directly call vkQueuePresent()?
On Fri Jun 30 06:23:40 2023 +0000, Zebediah Figura wrote:
I'm a bit fuzzy on the details, but wasn't there an idea to use event-based synchronization *with* a worker thread? Something like, dxgi spawns a worker thread which waits (on a GPU-signaled fence?) for the queue to be safe to directly call vkQueuePresent()?
I don't specifically remember discussing this, but it is certainly an option. BTW, are you proposing it because you think it would be better than the current proposal, or just as something worth considering?
So there would be an `ID3D12Fence` object, `Present()` would push a signal operation and `SetEventOnCompletion()` to it on the command queue and the worker thread would wait for that event and then acquire the `VkQueue` and enqueue the commands it needs.
The part that doesn't thrill me is the fact that DXGI still directly accesses the `VkQueue` object, which I tend to see as an internal implementation detail to vkd3d. On the other hand it's nice that we don't need new API from vkd3d (even if, in the future, we were to need something other than `vkQueueSubmit()` and `vkQueuePresentKHR()`).
Hopefully implementing this idea shouldn't require too much work, it should share a good amoung of code with my current proposal. I'll try.
On Fri Jun 30 06:23:39 2023 +0000, Giovanni Mascellani wrote:
I don't specifically remember discussing this, but it is certainly an option. BTW, are you proposing it because you think it would be better than the current proposal, or just as something worth considering? So there would be an `ID3D12Fence` object, `Present()` would push a signal operation and `SetEventOnCompletion()` to it on the command queue and the worker thread would wait for that event and then acquire the `VkQueue` and enqueue the commands it needs. The part that doesn't thrill me is the fact that DXGI still directly accesses the `VkQueue` object, which I tend to see as an internal implementation detail to vkd3d. On the other hand it's nice that we don't need new API from vkd3d (even if, in the future, we were to need something other than `vkQueueSubmit()` and `vkQueuePresentKHR()`). Hopefully implementing this idea shouldn't require too much work, it should share a good amoung of code with my current proposal. I'll try.
So in isolation I think the solution proposed by this patch set is fine, and even looks prettiest. *But* what I remember from that discussion—and Henri will undoubtedly be able to express this more eloquently—is that the concern is
(a) we might want to do something more complicated than just vkQueuePresent()—cf. the issues about fullscreen window styles / presenting to the client rect. I'm not sure but this might also turn out to be an issue for DirectComposition.
(b) we might want to plumb presentation through D3DKMTPresent() [which may be necessary to achieve the above], and it will probably not be possible to do any vkd3d calls from the Unix side.