On Fri, 29 Apr 2022 at 15:21, Conor McCarthy cmccarthy@codeweavers.com wrote:
Correct fence behaviour requires:
Map monotonically increasing timeline values to fence virtual values to avoid invalid use of Vulkan timeline semaphores. In particular, non- increasing values and value jumps of >= 4G are required for d3d12.
Create a worker thread for each queue to handle queue commands. This allows blocking of wait submission until an unblocking signal is submitted, so out-of-order waits are handled correctly when two d3d12 queues are mapped to the same Vulkan queue.
Threaded queue submission also fixes the old fence implementation so it is fully functional, though a bit less efficient than timeline semaphores.
Based in part on vkd3d-proton patches by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy cmccarthy@codeweavers.com
v2: Always broadcast on the null event condition after signaling.
Threaded command queues on their own seem to bring out the problems with the existing fence implementation, which results in games crashing. Combining them with the fence changes avoids that issue.
Is there any reason we couldn't apply the fence changes before the command queue submission changes?
The commit message could be clearer, but if I understood it correctly, the command queue submission changes are intended to address the following scenario:
- We have a fence F1, and two d3d12 command queues V1 and V2 sharing the same physical Vulkan queue Q1.
- The application submits a Wait(F1, x) on V1.
- The application submits a Signal(F1, x) on V2.
- The Wait() on V1 never completes, because it's submitted to Q1 before the Signal() on V2.
- And to complicate things some more, V2 may not be created until after the Wait().
Correct? That explains the need to delay Wait() submission, but not the need to introduce the worker threads. Is there anything preventing us from buffering these Wait() calls (and any subsequent Execute() calls) much like we do in this patch, and then simply submitting them when the corresponding Signal() is submitted?
If we treat the need for these worker threads as a given for the sake of argument, the scheme introduced in this patch looks suspiciously similar to wined3d's command stream. The wined3d CS is probably a fair bit more complicated than what's needed here, but could we please at least keep the terminology consistent?
d3d12_command_queue_acquire_serialised() is going to drain (i.e., stall) the queue on each vkd3d_acquire_vk_queue() call. That seems undesirable by itself, but in the case of Wine's dxgi/d3d12, that's going to happen on every Present() call. Perhaps worse, is there anything preventing an application from attempting to drain the queue (e.g. by calling vkd3d_acquire_vk_queue()) between a Wait() and its corresponding Signal()?