From: Zebediah Figura zfigura@codeweavers.com
This improves performance for the game "Grounded", on a AMD Radeon RX 6700 XT, with radv from Mesa 22.3.6. Testing was done with the "cb_access_map_w" option enabled, which also improves performance with the game by itself.
From my testing, it's possible to raise the threshold from 2 ms up to 5 ms or so, before the driver or GPU seems to reclock back to the lower power level. However, this measurement is questionable for several reasons. It seems to vary depending on the scene being rendered, and of course this will be specific to the game and driver and GPU in question anyway. The game also has a weird approach to vsync that seems to involve it presenting stale frames (and hence artificially inflating the FPS), which I'm not fully sure I accounted for while measuring. And of course, it's hard to be sure that 5 ms is actually the threshold for how long the driver will go before powering down the GPU. In any case, it seems better to err on the side of submitting more often, to make sure the fix affects more drivers.
While submission isn't cheap, it seems to me that submitting every 2 ms is unlikely to cause a bottleneck [consider that this is at most 8 (more) submissions per frame].
The maximum of 4 concurrent periodically submitted buffers was chosen arbitrarily. Removing the maximum altogether does not measurably affect performance for this game either way.
Credit goes to Philip Rebohle and his work on DXVK for helping me to notice that periodic submission might make a difference. --- dlls/wined3d/adapter_vk.c | 34 ++++++++++++++++++++++++++++++++++ dlls/wined3d/context_vk.c | 2 ++ dlls/wined3d/wined3d_private.h | 1 + 3 files changed, 37 insertions(+)
diff --git a/dlls/wined3d/adapter_vk.c b/dlls/wined3d/adapter_vk.c index 5b6042659aa..1d330999b10 100644 --- a/dlls/wined3d/adapter_vk.c +++ b/dlls/wined3d/adapter_vk.c @@ -1711,6 +1711,36 @@ static void adapter_vk_flush_context(struct wined3d_context *context) wined3d_context_vk_submit_command_buffer(context_vk, 0, NULL, NULL, 0, NULL); }
+/* In general we only submit when necessary or when a frame ends. However, + * applications which do a lot of work per frame can end up with the GPU idle + * for long periods of time while the CPU is building commands, and drivers may + * choose to reclock the GPU to a lower power level if they detect it being idle + * for that long. + * + * This may also help performance simply by virtue of allowing more parallelism + * between the GPU and CPU, although no clear evidence of that has been seen + * yet. */ +static void perform_periodic_submit(struct wined3d_context_vk *context_vk) +{ + LARGE_INTEGER now, freq; + + assert(context_vk->current_command_buffer.vk_command_buffer); + + /* The point of periodic submit is to keep the GPU busy, so if it's already + * busy with 4 or more command buffers, don't submit another one now. */ + if (context_vk->current_command_buffer.id - context_vk->completed_command_buffer_id - 1 > 4) + return; + + QueryPerformanceCounter(&now); + QueryPerformanceFrequency(&freq); + + if (((now.QuadPart - context_vk->command_buffer_create_time.QuadPart) * 1000000) / freq.QuadPart >= 2000) + { + TRACE("Submitting command buffer because it's been 2 ms since it was created.\n"); + wined3d_context_vk_submit_command_buffer(context_vk, 0, NULL, NULL, 0, NULL); + } +} + static void adapter_vk_draw_primitive(struct wined3d_device *device, const struct wined3d_state *state, const struct wined3d_draw_parameters *parameters) { @@ -1796,6 +1826,8 @@ static void adapter_vk_draw_primitive(struct wined3d_device *device, context_vk->c.transform_feedback_active = 0; }
+ perform_periodic_submit(context_vk); + context_release(&context_vk->c); }
@@ -1840,6 +1872,8 @@ static void adapter_vk_dispatch_compute(struct wined3d_device *device, VK_CALL(vkCmdPipelineBarrier(vk_command_buffer, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT, 0, 0, NULL, 0, NULL, 0, NULL));
+ perform_periodic_submit(context_vk); + context_release(&context_vk->c); }
diff --git a/dlls/wined3d/context_vk.c b/dlls/wined3d/context_vk.c index 10713985e7e..d4cf48ac579 100644 --- a/dlls/wined3d/context_vk.c +++ b/dlls/wined3d/context_vk.c @@ -1767,6 +1767,8 @@ VkCommandBuffer wined3d_context_vk_get_command_buffer(struct wined3d_context_vk wined3d_query_vk_resume(query_vk, context_vk); }
+ QueryPerformanceCounter(&context_vk->command_buffer_create_time); + TRACE("Created new command buffer %p with id 0x%s.\n", buffer->vk_command_buffer, wine_dbgstr_longlong(buffer->id));
diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h index f793c6c0cde..f13d8eeca48 100644 --- a/dlls/wined3d/wined3d_private.h +++ b/dlls/wined3d/wined3d_private.h @@ -2722,6 +2722,7 @@ struct wined3d_context_vk struct wined3d_command_buffer_vk current_command_buffer; uint64_t completed_command_buffer_id; VkDeviceSize retired_bo_size; + LARGE_INTEGER command_buffer_create_time;
struct {