From: Zebediah Figura <zfigura@codeweavers.com>
This improves performance for the game "Grounded" on an AMD Radeon RX 6700 XT, with radv from Mesa 22.3.6. Testing was done with the "cb_access_map_w" option enabled, which also improves performance with the game by itself.
From my testing, it's possible to raise the threshold from 2 ms up to 5 ms or so, before the driver or GPU seems to reclock back to the lower power level. However, this measurement is questionable for several reasons. It seems to vary depending on the scene being rendered, and of course this will be specific to the game and driver and GPU in question anyway. The game also has a weird approach to vsync that seems to involve it presenting stale frames (and hence artificially inflating the FPS), which I'm not fully sure I accounted for while measuring. And of course, it's hard to be sure that 5 ms is actually the threshold for how long the driver will go before powering down the GPU. In any case, it seems better to err on the side of submitting more often, to make sure the fix affects more drivers.
While submission isn't cheap, it seems to me that submitting every 2 ms is unlikely to cause a bottleneck (assuming a frame time of roughly 16 ms, i.e. 60 FPS, this is at most 8 more submissions per frame).
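To make the per-frame estimate concrete, here is a minimal sketch of the arithmetic. The helper name is hypothetical and not part of the patch, and the 60 FPS frame time is an assumption used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: upper bound on the number of
 * periodic submissions that fit into one frame, given the frame time and the
 * submit interval, both in microseconds. */
static uint64_t max_periodic_submits_per_frame(uint64_t frame_time_us, uint64_t interval_us)
{
    assert(interval_us > 0);
    /* Integer division rounds down: a partial interval at the end of the
     * frame does not trigger another periodic submit. */
    return frame_time_us / interval_us;
}
```

With a 2000 us interval, a 60 FPS frame (about 16666 us) allows at most 8 periodic submissions, and a 30 FPS frame about 16.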
The maximum of 4 concurrent periodically submitted buffers was chosen arbitrarily. Removing the maximum altogether does not measurably affect performance for this game either way.
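For readers who want to reason about the two conditions without the Win32 timer calls, the decision can be sketched as a pure function. This is an illustration only: the function names and the explicit tick parameters are assumptions, and the busy count is computed on my reading that buffer ids are sequential and the current buffer has not been submitted yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERIODIC_SUBMIT_INTERVAL_US 2000
#define PERIODIC_SUBMIT_MAX_BUFFERS 4

/* Convert a tick delta to microseconds. ticks_per_second plays the role of
 * the value returned by QueryPerformanceFrequency() in the patch. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return ticks * 1000000 / ticks_per_second;
}

/* Hypothetical stand-in for should_periodic_submit(), with all inputs passed
 * explicitly so that it can be tested in isolation. */
static bool sketch_should_periodic_submit(uint64_t current_id, uint64_t completed_id,
        uint64_t elapsed_ticks, uint64_t ticks_per_second)
{
    /* Buffers submitted but not yet completed. The current buffer is still
     * being recorded, hence the "- 1". */
    uint64_t busy_count = current_id - completed_id - 1;

    if (busy_count > PERIODIC_SUBMIT_MAX_BUFFERS)
        return false;

    return ticks_to_us(elapsed_ticks, ticks_per_second) >= PERIODIC_SUBMIT_INTERVAL_US;
}
```

Note that with the ">" comparison as written in the patch, a periodic submit is still allowed while exactly 4 buffers are busy; it is refused only from 5 onwards.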
Credit goes to Philip Rebohle and his work on DXVK for helping me to notice that periodic submission might make a difference.
---
 dlls/wined3d/context_vk.c | 41 ++++++++++++++++++++++++++++++++++++++-
 dlls/wined3d/wined3d_vk.h |  1 +
 2 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/dlls/wined3d/context_vk.c b/dlls/wined3d/context_vk.c
index 9ed5b46ba96..d54fe9ad26e 100644
--- a/dlls/wined3d/context_vk.c
+++ b/dlls/wined3d/context_vk.c
@@ -1771,6 +1771,43 @@ void wined3d_context_vk_cleanup(struct wined3d_context_vk *context_vk)
     wined3d_context_cleanup(&context_vk->c);
 }
 
+/* In general we only submit when necessary or when a frame ends. However,
+ * applications which do a lot of work per frame can end up with the GPU idle
+ * for long periods of time while the CPU is building commands, and drivers may
+ * choose to reclock the GPU to a lower power level if they detect it being idle
+ * for that long.
+ *
+ * This may also help performance simply by virtue of allowing more parallelism
+ * between the GPU and CPU, although no clear evidence of that has been seen
+ * yet. */
+
+#define WINED3D_PERIODIC_SUBMIT_INTERVAL_MICROSECONDS 2000
+#define WINED3D_PERIODIC_SUBMIT_MAX_BUFFERS 4
+
+static bool should_periodic_submit(struct wined3d_context_vk *context_vk)
+{
+    LARGE_INTEGER now, freq;
+    uint64_t busy_count;
+    ULONGLONG diff;
+
+    /* The point of periodic submit is to keep the GPU busy, so if it's already
+     * busy with 4 or more command buffers, don't submit another one now. */
+    busy_count = context_vk->current_command_buffer.id - context_vk->completed_command_buffer_id - 1;
+    if (busy_count > WINED3D_PERIODIC_SUBMIT_MAX_BUFFERS)
+        return false;
+
+    QueryPerformanceCounter(&now);
+    QueryPerformanceFrequency(&freq);
+
+    diff = ((now.QuadPart - context_vk->command_buffer_create_time.QuadPart) * 1000000) / freq.QuadPart;
+    if (diff < WINED3D_PERIODIC_SUBMIT_INTERVAL_MICROSECONDS)
+        return false;
+
+    TRACE("Periodically submitting command buffer, %I64u us since last buffer, %I64u currently busy.\n",
+            diff, busy_count);
+    return true;
+}
+
 VkCommandBuffer wined3d_context_vk_get_command_buffer(struct wined3d_context_vk *context_vk)
 {
     struct wined3d_device_vk *device_vk = wined3d_device_vk(context_vk->c.device);
@@ -1785,7 +1822,7 @@ VkCommandBuffer wined3d_context_vk_get_command_buffer(struct wined3d_context_vk
     buffer = &context_vk->current_command_buffer;
     if (buffer->vk_command_buffer)
     {
-        if (context_vk->retired_bo_size > WINED3D_RETIRED_BO_SIZE_THRESHOLD)
+        if (context_vk->retired_bo_size > WINED3D_RETIRED_BO_SIZE_THRESHOLD || should_periodic_submit(context_vk))
             wined3d_context_vk_submit_command_buffer(context_vk, 0, NULL, NULL, 0, NULL);
         else
         {
@@ -1854,6 +1891,8 @@ VkCommandBuffer wined3d_context_vk_get_command_buffer(struct wined3d_context_vk
         wined3d_query_vk_resume(query_vk, context_vk);
     }
 
+    QueryPerformanceCounter(&context_vk->command_buffer_create_time);
+
     TRACE("Created new command buffer %p with id 0x%s.\n",
             buffer->vk_command_buffer, wine_dbgstr_longlong(buffer->id));
 
diff --git a/dlls/wined3d/wined3d_vk.h b/dlls/wined3d/wined3d_vk.h
index e995ef3c408..fe4f96cfd57 100644
--- a/dlls/wined3d/wined3d_vk.h
+++ b/dlls/wined3d/wined3d_vk.h
@@ -599,6 +599,7 @@ struct wined3d_context_vk
     struct wined3d_command_buffer_vk current_command_buffer;
     uint64_t completed_command_buffer_id;
     VkDeviceSize retired_bo_size;
+    LARGE_INTEGER command_buffer_create_time;
 
     struct
     {