I'm not completely sure about the mechanism, but I think it's simple enough to consider having that upstream now. This shows at least how we can leverage win32u surface changes to decide to switch surfaces on/off-screen and fallback to manual blitting.
Having the surfaces on-screen makes sure they are presented as efficiently as possible, having them off-screen we use GDI blit to indirectly call XCopyArea and this will be suboptimal, probably with visible tearing, but hopefully not too common.
--
v2: win32u: Use GDI blit to implement partial or other process presentation.
winex11: Return an offscreen HDC from vulkan_surface_detach.
win32u: Pass a HDC parameter to vulkan_surface_detach.
winex11: Create a window surface even when there is client window.
winex11: Also attach child client windows to their toplevel window.
win32u: Make sure vulkan windows have a pixel format selected.
win32u: Detach vulkan surfaces that aren't fully visible.
win32u: Detach vulkan surfaces that are offscreen or for another process.
win32u: Move vulkan surfaces to their new parent when reparenting.
win32u: Introduce a new vulkan offscreen surfaces list.
https://gitlab.winehq.org/wine/wine/-/merge_requests/5573
Reduces the time needed to write them from 20ms to 2ms and to read them from 10ms to 1ms, as well as reducing the number of wineserver requests by a lot.
--
v2: win32u: Read / write source modes as a single registry blob.
winex11: Only request display modes driver data when needed.
https://gitlab.winehq.org/wine/wine/-/merge_requests/5579
This improves performance for the game "Grounded", on a AMD Radeon RX 6700 XT,
with radv from Mesa 22.3.6. Testing was done with the "cb_access_map_w" option
enabled, which also improves performance with the game by itself.
From my testing, it's possible to raise the threshold from 2 ms up to 5 ms or
so, before the driver or GPU seems to reclock back to the lower power level.
However, this measurement is questionable for several reasons. It seems to vary
depending on the scene being rendered, and of course this will be specific to
the game and driver and GPU in question anyway. The game also has a weird
approach to vsync that seems to involve it presenting stale frames (and hence
artificially inflating the FPS), which I'm not fully sure I accounted for while
measuring. And of course, it's hard to be sure that 5 ms is actually the
threshold for how long the driver will go before powering down the GPU. In any
case, it seems better to err on the side of submitting more often, to make sure
the fix affects more drivers.
While submission isn't cheap, it seems to me that submitting every 2 ms is
unlikely to cause a bottleneck [consider that this is at most 8 (more)
submissions per frame].
The maximum of 4 concurrent periodically submitted buffers was chosen
arbitrarily. Removing the maximum altogether does not measurably affect
performance for this game either way.
Credit goes to Philip Rebohle and his work on DXVK for helping me to notice that
periodic submission might make a difference.
--
v3: wined3d: Submit command buffers after 512 draw or dispatch commands.
wined3d: Retrieve the VkCommandBuffer from wined3d_context_vk after executing RTV barriers.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2724