This improves performance for the game "Grounded" on an AMD Radeon RX 6700 XT
with radv from Mesa 22.3.6. Testing was done with the "cb_access_map_w" option
enabled, which by itself also improves performance in this game.
From my testing, it's possible to raise the threshold from 2 ms up to 5 ms or
so, before the driver or GPU seems to reclock back to the lower power level.
However, this measurement is questionable for several reasons. It seems to vary
depending on the scene being rendered, and of course this will be specific to
the game and driver and GPU in question anyway. The game also has a weird
approach to vsync that seems to involve it presenting stale frames (and hence
artificially inflating the FPS), which I'm not fully sure I accounted for while
measuring. And of course, it's hard to be sure that 5 ms is actually the
threshold for how long the driver will go before powering down the GPU. In any
case, it seems better to err on the side of submitting more often, to make sure
the fix affects more drivers.
While submission isn't cheap, it seems to me that submitting every 2 ms is
unlikely to cause a bottleneck (consider that this amounts to at most 8
additional submissions per frame).
The maximum of 4 concurrent periodically submitted buffers was chosen
arbitrarily. Removing the maximum altogether does not measurably affect
performance for this game either way.
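
As a rough illustration of the heuristic described above, here is a minimal sketch in C. It is not the code from these patches: the struct, the function names and the way the conditions are combined are all hypothetical. Only the 2 ms interval, the cap of 4 periodically submitted buffers and the 512 draw/dispatch limit come from this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the description; the names are hypothetical. */
#define PERIODIC_SUBMIT_INTERVAL_NS  UINT64_C(2000000) /* 2 ms */
#define PERIODIC_SUBMIT_MAX_COMMANDS 512u
#define PERIODIC_SUBMIT_MAX_BUFFERS  4u

/* Hypothetical bookkeeping, not a wined3d structure. */
struct submit_state
{
    uint64_t last_submit_ns;         /* monotonic time of the last submission */
    unsigned int command_count;      /* draws/dispatches recorded since then */
    unsigned int periodic_in_flight; /* periodic submissions not yet retired */
};

/* Decide whether the pending command buffer should be submitted early.
 * Assumes now_ns comes from a monotonic clock and never runs backwards. */
static bool should_submit_periodically(const struct submit_state *state, uint64_t now_ns)
{
    assert(state);

    /* Nothing recorded, so there is nothing to submit. */
    if (!state->command_count) return false;
    /* Respect the (arbitrary) cap on concurrent periodic submissions. */
    if (state->periodic_in_flight >= PERIODIC_SUBMIT_MAX_BUFFERS) return false;
    /* Submit once enough work has piled up... */
    if (state->command_count >= PERIODIC_SUBMIT_MAX_COMMANDS) return true;
    /* ...or once enough time has passed to risk the GPU clocking down. */
    return now_ns - state->last_submit_ns >= PERIODIC_SUBMIT_INTERVAL_NS;
}

/* Update the bookkeeping after a periodic submission. */
static void note_periodic_submit(struct submit_state *state, uint64_t now_ns)
{
    assert(state);
    state->last_submit_ns = now_ns;
    state->command_count = 0;
    state->periodic_in_flight++;
}
```

Assuming a 60 FPS target (about 16.7 ms per frame, which is an assumption and not stated above), a 2 ms interval allows at most 8 time-triggered submissions per frame, which is consistent with the figure quoted.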
Credit goes to Philip Rebohle and his work on DXVK for helping me to notice that
periodic submission might make a difference.
--
v3: wined3d: Submit command buffers after 512 draw or dispatch commands.
wined3d: Retrieve the VkCommandBuffer from wined3d_context_vk after executing RTV barriers.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2724
It can be substantially faster in some cases. Startup time for Notepad++ (with several plugins) drops from ~1.03 sec to ~0.61 sec on my machine (with the wineprefix already active).
9b7669592d6f8b40976b571b70f8543777d35167 is what introduced this performance regression. I don't know why exactly, but it made my Notepad++ noticeably slower when launched from a file manager (so it's not just wineprefix startup overhead), which is very annoying.
--
v2: win32u: Cache is_virtual_desktop.
https://gitlab.winehq.org/wine/wine/-/merge_requests/5070
This seems to be relied on by some versions of [this Unreal Engine input plugin](https://www.unrealengine.com/marketplace/en-US/product/wm-input-man…
Note: I'm not sure how to deal with `HID_USAGE_GENERIC_KEYPAD`, which (I think) would fall under `RIM_TYPEKEYBOARD`. Do we need to store extra info to differentiate these from `HID_USAGE_GENERIC_KEYBOARD` or is there something in the device info struct that can differentiate them?
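
To make the ambiguity in the note above concrete, here is a small sketch in C. The helper and struct are hypothetical and are not Wine or Windows functions. The constant values are meant to mirror the Windows SDK headers, and mapping the keypad usage to `RIM_TYPEKEYBOARD` is only the assumption raised in the note, not confirmed behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Values intended to mirror hidusage.h and winuser.h. */
#define HID_USAGE_PAGE_GENERIC     0x01
#define HID_USAGE_GENERIC_MOUSE    0x02
#define HID_USAGE_GENERIC_KEYBOARD 0x06
#define HID_USAGE_GENERIC_KEYPAD   0x07
#define RIM_TYPEMOUSE    0
#define RIM_TYPEKEYBOARD 1
#define RIM_TYPEHID      2

/* Hypothetical description of a device's top-level collection. */
struct device_usage
{
    uint16_t usage_page;
    uint16_t usage;
};

/* Hypothetical helper: pick a raw input device type from the HID usage. */
static uint32_t rim_type_from_usage(const struct device_usage *dev)
{
    assert(dev);

    if (dev->usage_page != HID_USAGE_PAGE_GENERIC) return RIM_TYPEHID;

    switch (dev->usage)
    {
    case HID_USAGE_GENERIC_MOUSE:
        return RIM_TYPEMOUSE;
    case HID_USAGE_GENERIC_KEYBOARD:
    case HID_USAGE_GENERIC_KEYPAD: /* assumption from the note above */
        return RIM_TYPEKEYBOARD;
    default:
        return RIM_TYPEHID;
    }
}
```

Under that assumption both usages collapse to the same device type, so the type alone cannot tell a keypad from a keyboard. Some extra information, such as the original usage, would have to be kept alongside it.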
--
v4: user32: Post WM_INPUT_DEVICE_CHANGE when registering for notifications
user32: Add tests for WM_INPUT_DEVICE_CHANGE messages
https://gitlab.winehq.org/wine/wine/-/merge_requests/2120
--
v3: winex11: Remove now unnecessary surface wrapper struct.
win32u: Move thread detach from winex11.
win32u: Introduce a per-window vulkan surface list.
winewayland: Get rid of the now unnecessary surface wrapper.
win32u: Return the host surface directly from vulkan_surface_create.
https://gitlab.winehq.org/wine/wine/-/merge_requests/5551
ucrtbase._mbsncpy_s is used by Marvel vs Capcom when trying to create a multiplayer lobby.
The functions are also present in msvcrt (unlike msvcr70, msvcr71), where I didn't add them because they behave differently: there is at least one oddity where msvcrt doubles the number of characters to copy (the 'n' parameter, not the buffer size). I suppose we don't need to explore and deal with this specific behavior until something needs those functions from msvcrt.
--
v4: msvcrt: Implement _mbsncpy_s[_l]().
https://gitlab.winehq.org/wine/wine/-/merge_requests/5547