Dec. 2, 2022
8:39 p.m.
On 12/2/22 14:32, Zebediah Figura (@zfigura) wrote:
> On Fri Dec 2 20:30:47 2022 +0000, **** wrote:
>> Paul Gofman replied on the mailing list:
>> ```
>> On 12/2/22 14:25, Gabriel Ivăncescu (@insn) wrote:
>>> On Fri Dec 2 18:57:30 2022 +0000, Jacek Caban wrote:
>>>>> This should help a bit more, does it make a difference for you?
>>>> My previous test wasn't really good for measuring it.
>>>> I hacked a micro-benchmark, which confirms that the patch improves
>>>> performance a lot. It was visible when doing "real" Vulkan
>>>> vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I
>>>> changed it further to make Unix side to be no-op. It closes most of the
>>>> gap between direct call and __wine_unix_call_dispatcher. Times recorded
>>>> for no-op calls:
>>>> - direct call: 5761
>>>> - unpatched Wine: 13933
>>>> - ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher, 29% in
>>>> PE vkGetPhysicalDeviceProperties)
>>>> Looks impressive!
>>> @gofman This isn't about setting it in rcx or not, it's about
>> mispairing `call`s and `ret`s, which basically means 100% mispredicted
>> because CPUs are optimized for it, so it couldn't do any speculative
>> execution past the return before.
>> Yes, I figured that much. Yet the attached diff removes the return
>> address from rcx in wine_syscall_dispatcher(), so I thought it makes
>> sense to note that it will break things.
>> ```
> Would it help to return to the return address already on the PE stack?
>

I am sorry, I am not sure I understand... would that help perf or help anticheat, and how is the return address on the PE stack related?

Also note that:

- the ret address in rcx relates to wine_syscall_dispatcher only, not __wine_unix_call_dispatcher, while it is __wine_unix_call_dispatcher that is the performance concern here;
- I guess that moving the ret address to rcx and then doing push rcx / ret might be the same performance-wise as pushq 0x70(%rcx); ret.
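
For reference, the "closes most of the gap" remark in the quoted benchmark can be checked with a bit of arithmetic on the three timings given there. This is only a sketch: the constant and function names are illustrative, and the thread does not state the unit of the timings, so only the differences and ratios are meaningful.

```python
# Timings quoted in the thread for no-op calls (unit not stated there).
DIRECT_CALL = 5761
UNPATCHED_WINE = 13933
PATCHED_RET_DIFF = 6823


def overhead(measured: int, baseline: int = DIRECT_CALL) -> int:
    """Extra time a dispatcher path costs compared with the direct call."""
    return measured - baseline


def gap_closed(before: int, after: int, baseline: int = DIRECT_CALL) -> float:
    """Fraction of the before-vs-baseline gap that the patch removes."""
    return (before - after) / (before - baseline)


if __name__ == "__main__":
    print("unpatched overhead:", overhead(UNPATCHED_WINE))
    print("ret.diff overhead:", overhead(PATCHED_RET_DIFF))
    print(f"gap closed: {gap_closed(UNPATCHED_WINE, PATCHED_RET_DIFF):.1%}")
```

With the quoted numbers, the overhead over a direct call drops from 8172 to 1062, i.e. roughly 87% of the gap is removed.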