> Would it help to return to the return address already on the PE stack?
Are we sure it's never clobbered?
> I guess that moving the ret address to rcx and push rcx / ret might be
> the same performance-wise as pushq 0x70(%rcx), ret.
Yes, skipping rcx save will break existing tests.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18485
On Fri Dec 2 20:30:47 2022 +0000, **** wrote:
> Paul Gofman replied on the mailing list:
> ```
> On 12/2/22 14:25, Gabriel Ivăncescu (@insn) wrote:
> > On Fri Dec 2 18:57:30 2022 +0000, Jacek Caban wrote:
> >>> This should help a bit more, does it make a difference for you?
> >> My previous test wasn't really good for measuring it.
> >> I hacked a micro-benchmark, which confirms that the patch improves
> >> performance a lot. It was visible when doing "real" Vulkan
> >> vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I
> >> changed it further to make the Unix side a no-op. It closes most of the
> >> gap between direct call and __wine_unix_call_dispatcher. Times recorded
> >> for no-op calls:
> >> - direct call: 5761
> >> - unpatched Wine: 13933
> >> - ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher, 29% in
> >> PE vkGetPhysicalDeviceProperties)
> >> Looks impressive!
> > @gofman This isn't about setting it in rcx or not, it's about
> > mispairing `call`s and `ret`s, which basically means the return was
> > mispredicted 100% of the time, because CPUs are optimized for matched
> > pairs, so it couldn't do any speculative execution past the return before.
> >
> Yes, I figured that much. Yet the attached diff removes the return
> address from rcx in wine_syscall_dispatcher(), so I thought it makes
> sense to note that it will break things.
> ```
Would it help to return to the return address already on the PE stack?
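
To make the `call`/`ret` pairing argument concrete, here is a toy model of a return-address predictor. It is only a sketch under simplifying assumptions: the class, the function names, the address labels and the 16-entry depth are all hypothetical, and it models neither real hardware nor the actual dispatcher code. It only shows why leaving a called function through something other than a matching `ret` makes the caller's next `ret` mispredict.

```python
# Toy model of a return-address predictor stack. Everything here is
# hypothetical and simplified; it is not Wine code or real CPU behaviour.
class ReturnStackModel:
    def __init__(self, depth=16):
        self.depth = depth      # assumed capacity, chosen arbitrarily
        self.stack = []
        self.hits = 0
        self.misses = 0

    def call(self, return_addr):
        # A call pushes its return address; the oldest entry is dropped when full.
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        self.stack.append(return_addr)

    def ret(self, actual_target):
        # A ret pops the predicted target and compares it with the real one.
        predicted = self.stack.pop() if self.stack else None
        if predicted == actual_target:
            self.hits += 1
        else:
            self.misses += 1


def paired_round_trip(model, n):
    """Every call is matched by a ret to the address that call pushed."""
    for _ in range(n):
        model.call("wrapper_ret")      # outer code calls a wrapper
        model.call("dispatcher_ret")   # wrapper calls the dispatcher
        model.ret("dispatcher_ret")    # dispatcher returns with ret
        model.ret("wrapper_ret")       # wrapper returns with ret


def mispaired_round_trip(model, n):
    """The dispatcher leaves via an indirect jump, so its entry is never popped."""
    for _ in range(n):
        model.call("wrapper_ret")
        model.call("dispatcher_ret")
        # No ret here: control reaches "dispatcher_ret" by a jump instead.
        model.ret("wrapper_ret")       # pops the stale "dispatcher_ret" entry


paired = ReturnStackModel()
paired_round_trip(paired, 100)

mispaired = ReturnStackModel()
mispaired_round_trip(mispaired, 100)

print("paired:   ", paired.hits, "hits,", paired.misses, "misses")
print("mispaired:", mispaired.hits, "hits,", mispaired.misses, "misses")
```

In this model the paired sequence predicts every return, while the mispaired one misses on every wrapper return, which matches the "100% mispredicted" description quoted above.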
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18480