Re: [PATCH v2 0/1] MR1552: winevulkan: Use direct calls for vkGetDescriptorEXT.
Dec. 2, 2022
8:32 p.m.
On Fri Dec 2 20:30:47 2022 +0000, **** wrote:
> Paul Gofman replied on the mailing list:
> ```
> On 12/2/22 14:25, Gabriel Ivăncescu (@insn) wrote:
> > On Fri Dec 2 18:57:30 2022 +0000, Jacek Caban wrote:
> >>> This should help a bit more, does it make a difference for you?
> >> My previous test wasn't really good for measuring it.
> >> I hacked a micro-benchmark, which confirms that the patch improves
> >> performance a lot. It was visible when doing "real" Vulkan
> >> vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I
> >> changed it further to make Unix side to be no-op. It closes most of the
> >> gap between direct call and __wine_unix_call_dispatcher. Times recorded
> >> for no-op calls:
> >> - direct call: 5761
> >> - unpatched Wine: 13933
> >> - ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher, 29% in
> >> PE vkGetPhysicalDeviceProperties)
> >> Looks impressive!
> > @gofman This isn't about setting it in rcx or not, it's about
> mispairing `call`s and `ret`s, which basically means 100% mispredicted
> because CPUs are optimized for it, so it couldn't do any speculative
> execution past the return before.
> >
> Yes, I figured that much. Yet the attached diff removes the return
> address from rcx in wine_syscall_dispatcher(), so I thought it makes
> sense to note that it will break things.
> ```

Would it help to return to the return address already on the PE stack?

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18480
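Taking Jacek's no-op timings at face value (the thread gives neither the unit nor the iteration count), "closes most of the gap" works out as follows, where each overhead is measured against the direct call:

```latex
\begin{aligned}
\Delta_{\text{unpatched}} &= 13933 - 5761 = 8172 \\
\Delta_{\text{ret.diff}}  &= 6823 - 5761 = 1062 \\
\text{fraction of gap closed} &= \frac{8172 - 1062}{8172} = \frac{7110}{8172} \approx 0.87 \\
\text{speedup over unpatched} &= \frac{13933}{6823} \approx 2.04
\end{aligned}
```

So ret.diff removes roughly 87% of the dispatcher overhead in this micro-benchmark and roughly halves the total time per no-op call.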
8:39 p.m.
On 12/2/22 14:32, Zebediah Figura (@zfigura) wrote:
> Would it help to return to the return address already on the PE stack?

I'm sorry, I'm not sure I understand. Would it help perf or help anticheat, and how is the return address on the PE stack related?

Also note that:

- the return address in rcx relates to wine_syscall_dispatcher only, not __wine_unix_call_dispatcher, whereas __wine_unix_call_dispatcher is the performance concern here;

- I'd guess that moving the return address to rcx and then doing push %rcx; ret performs about the same as pushq 0x70(%rcx); ret.
8:54 p.m.
On 12/2/22 14:39, Paul Gofman wrote:
> On 12/2/22 14:32, Zebediah Figura (@zfigura) wrote:
>> Would it help to return to the return address already on the PE stack?
>>
> I'm sorry, I'm not sure I understand. Would it help perf or help anticheat,
> and how is the return address on the PE stack related?
>
> Also note that:
>
> - the return address in rcx relates to wine_syscall_dispatcher only, not
> __wine_unix_call_dispatcher, whereas __wine_unix_call_dispatcher is
> the performance concern here;
>
> - I'd guess that moving the return address to rcx and then doing
> push %rcx; ret performs about the same as pushq 0x70(%rcx); ret.

I mean something like the attached patch. I don't know enough about modern x86 optimization to know whether it would help, but wouldn't it at least avoid a memory access?
Participants (3): Paul Gofman, Zebediah Figura, Zebediah Figura (@zfigura)