Re: [PATCH v2 0/1] MR1552: winevulkan: Use direct calls for vkGetDescriptorEXT.

Dec. 2, 2022

On 12/2/22 14:39, Paul Gofman wrote:
> On 12/2/22 14:32, Zebediah Figura (@zfigura) wrote:
>> On Fri Dec  2 20:30:47 2022 +0000, **** wrote:
>>> Paul Gofman replied on the mailing list:
>>> ```
>>> On 12/2/22 14:25, Gabriel Ivăncescu (@insn) wrote:
>>>> On Fri Dec  2 18:57:30 2022 +0000, Jacek Caban wrote:
>>>>>> This should help a bit more, does it make a difference for you?
>>>>> My previous test wasn't really good for measuring it.
>>>>> I hacked a micro-benchmark, which confirms that the patch improves
>>>>> performance a lot. It was visible when doing "real" Vulkan
>>>>> vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I
>>>>> changed it further to make the Unix side a no-op. It closes most of the
>>>>> gap between direct call and __wine_unix_call_dispatcher. Times recorded
>>>>> for no-op calls:
>>>>> - direct call: 5761
>>>>> - unpatched Wine: 13933
>>>>> - ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher,
>>>>>   29% in PE vkGetPhysicalDeviceProperties)
>>>>> Looks impressive!
>>>> @gofman This isn't about setting it in rcx or not, it's about
>>>> mispairing `call`s and `ret`s, which basically means 100% mispredicted
>>>> because CPUs are optimized for it, so it couldn't do any speculative
>>>> execution past the return before.
>>> Yes, I figured that much. Yet the attached diff removes the return
>>> address from rcx in wine_syscall_dispatcher(), so I thought it makes
>>> sense to note that it will break things.
>>> ```
>> Would it help to return to the return address already on the PE stack?
>>
> I am sorry, I am not sure I understand... help perf or help anticheat,
> and how is the return address on the PE stack related? Also note that:
> 
> - ret address in rcx relates to wine_syscall_dispatcher only, not
> __wine_unix_call_dispatcher, while it is __wine_unix_call_dispatcher that
> is of performance concern here;
> 
> - I guess that moving the ret address to rcx and push rcx / ret might be 
> the same performance-wise as pushq 0x70(%rcx), ret.
> 
> 
> 

I mean something like the attached patch. I don't know enough about 
modern x86 optimization to know if it would help, but it seems like it 
would at least avoid a memory access?
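
For anyone who wants to measure it, the no-op micro-benchmark mentioned
above could look roughly like the sketch below. This is only an
illustration under my own assumptions: noop_target, dispatch, func_table,
bench_direct and bench_dispatch are made-up names, not Wine code, and a
plain function-pointer hop does not reproduce the stack switch or the
call/ret pairing of __wine_unix_call_dispatcher. It only shows the shape
of a "direct call vs. dispatched call" timing loop, so the numbers will
not be comparable to the ones quoted.

```c
/* Illustrative micro-benchmark sketch. None of these names are Wine code. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* volatile so the compiler cannot drop the no-op calls entirely */
static volatile uint64_t noop_calls;

/* Hypothetical stand-in for a Unix-side function that does no real work.
 * noinline is a GCC/Clang extension; it keeps a real call/ret pair. */
__attribute__((noinline)) static void noop_target(void)
{
    noop_calls++;
}

/* Hypothetical stand-in for a dispatcher: one extra hop through a table. */
typedef void (*unix_func)(void);
static unix_func volatile func_table[] = { noop_target };

__attribute__((noinline)) static void dispatch(unsigned int code)
{
    func_table[code]();
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `iterations` direct calls; returns elapsed nanoseconds. */
uint64_t bench_direct(uint64_t iterations)
{
    uint64_t start = now_ns();
    for (uint64_t i = 0; i < iterations; i++)
        noop_target();
    return now_ns() - start;
}

/* Time `iterations` calls routed through the stand-in dispatcher. */
uint64_t bench_dispatch(uint64_t iterations)
{
    uint64_t start = now_ns();
    for (uint64_t i = 0; i < iterations; i++)
        dispatch(0);
    return now_ns() - start;
}

#ifdef BENCH_MAIN
int main(void)
{
    const uint64_t n = 10000000;
    uint64_t direct = bench_direct(n);
    uint64_t dispatched = bench_dispatch(n);

    assert(noop_calls == 2 * n);  /* sanity check: every call really ran */
    printf("direct call: %llu ns\n", (unsigned long long)direct);
    printf("dispatched:  %llu ns\n", (unsigned long long)dispatched);
    return 0;
}
#endif
```

Building with something like "cc -O2 -DBENCH_MAIN bench.c" gives a
standalone program; to compare actual return sequences, the dispatch()
stand-in would have to be replaced with the real dispatcher path.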