On 1/26/21 13:10, Stefan Dösinger wrote:
On 25.01.2021 at 22:05, Jacek Caban jacek@codeweavers.com wrote:
It's not exactly clear to me what results you'd like to see. This is similar to what Windows has to do in its syscalls, so real applications already take that into account and avoid unneeded syscalls on hot paths. That leaves us with micro benchmarks. I came up with the attached benchmark, which tries to show the impact on three types of Nt* functions in Wine. It calls NtQueryInformationProcess with different arguments. Depending on the argument:
ProcessIoCounters: Wine quickly returns some data. This is a typical thing that stubs do, but some implemented functions are like that as well.
ProcessVmCounters: Wine does some work on the client side, including Linux syscalls.
ProcessBasicInformation: Wine uses a server call to implement it.
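The attached benchmark is not reproduced in this message. As a rough illustration of the measurement idea only, here is a minimal timing harness in C. Every name in it (bench_fn, now_ns, bench_total_ns, bench_average_ns, count_call) is hypothetical and not taken from the attachment. It times a callback with the portable C11 timespec_get, whereas a real Windows benchmark would more likely wrap an NtQueryInformationProcess call in the callback and use a higher-resolution timer.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: one invocation of the operation under test. */
typedef void (*bench_fn)(void *ctx);

/* Wall-clock time in nanoseconds via C11 timespec_get (not monotonic). */
uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call fn(ctx) `iterations` times and return the total elapsed time in ns. */
uint64_t bench_total_ns(bench_fn fn, void *ctx, unsigned int iterations)
{
    uint64_t start = now_ns();
    for (unsigned int i = 0; i < iterations; i++)
        fn(ctx);
    uint64_t end = now_ns();
    /* Guard against the wall clock stepping backwards during the run. */
    return end >= start ? end - start : 0;
}

/* Average the total time over several runs to smooth run-to-run variation. */
uint64_t bench_average_ns(bench_fn fn, void *ctx,
                          unsigned int iterations, unsigned int runs)
{
    uint64_t sum = 0;
    for (unsigned int r = 0; r < runs; r++)
        sum += bench_total_ns(fn, ctx, iterations);
    return runs ? sum / runs : 0;
}

/* Stand-in workload: just counts how often it was called. */
void count_call(void *ctx)
{
    (*(unsigned int *)ctx)++;
}
```

To compare the three cases above, one would pass three callbacks, each querying a different information class, and compare the averages between current and patched builds.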
Here are my averaged results of a few runs, but I really don't want to read too much into them. I originally planned to send the result of a single random run, but it showed patched Wine as notably faster on server calls, so the run-to-run variation was higher than the impact:
Current Wine: 310 17692 4748
Patched Wine: 2910 18243 4898
For the patched version, I used my local tree, which has this series with additional runtime cpuid checks to use fxsave/xsavec/xsave depending on CPU capabilities. As expected, the impact on a plain stub call is large, but compared to a real load the impact seems marginal.
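A runtime cpuid check of this kind could look roughly like the sketch below. It is not the code from the local tree: the names (save_method, cpu_caps, choose_save_method, detect_cpu_caps) and the preference order xsavec, then xsave, then fxsave are assumptions made for illustration. It relies on the GCC/Clang <cpuid.h> helpers and the documented CPUID feature bits (leaf 1 EDX bit 24 for FXSR, leaf 1 ECX bits 26 and 27 for XSAVE and OSXSAVE, leaf 0xD subleaf 1 EAX bit 1 for XSAVEC).

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang wrappers around the CPUID instruction */
#endif

/* Hypothetical names; the preference order below is an assumption. */
enum save_method { SAVE_NONE, SAVE_FXSAVE, SAVE_XSAVE, SAVE_XSAVEC };

struct cpu_caps { bool fxsr; bool xsave; bool xsavec; };

/* Pure selection logic: pick the most capable instruction available. */
enum save_method choose_save_method(struct cpu_caps caps)
{
    if (caps.xsave && caps.xsavec) return SAVE_XSAVEC;
    if (caps.xsave) return SAVE_XSAVE;
    if (caps.fxsr) return SAVE_FXSAVE;
    return SAVE_NONE;
}

/* Query the CPU once at startup; reports no capabilities on non-x86. */
struct cpu_caps detect_cpu_caps(void)
{
    struct cpu_caps caps = { false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    {
        caps.fxsr = (edx >> 24) & 1;  /* CPUID.1:EDX.FXSR */
        /* XSAVE is usable only if the CPU has it and the OS enabled it. */
        caps.xsave = ((ecx >> 26) & 1) && ((ecx >> 27) & 1);
    }
    if (caps.xsave && __get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx))
        caps.xsavec = (eax >> 1) & 1; /* CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC */
#endif
    return caps;
}
```

Keeping the selection logic separate from detection means the choice is made once and cached, so the syscall path itself only branches on a stored value rather than executing cpuid on every call.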
What I had in mind was running any kind of game benchmark to see if it has a noticeable impact, but I think your microbenchmark largely rules that out - thanks for looking into that. I am getting concerned that we're replacing something that used to be a regular call with a way more complicated process. Though I guess that's OK for ntdll, where applications expect expensive syscalls. We'd have to think twice about applying the same kind of syscall thunks for e.g. GL calls.
For comparison, Windows results are something like 2200, 140, 140.
I am surprised Windows makes syscalls cheaper than our original call-based Wine stub. Something seems odd.
Our functions (even stubs with a debug trace) currently tend to save the non-volatile xmm registers (often redundantly, but not necessarily), which adds some overhead. Also, I wouldn't be surprised if Windows has some fast syscall path which doesn't save the full context, for example for syscalls that simply query information available in memory and aren't expected to change the task's state. Given how many syscalls Windows has and how they are used, I think it would be natural (if not a must) to have.