On Mon Oct 31 15:20:31 2022 +0000, Jacek Caban wrote:
> Do you know the performance impact of this? While we will need to take some hit, it would be good to know how much, and perhaps have some plans for mitigations. We still have direct calls in winevulkan that we will need to get rid of (either by optimizing the syscall thunks further or by batching command buffers). For OpenGL, batching is more tricky, and there is a hypothesis that OpenGL generally requires more calls (so more syscall thunking). We may need to live with it, but it would be good to have some data instead of speculations for better judgement.
Well, this MR specifically should have very little effect, as it's not going through the syscall dispatcher yet.
Anyway, I don't have many numbers, but I ran a few tests with the Unigine Valley / Heaven benchmarks, running at Low settings and 1280x720 to try to make sure they are CPU-bound. This may not be very representative of the variety of games out there, but it's a starting point.
With current master (avg FPS / score / perf top highest hitter CPU %):
```
* Valley GL:    167 / 6987 / ~2% in Mesa
* Valley D3D9:  129 / 5388 / ~25% in wined3d_cs_run
* Heaven GL:    319 / 8032 / ~2% in Mesa
* Heaven D3D11: 113 / 2833 / ~15% in wined3d_device_context_emit_map + ~15% in wined3d_cs_mt_finish
```
With the OpenGL32 PE conversion from https://gitlab.winehq.org/wine/wine/-/merge_requests/1010:
```
* Valley GL (PE):    147 / 6127 / ~5-10% in __wine_syscall_dispatcher
* Valley D3D9 (PE):  132 / 5520 / ~15% in wined3d_cs_run + ~5-10% in __wine_syscall_dispatcher + ~5% in wined3d_device_context_emit_map
* Heaven GL (PE):    263 / 6645 / ~5-10% in __wine_syscall_dispatcher
* Heaven D3D11 (PE): 112 / 2820 / ~15% in wined3d_device_context_emit_map + ~10% in wined3d_cs_emit_present + ~5-10% in __wine_syscall_dispatcher + ~5% in wined3d_cs_mt_finish
```
I also quickly checked with the WINEWOW / wow64 support, and in GL mode the results are surprisingly similar to the win32 results, though I'm not sure how it copes with the wow64 buffer mapping.
The wow64 D3D results were OTOH completely horrible and rendering was broken, but that's probably because of some issues in my wow64 thunks, or caused by the buffer map copies; the sketch below shows the kind of copy I suspect is involved.
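For illustration only, this is roughly what such a mapping copy could look like. This is a hypothetical sketch, not actual Wine code: `wow64_glMapBuffer`, `alloc_low_mem` and `record_mapping` are made-up names, and GL prototype loading is elided.

```c
#include <string.h>
#include <stdint.h>
#include <GL/gl.h>
#include <GL/glext.h>   /* glMapBuffer / GL_BUFFER_SIZE; prototypes assumed available */

/* Hypothetical helpers, not Wine APIs: alloc_low_mem() would return 32-bit
 * addressable memory (e.g. NtAllocateVirtualMemory with zero_bits), and
 * record_mapping() would remember the pair so glUnmapBuffer can copy the
 * data back to the host pointer before really unmapping. */
void *alloc_low_mem( GLint size );
void record_mapping( GLenum target, void *host_ptr, void *low_ptr, GLint size );

void *wow64_glMapBuffer( GLenum target, GLenum access )
{
    void *host_ptr = glMapBuffer( target, access );
    void *low_ptr;
    GLint size;

    /* the host driver may return a pointer above 4GB, which the 32-bit
     * guest cannot address, so the data has to be mirrored into low memory */
    if (!host_ptr || !((uintptr_t)host_ptr >> 32)) return host_ptr;

    glGetBufferParameteriv( target, GL_BUFFER_SIZE, &size );
    low_ptr = alloc_low_mem( size );
    if (access != GL_WRITE_ONLY) memcpy( low_ptr, host_ptr, size );
    record_mapping( target, host_ptr, low_ptr, size );
    return low_ptr;
}
```

If that kind of round trip happens on every map, it would explain both the broken rendering (if the copy-back is wrong) and the horrible D3D numbers.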
FWIW I tried various tweaks to the syscall dispatcher, and all the FPU saving modes give roughly the same results. We can get a significant difference by:
1) Not saving the FPU state (a nop instead of xsavec reduces the dispatcher CPU usage down to 3-5%),
2) Avoiding the `rep movs` argument copy, and instead using something like https://gitlab.winehq.org/wine/wine/-/merge_requests/1074/diffs?commit_id=bb... (this further reduces the CPU usage down to 1-2%, possibly spreading it out, but still improving FPS in the benchmarks); a conceptual sketch of the idea follows below.
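To illustrate the second point: the actual change lives in the dispatcher assembly, but conceptually it replaces the generic `rep movs` (which has a significant startup cost for the small, bounded argument counts involved) with size-specialized unrolled moves. A made-up C equivalent, `copy_args_unrolled`, not Wine code:

```c
/* Conceptual C version of avoiding "rep movs" for the syscall arguments:
 * counts are small and known, so a computed jump into unrolled moves
 * avoids the rep movs startup latency. */
static void copy_args_unrolled( unsigned long long *dst,
                                const unsigned long long *src,
                                unsigned int count )
{
    switch (count)
    {
    case 8: dst[7] = src[7]; /* fall through */
    case 7: dst[6] = src[6]; /* fall through */
    case 6: dst[5] = src[5]; /* fall through */
    case 5: dst[4] = src[4]; /* fall through */
    case 4: dst[3] = src[3]; /* fall through */
    case 3: dst[2] = src[2]; /* fall through */
    case 2: dst[1] = src[1]; /* fall through */
    case 1: dst[0] = src[0]; /* fall through */
    case 0: break;
    }
}
```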
IMHO trying to do some batching is risky, at the very least from the latency perspective, which is something games are very sensitive to. The host graphics drivers already go to great lengths to do that kind of thing internally in an optimal way, and I don't think we should add another layer.
Instead I think we should have a per-thread flag indicating whether we really need to save / restore the FPU state entirely (or just the ABI xmm registers). Then we should be able to enable that flag for any perf-critical, Wine-internal thread, such as the D3D ones, and provide a custom entry point for third parties such as DXVK to do the same for their internal threads.
If some games actually rely on the entire FPU state being saved and restored across syscalls, even for Wine-internal threads (like if some DRM somehow manages to check that, or when running under a debugger), we should have an optional global flag that forces it, but it should not be the default. Roughly, the idea looks like the sketch below.
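A sketch only: none of these names exist in Wine, and the real check would live in the dispatcher assembly rather than C.

```c
struct syscall_frame;   /* the existing dispatcher frame */

/* hypothetical per-thread field and helpers */
struct thread_data { unsigned int syscall_fpu_mode; };
extern struct thread_data *get_thread_data(void);
extern void save_full_xstate( struct syscall_frame *frame );  /* current xsavec path */
extern void save_abi_xmm( struct syscall_frame *frame );      /* stores xmm6-xmm15 only */

#define SYSCALL_FPU_SAVE_FULL     0   /* default: save the entire FPU state */
#define SYSCALL_FPU_SAVE_ABI_XMM  1   /* only the xmm regs the Windows x64 ABI preserves */

static int force_full_fpu_save;       /* optional global override, off by default */

/* made-up entry point that Wine-internal threads, or third parties like
 * DXVK for their own threads, could call */
void __wine_set_syscall_fpu_mode( unsigned int mode )
{
    get_thread_data()->syscall_fpu_mode = mode;
}

/* what the dispatcher would conceptually do on syscall entry */
static void save_fpu_state( struct syscall_frame *frame )
{
    if (force_full_fpu_save || get_thread_data()->syscall_fpu_mode == SYSCALL_FPU_SAVE_FULL)
        save_full_xstate( frame );
    else
        save_abi_xmm( frame );
}
```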