Some tests fail for me in VirtualBox running on AMD CPU:
Hmm... weird. The tests probably need some work, but I think there are some clues that it should be possible.
But yes, if we can skip the full context store, it would be nice. I've been thinking about skipping it for the `__wine_unix_call` syscall, but skipping it for more syscalls would be even nicer.
Yes, we could have a special, lighter dispatcher for `__wine_unix_call`, possibly with an option to make it a full dispatcher if need be. I have already found some games that take a bad performance hit from the GL conversion.
---
FWIW, regarding the dispatcher overhead, I have noted a few things in addition to the FPU state that could be nice to keep in mind for a lighter dispatcher:
If we save the FPU state only partially, the next most costly items would be xsave.MxCsr / xsave.ControlWord. I'm not sure if we need to save those, and I'm having trouble with them for some reason.
The next overhead comes from saving and restoring rflags. As far as I could see, syscalls do not keep all the flags untouched (obviously, as they still do a few comparisons), but some (NT, ID, DF) seem to be saved and restored by NtDelayExecution. I'm sure at least the NT flag was causing issues with some applications.
It's not much overhead, but I think pushf disrupts the CPU pipeline, so skipping the flag save and restore could be nice. I have some ideas to take a few shortcuts and avoid popf, but I don't see how to avoid the pushf to read the flags. If we can be sure nothing will rely on them for `__wine_unix_call`, maybe we can simply zero the flags.
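For reference, the three flags named above sit at fixed architectural bit positions in EFLAGS/RFLAGS (DF is bit 10, NT is bit 14, ID is bit 21). A minimal sketch of masking a saved flags value down to just those bits follows. The helper name `preserved_flags` is hypothetical, and whether these are the only flags that need preserving is an assumption, not something established here.

```c
#include <stdint.h>

/* Architectural EFLAGS/RFLAGS bit masks on x86. */
#define FLAG_DF UINT64_C(0x00000400) /* bit 10: direction flag */
#define FLAG_NT UINT64_C(0x00004000) /* bit 14: nested task flag */
#define FLAG_ID UINT64_C(0x00200000) /* bit 21: CPUID identification flag */

/* Hypothetical helper: keep only the flags that appeared to be preserved
 * across a syscall, and clear everything else (arithmetic flags, IF, ...). */
static inline uint64_t preserved_flags(uint64_t rflags)
{
    return rflags & (FLAG_DF | FLAG_NT | FLAG_ID);
}
```

With a mask like this, "zeroing the flags" for `__wine_unix_call` would amount to using a mask of 0 for that one entry point.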
The 32-bit dispatcher also suffers from `rep movsl`. Copying a fixed number of arguments with `pushl` instead, and falling back to `rep movsl` when there are more, seems to make a good difference.
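The same idea, expressed in C rather than in the dispatcher's assembly, might look like the sketch below: a few plain stores for small argument counts, and a bulk copy otherwise. The helper name `copy_syscall_args` and the threshold of 4 are assumptions for illustration, not values taken from the dispatcher.

```c
#include <stdint.h>
#include <string.h>

#define FAST_ARGS 4 /* assumed threshold, not a measured value */

/* Hypothetical helper: copy `count` 32-bit syscall arguments from src to dst. */
static void copy_syscall_args(uint32_t *dst, const uint32_t *src, unsigned int count)
{
    if (count <= FAST_ARGS)
    {
        /* Fast path: a handful of plain stores, analogous to a short run of pushl. */
        switch (count)
        {
        case 4: dst[3] = src[3]; /* fall through */
        case 3: dst[2] = src[2]; /* fall through */
        case 2: dst[1] = src[1]; /* fall through */
        case 1: dst[0] = src[0]; /* fall through */
        case 0: break;
        }
        return;
    }
    /* Slow path: bulk copy, analogous to rep movsl. */
    memcpy(dst, src, count * sizeof(*dst));
}
```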
Then `__wine_unix_call` as a function also has a bit of overhead, as it saves the frame pointer (and currently the XMM registers), where it could just be `movq %r8,%rdi; jmp *(%rcx,%rdx,8)`.
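Read literally, that two-instruction version moves the third Win64 argument (`%r8`, the args pointer) into the first System V argument register (`%rdi`), then tail-jumps to the 8-byte entry at `%rcx + %rdx * 8`, i.e. the handle treated as a table of function pointers indexed by the call code. A rough C equivalent is sketched below. The type and function names are stand-ins for illustration, not Wine's actual declarations.

```c
#include <stdint.h>

typedef int32_t status_t;                     /* stand-in for NTSTATUS */
typedef status_t (*unix_entry_t)(void *args); /* hypothetical entry-point type */

/* The handle is treated as a table of entry points; code indexes into it.
 * A compiler can turn this into a tail jump, with no frame to set up. */
static status_t unix_call_sketch(const unix_entry_t *table, unsigned int code, void *args)
{
    return table[code](args);
}

/* Example entry point, only here to exercise the dispatch. */
static status_t example_add_one(void *args)
{
    int *value = args;
    ++*value;
    return 0;
}
```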
Lastly, I think that on the PE side, having the `__wine_unix_call` import in `winecrt0` also hurts a bit, with some unnecessary branches and indirection. I have opened https://gitlab.winehq.org/wine/wine/-/merge_requests/1201 for that.