If `frame->restore_flags` was nonzero after a Unix call but included neither `CONTEXT_FLOATING_POINT` nor `CONTEXT_XSTATE`, xmm6-xmm15 were not restored to their previous values.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1772
This removes 20 `movaps` instructions from every syscall that calls a sysv_abi function, plus an `and` for stack alignment and some other instructions depending on the function.
In `NtAllocateLocallyUniqueId` for example this reduces the number of instructions from 63 to 36.
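For context on where the 20 `movaps` come from: in the Microsoft x64 calling convention xmm6-xmm15 are callee-saved, while in the System V AMD64 ABI every xmm register is caller-saved. An `ms_abi` function that calls a `sysv_abi` function therefore has to save and later restore those ten registers itself. The following is only a minimal sketch of that situation; the `example_*` names and `EXAMPLE_*` macros are hypothetical and are not Wine's actual declarations, and the attributes are only applied on x86-64 with GCC or Clang.

```c
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define EXAMPLE_MS_ABI   __attribute__((ms_abi))
#define EXAMPLE_SYSV_ABI __attribute__((sysv_abi))
#define EXAMPLE_NOINLINE __attribute__((noinline))
#else
/* On other targets the attributes do not apply; the sketch still compiles. */
#define EXAMPLE_MS_ABI
#define EXAMPLE_SYSV_ABI
#define EXAMPLE_NOINLINE
#endif

/* Hypothetical stand-in for a Unix-side helper that uses the System V ABI.
 * Under that ABI it is free to clobber xmm6-xmm15. */
EXAMPLE_SYSV_ABI EXAMPLE_NOINLINE int example_unix_helper(int x)
{
    return x + 1;
}

/* "Before": the syscall function uses the Microsoft ABI, so it must preserve
 * xmm6-xmm15 for its caller across the call to the System V helper. */
EXAMPLE_MS_ABI int example_syscall_ms(int x)
{
    return example_unix_helper(x) * 2;
}

/* "After": the syscall function uses the System V ABI as well, so the ABI
 * places no obligation on it to preserve xmm6-xmm15. */
EXAMPLE_SYSV_ABI int example_syscall_sysv(int x)
{
    return example_unix_helper(x) * 2;
}
```

Both variants compute the same result; the difference is only in the prologue and epilogue the compiler has to emit. Whether the saves actually show up in a given build depends on the compiler and on whether it can see the callee's body, so comparing the assembly (for example with `-S`) is the way to check.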
I don't entirely understand the llvm-mca output, but here are the before and after stats it reports for that function:
| Metric | Before | After |
|---|---|---|
| Iterations | 100 | 100 |
| Instructions | 6300 | 3600 |
| Total Cycles | 3335 | 1514 |
| Total uOps | 6300 | 3600 |
| Dispatch Width | 6 | 6 |
| uOps Per Cycle | 1.89 | 2.38 |
| IPC | 1.89 | 2.38 |
| Block RThroughput | 15.0 | 6.0 |
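As a reading aid for the llvm-mca output above: IPC is the instruction count divided by the total cycle count, and dividing total cycles by the iteration count gives the average simulated cycles per iteration. The sketch below just redoes that arithmetic on the reported figures; the struct and function names are made up for illustration.

```c
#include <assert.h>

/* Summary figures copied from the llvm-mca output quoted above. */
struct mca_summary {
    unsigned iterations;
    unsigned instructions;
    unsigned total_cycles;
};

const struct mca_summary mca_before = { 100, 6300, 3335 };
const struct mca_summary mca_after  = { 100, 3600, 1514 };

/* IPC: simulated instructions divided by simulated cycles. */
double mca_ipc(struct mca_summary s)
{
    assert(s.total_cycles != 0);
    return (double)s.instructions / s.total_cycles;
}

/* Average simulated cycles spent per iteration of the block. */
double mca_cycles_per_iteration(struct mca_summary s)
{
    assert(s.iterations != 0);
    return (double)s.total_cycles / s.iterations;
}

/* How many times fewer cycles the "after" run needs than the "before" run. */
double mca_cycle_ratio(struct mca_summary before, struct mca_summary after)
{
    assert(after.total_cycles != 0);
    return (double)before.total_cycles / after.total_cycles;
}
```

With these inputs the simulated cost drops from 33.35 to 15.14 cycles per iteration, a ratio of roughly 2.2. Block RThroughput (15.0 and 6.0) is lower than those averages because it is a best-case reciprocal throughput figure, whereas the total cycle count also reflects latencies and dependencies in the simulated run. These are simulator estimates, not measurements on hardware.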
This currently depends on the stack being aligned by the syscall dispatcher, which as far as I can tell is the case if `sizeof(struct syscall_frame) % 16 == 0`. If that is not good enough, I can add an `andq $~15,%rsp` somewhere.
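The alignment condition can be expressed as a compile-time check, and the `andq $~15,%rsp` fallback as a mask on the stack pointer value. The following is a minimal sketch under stated assumptions: `struct example_frame` is a hypothetical stand-in, not the real layout of `struct syscall_frame`, and `align_down_16` is an illustrative helper rather than anything in the dispatcher.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct syscall_frame; the real layout differs. */
struct example_frame {
    uint64_t regs[16];        /* 128 bytes of saved registers */
    uint32_t restore_flags;   /* 4 bytes */
    uint32_t pad[3];          /* 12 bytes, pads the total to 144 */
};

/* Fails the build if the frame size stops being a multiple of 16, which is
 * the property the dispatcher-provided alignment relies on. */
static_assert(sizeof(struct example_frame) % 16 == 0,
              "frame size must be a multiple of 16");

/* C equivalent of `andq $~15,%rsp`: clear the low four bits so the value is
 * rounded down to a 16-byte boundary. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}
```

If the frame base is 16-byte aligned and the frame size is a multiple of 16, subtracting the size keeps the alignment, so the explicit mask is only needed when one of those two assumptions does not hold.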
One question I have is whether we want to continue supporting CDECL syscalls (only `wine_server_call`, `wine_server_fd_to_handle` and `wine_server_handle_to_fd`)?
If we do, this adds a bit of complexity to the syscall dispatcher, see the commit "FIXUP ntdll: Support CDECL syscalls."
If we don't, and make those syscalls WINAPI instead, then every call to those functions on x86 seems to either stay unchanged or gain one `add` instruction. However, we would of course lose the ability to make CDECL syscalls.
--
v2: Revert "ntdll: Make CDECL syscalls WINAPI instead."
FIXUP ntdll: Support CDECL syscalls.
ntdll: Make syscall functions sysv_abi on x64.
ntdll: Make CDECL syscalls WINAPI instead.
win32u: Make syscalls use the SYSCALL calling convention.
ntdll: Make syscalls use the SYSCALL calling convention.
include: Add SYSCALL calling convention.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1752