On Sun Dec 11 07:55:07 2022 +0000, Torge Matthies wrote:
Actually, this doesn't restore xmm6-xmm15, which it should according to the MS ABI, and forcing the `xrstor64` results in lower performance than before.
Manually restoring only xmm6-xmm15 from the values saved by the (f)xsave(c)64 makes this faster than upstream while still adhering to the ABI. See the latest version of this branch. The gain is small, though: only about 3.5% in my test case (a syscall that just calls a sysv_abi function), and it is probably slower for syscalls that don't call any sysv_abi function (if there are any syscalls like that).
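
To illustrate why only xmm6-xmm15 need the manual restore, here is a minimal C sketch of the register bookkeeping. It is not the code from the branch: all function and macro names are hypothetical, and it only models which registers differ between the two ABIs and where they sit in the legacy 512-byte FXSAVE region (XMM0 at byte offset 160, 16 bytes per register). It does not inspect the XSAVE header, so it says nothing about whether the save instruction actually wrote the SSE state.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy FXSAVE region layout in 64-bit mode: XMM0 starts at byte 160,
 * and each XMM register occupies 16 bytes. */
#define FXSAVE_XMM0_OFFSET 160
#define XMM_REG_SIZE       16

/* ms_abi: xmm6-xmm15 are callee-saved (nonvolatile). */
static int xmm_callee_saved_ms_abi(int n)
{
    return n >= 6 && n <= 15;
}

/* sysv_abi: every xmm register is caller-saved (volatile). */
static int xmm_callee_saved_sysv_abi(int n)
{
    (void)n;
    return 0;
}

/* A register needs a manual restore when the ms_abi caller expects it to be
 * preserved but a sysv_abi callee is free to clobber it. */
static int xmm_needs_manual_restore(int n)
{
    return xmm_callee_saved_ms_abi(n) && !xmm_callee_saved_sysv_abi(n);
}

/* Byte offset of xmm<n> inside the save area (hypothetical helper). */
static size_t xmm_save_offset(int n)
{
    assert(n >= 0 && n <= 15);
    return FXSAVE_XMM0_OFFSET + (size_t)n * XMM_REG_SIZE;
}
```

Under these assumptions, the restore path only has to reload ten registers from offsets 256 through 400 of the save area, rather than reloading the whole extended state with `xrstor64`.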