On 12/10/21 17:00, Rémi Bernon wrote:
> I'm not completely aware of why the full FPU context needs to be saved and restored for instance, but if it's only for the debugging experience, could that simply be stripped in such builds?
If you simply stop saving the FPU context, NtGetContextThread will be broken even without any debugger involved. I believe restoring the FPU context is already optimized out in the majority of cases, i.e. whenever it is not needed because nothing has set the thread's FPU context. Of course one could hack around this and enable the hack only for games that really need it, after spending time finding out which ones do.
As for AVX (YMM) registers, with XSAVEC support their state is only actually saved if there are nonzero YMM registers at the time of the call. And since those registers are volatile, compilers, as far as I could see, tend to emit vzeroupper before function calls (probably exactly to avoid the context-saving overhead on syscalls, which exists under both Windows and Linux without Wine involved).
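A minimal sketch of that pattern, assuming GCC or Clang on x86-64: a function does some 256-bit AVX work, then clears the upper YMM halves with `_mm256_zeroupper()` before making a call. `sum8_then_call` and `hypothetical_syscall_wrapper` are illustrative names, not Wine code. Compilers normally insert `vzeroupper` on their own; the explicit intrinsic only makes it visible.

```c
#include <immintrin.h>

/* Hypothetical stand-in for a call that ends up in a syscall.
 * noinline keeps the call boundary real. */
__attribute__((noinline)) static void hypothetical_syscall_wrapper(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Sums 8 floats using 256-bit AVX, then zeroes the upper YMM halves before
 * calling out. The target attribute lets this build without -mavx; only
 * call it when the CPU supports AVX. */
__attribute__((target("avx")))
float sum8_then_call(const float *v)
{
    __m256 x = _mm256_loadu_ps(v);
    /* Fold the two 128-bit lanes, then reduce within the remaining lane. */
    __m128 lo = _mm256_castps256_ps128(x);
    __m128 hi = _mm256_extractf128_ps(x, 1);
    __m128 s = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float result = _mm_cvtss_f32(s);

    /* Upper YMM halves are now back in their initial (zero) state, so an
     * XSAVEC-based context save around the call can skip them. */
    _mm256_zeroupper();
    hypothetical_syscall_wrapper();
    return result;
}
```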
Then there are the XMM registers that are non-volatile in ms_abi but volatile in sysv_abi (xmm6-xmm15), and those are always saved in the compiler-generated prologue whenever an ms_abi function calls a sysv_abi one. So going through the Wine->Unix gate just changes the place where they are saved, and in general a clean separation between the PE and Unix parts should remove a great number of these saves across function calls.
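A minimal sketch of where that prologue cost comes from, assuming GCC or Clang on x86-64 (the function names are hypothetical). Compiling it with `gcc -O2 -S` should typically show `pe_side` saving and restoring xmm6-xmm15 around the call, since they are non-volatile for its ms_abi callers but may be clobbered by the sysv_abi callee; the exact output can vary with compiler version and interprocedural optimizations.

```c
/* Unix-side function using the System V ABI: every XMM register is
 * caller-saved, so it may clobber any of them. */
__attribute__((sysv_abi, noinline))
int unix_side(int x)
{
    return x * 2;
}

/* PE-side function using the Win64 ABI: it must preserve xmm6-xmm15 for
 * its own callers, so the compiler saves them in the prologue because
 * the sysv_abi call below may overwrite them. */
__attribute__((ms_abi))
int pe_side(int x)
{
    return unix_side(x) + 1;
}
```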
The part that remains excessive is saving the volatile XMM registers, but that is probably relatively minor and could be the subject of a fine but ugly optimization if we ever introduce a "lightweight" dispatcher for non-blocking calls. Still, I'd expect this overhead to be smaller than what we gain from the split by removing the extra non-volatile XMM register saves.
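To make the register accounting concrete, here is a small model (not Wine code; all names are hypothetical) that encodes each ABI's non-volatile XMM registers as a bitmask: xmm6-xmm15 under Win64, none under System V x86-64. It derives the set an ms_abi caller has to preserve itself across a sysv_abi call, and the volatile set that a full-context dispatcher saves in addition.

```c
#include <stdint.h>

/* Bit i set = XMMi is callee-saved (non-volatile) under that ABI.
 * Win64 preserves only the low 128 bits of xmm6-xmm15; System V x86-64
 * preserves no XMM register at all. */
#define MS_ABI_NONVOLATILE_XMM   ((uint16_t)0xFFC0u) /* xmm6..xmm15 */
#define SYSV_ABI_NONVOLATILE_XMM ((uint16_t)0x0000u)

/* Registers an ms_abi caller must save itself when calling sysv_abi code:
 * non-volatile for the caller, but clobberable by the callee. */
uint16_t xmm_saved_at_abi_transition(void)
{
    return (uint16_t)(MS_ABI_NONVOLATILE_XMM & (uint16_t)~SYSV_ABI_NONVOLATILE_XMM);
}

/* Registers a full-context save covers although the ms_abi caller already
 * treats them as volatile: the "excessive" part. */
uint16_t xmm_excess_in_full_context_save(void)
{
    return (uint16_t)(~MS_ABI_NONVOLATILE_XMM & 0xFFFFu);
}

/* Count the registers in a mask. */
int popcount16(uint16_t m)
{
    int n = 0;
    while (m) {
        n += m & 1u;
        m >>= 1;
    }
    return n;
}
```

Under this model the transition cost is ten registers (xmm6-xmm15), while the excess in a full-context save is the six volatile ones (xmm0-xmm5).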