On Wed Jan 24 16:23:35 2024 +0000, Jacek Caban wrote:
Isn't an analogous price for dirty AVX512 paid on Windows then? I guess XSAVEOPT may partially mitigate that, but still.
Also, it seems to me introducing variable context size in server context is more complicated than this patch, or am I wrong?
It'd likely be more complicated. I don't know if it'd be worth it, but trading performance of one code path over another does not look ideal, so I wonder if there is a better way.
I am not sure whether Windows actually saves the full state on every syscall. In theory that can be avoided there if the kernel part doesn't clobber the avx512 state (and only saves parts of the state when needed), but I am not sure I can establish that for certain (I guess it is more likely that it saves everything at once?). What I have tested so far is that some arbitrary system calls do not reset the avx512 state. Maybe in theory we could go for some optimization here and reset the state at some syscalls (the state is volatile and the syscall ABI doesn't demand that it be preserved), even if that doesn't match Windows. But then there is the risk that we will encounter something that depends on this behaving like Windows.
I am not sure how xsaveopt can help? It is an older analogue of xsavec, which we already use. xsavec is like xsaveopt in that it does not save state components that are in INIT state; it just additionally uses the compacted save area layout. Either of them helps when the state is cleared, but if the avx512 regs are not zero they will have to be saved anyway.
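
To put rough numbers on that, here is a minimal sketch of the init optimization as a plain C model. It does not execute xsavec, and it ignores the separate "modified" optimization that xsaveopt has. The helper names are made up for illustration, and the component sizes are the commonly reported ones; real code should query them from CPUID leaf 0xD instead of hardcoding them. The legacy area and the XSAVE header are not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XCR0 / XSTATE_BV bits of the extended state components. */
#define XSTATE_X87        (1ull << 0)
#define XSTATE_SSE        (1ull << 1)
#define XSTATE_AVX        (1ull << 2)  /* upper halves of ymm0-15 */
#define XSTATE_OPMASK     (1ull << 5)  /* k0-k7 */
#define XSTATE_ZMM_HI256  (1ull << 6)  /* upper halves of zmm0-15 */
#define XSTATE_HI16_ZMM   (1ull << 7)  /* zmm16-31 */
#define XSTATE_AVX512     (XSTATE_OPMASK | XSTATE_ZMM_HI256 | XSTATE_HI16_ZMM)
#define XSTATE_MODELED    (XSTATE_X87 | XSTATE_SSE | XSTATE_AVX | XSTATE_AVX512)

/* Assumed component sizes in bytes (hypothetical helper, hardcoded values). */
static size_t component_size(unsigned int bit)
{
    switch (bit)
    {
    case 2: return 256;   /* AVX */
    case 5: return 64;    /* opmask */
    case 6: return 512;   /* ZMM_Hi256 */
    case 7: return 1024;  /* Hi16_ZMM */
    default: return 0;    /* x87/SSE live in the legacy area, not counted */
    }
}

/* Bytes of extended state a save has to write: only components that are
 * both requested (rfbm) and not in INIT state (xinuse). */
static size_t xstate_bytes_written(uint64_t rfbm, uint64_t xinuse)
{
    uint64_t saved = rfbm & xinuse;
    size_t total = 0;
    unsigned int bit;

    assert(!(rfbm & ~XSTATE_MODELED)); /* the sketch knows no other components */
    for (bit = 2; bit < 8; bit++)
        if (saved & (1ull << bit)) total += component_size(bit);
    return total;
}
```

With all modeled components requested, a thread whose avx512 state is in INIT state gives 256 bytes (AVX only), while one with all three avx512 components in use gives 1856 bytes. The 1600 byte difference is the part neither instruction can skip once the registers are dirty.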
but trading performance of one code path over another does not look ideal, so I wonder if there is a better way.
Sure, but in the absence of such a way, can't we trade a smaller perf drop in rare calls for a bigger gain in very frequent ones? As the only alternative approach I can see so far, do you think we can find some other signal to deliver the request to do server_select with context?