On Wed Jan 24 16:23:35 2024 +0000, Paul Gofman wrote:
Yes, the only difference I see in this measurement is the large amount of data transferred in the server request (capturing the context in sig_usr1 is additional and not included in the time I measured). I thought of varying the data size in the context request, but the context size won't be smaller if the state is actually present. It is probably less of an issue with AVX, where it is normal to clean up the state after use. But as far as I can tell from a preliminary look, clearing the registers after use is not a thing with AVX512 (I could not find an analogue of vzeroupper that clears all the AVX512 state without xrstor), so once any code has used it, the state will probably stay dirty. Also, it seems to me introducing variable context size in server context is more complicated than this patch, or am I wrong?
Isn't an analogous price for dirty AVX512 paid on Windows then? I guess XSAVEOPT may partially mitigate that, but still.
Also, it seems to me introducing variable context size in server context is more complicated than this patch, or am I wrong?
It'd likely be more complicated. I don't know if it'd be worth it, but trading the performance of one code path for that of another does not look ideal, so I wonder if there is a better way.
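
To put rough numbers on the data-size concern, here is a minimal sketch that adds up the architectural sizes of the XSAVE state components for a given feature mask. The helper name `xstate_payload_size` and the `XSTATE_*` macros are hypothetical and not taken from the patch or from Wine; the component sizes are the ones documented in the Intel SDM, and the sum assumes a packed layout with no padding between components.

```c
#include <stddef.h>
#include <stdint.h>

/* XSAVE state-component bits, following the XCR0 layout in the Intel SDM.
 * The macro names are illustrative only, not Wine's. */
#define XSTATE_AVX        (1u << 2)  /* upper halves of YMM0-15, 256 bytes */
#define XSTATE_OPMASK     (1u << 5)  /* k0-k7, 64 bytes */
#define XSTATE_ZMM_HI256  (1u << 6)  /* upper halves of ZMM0-15, 512 bytes */
#define XSTATE_HI16_ZMM   (1u << 7)  /* ZMM16-31, 1024 bytes */

/* Hypothetical helper: bytes needed to carry the legacy FXSAVE region,
 * the XSAVE header and the extended components selected in `mask`,
 * assuming the components are packed back to back. */
static size_t xstate_payload_size(uint64_t mask)
{
    size_t size = 512 + 64; /* legacy region + XSAVE header */

    if (mask & XSTATE_AVX)       size += 256;
    if (mask & XSTATE_OPMASK)    size += 64;
    if (mask & XSTATE_ZMM_HI256) size += 512;
    if (mask & XSTATE_HI16_ZMM)  size += 1024;
    return size;
}
```

Under these assumptions, a context with only AVX state comes to 832 bytes, while one with all AVX512 components dirty comes to 2432 bytes, which is roughly the extra amount a fixed-size server context would carry on every request.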