Yes, the only difference I see in this measurement is the transfer of a lot of data in the server request (capturing the context in sig_usr1 is an additional cost and is not included in the time I measured).
I thought of varying the data size in the context request, but the context won't be any smaller if the state is actually present. That is probably less of an issue with AVX, where it is normal to clean up the state after use. But as far as I can see from a preliminary look, clearing the registers after use is not a thing with AVX512 (I could not find an analogue of vzeroupper, or any way to clear all the AVX512 state without xrstor), so probably once any code has used it, the state is going to stay unclean. Also, it seems to me that introducing a variable context size in the server context is more complicated than this patch, or am I wrong?
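
To put rough numbers on the size question, here is a minimal sketch of how a variable-size context could be computed from the mask of state components that are actually in use. The function name and the packed layout are assumptions made for illustration, not something taken from the patch; the component sizes follow from the register widths, and real code should take sizes and offsets from CPUID leaf 0xD rather than hardcode them.

```c
#include <stddef.h>
#include <stdint.h>

/* XSAVE state-component bits (Intel SDM numbering). */
#define XSTATE_AVX        (1u << 2)  /* upper halves of ymm0-15: 16 * 16 bytes */
#define XSTATE_OPMASK     (1u << 5)  /* k0-k7:                    8 *  8 bytes */
#define XSTATE_ZMM_HI256  (1u << 6)  /* upper halves of zmm0-15: 16 * 32 bytes */
#define XSTATE_HI16_ZMM   (1u << 7)  /* zmm16-31:                16 * 64 bytes */

#define XSAVE_LEGACY_SIZE 512        /* x87 + SSE legacy region */
#define XSAVE_HEADER_SIZE 64         /* XSAVE header */

/* Hypothetical helper: bytes needed to carry only the components set in
 * 'mask', packed back to back after the legacy region and header. */
static size_t context_xstate_size(uint64_t mask)
{
    size_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

    if (mask & XSTATE_AVX)       size += 256;
    if (mask & XSTATE_OPMASK)    size += 64;
    if (mask & XSTATE_ZMM_HI256) size += 512;
    if (mask & XSTATE_HI16_ZMM)  size += 1024;
    return size;
}
```

With only AVX in use this gives 832 bytes, and with all AVX512 components present it gives 2432 bytes, so a variable size only helps while the AVX512 state stays unused.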