On 1/22/21 20:18, Jacek Caban wrote:
Hi Paul,
I think we still support processors which don't have AVX and thus don't have the xsave instruction (which is reported as a separate cpuid bit).
xsave is part of SSE2, not AVX, and it should ignore unsupported requested features, so the patch should be fine as is on hardware without AVX. xsave does, however, need to be enabled by the OS, so we may need a feature check if we want to support OSes without xsave enabled.
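A feature check of the kind mentioned above could look roughly like the sketch below. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `__get_cpuid`); the helper names are made up for illustration and are not taken from the patch. CPUID leaf 1 reports XSAVE support in ECX bit 26 and OS enablement (OSXSAVE) in ECX bit 27.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#endif

#define CPUID1_ECX_XSAVE   (1u << 26)  /* CPU implements xsave/xrstor */
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has set CR4.OSXSAVE */

/* Hypothetical helper: decode both bits from a CPUID.1:ECX value.
 * Returns 1 only if the CPU has xsave and the OS has enabled it. */
static int xsave_usable_from_ecx(unsigned int ecx)
{
    return (ecx & CPUID1_ECX_XSAVE) && (ecx & CPUID1_ECX_OSXSAVE);
}

/* Hypothetical helper: query the running CPU. Returns 0 on non-x86
 * builds or when CPUID leaf 1 is not available. */
static int xsave_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    return xsave_usable_from_ecx(ecx);
#else
    return 0;
#endif
}
```

Keeping the bit decoding separate from the CPUID query makes the logic testable without depending on the host CPU.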
There was a real bug with that: https://bugs.winehq.org/show_bug.cgi?id=50271.
Also, to save at least this part, it is possible to use xsavec, which won't save anything (aside from the mask) if the ymm high part is zero (that is, in its initial state, which is quite a common case when ymm regs were not used before the call; compilers even tend to reset the higher part of ymm when done with them). There is user_shared_data->XState.EnabledFeatures, which tells if xsave is supported at all, and user_shared_data->XState.CompactionEnabled, which tells if xsavec is available.
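As a rough illustration of how those two fields could drive the choice of instruction, here is a minimal sketch. The struct is a hypothetical stand-in that only mirrors the two fields named above (it is not the real shared-data layout), and the enum and function names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields discussed above; the real
 * structure has more members and a different layout. */
struct xstate_config_sketch
{
    uint64_t enabled_features;    /* stands in for XState.EnabledFeatures */
    unsigned compaction_enabled;  /* stands in for XState.CompactionEnabled */
};

enum save_method
{
    SAVE_NO_XSAVE,  /* xsave is not enabled; keep the non-xsave path */
    SAVE_XSAVE,     /* standard-format xsave */
    SAVE_XSAVEC     /* compacted-format xsavec */
};

/* Pick a save instruction from the (assumed) feature flags. */
static enum save_method pick_save_method(const struct xstate_config_sketch *cfg)
{
    if (!cfg->enabled_features) return SAVE_NO_XSAVE;
    return cfg->compaction_enabled ? SAVE_XSAVEC : SAVE_XSAVE;
}
```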
If I read the documentation right (and my testing confirms that), xsave does what you described as well. It will not store the high ymm part if it's in its initial state. The difference between xsave and xsavec is about storage format, but that doesn't make a difference here. xsavec is also not exactly free: in addition to a feature check, it also requires the entire xsave header to be initialized.
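To make the header discussion concrete: the xsave header sits right after the 512-byte legacy region, with XSTATE_BV in its first 8 bytes and XCOMP_BV in the next 8. Bit 2 of XSTATE_BV corresponds to the AVX (ymm high) component, and bit 63 of XCOMP_BV marks the compacted format. A minimal sketch of inspecting an already-written save area follows; the helper names are invented for the example, and it assumes a little-endian x86 host.

```c
#include <stdint.h>
#include <string.h>

#define XSAVE_HEADER_OFFSET 512           /* header follows the legacy area */
#define XSTATE_BV_AVX       (1ull << 2)   /* ymm high halves component */
#define XCOMP_BV_COMPACTED  (1ull << 63)  /* set in compacted-format areas */

/* Read a 64-bit value in host byte order without alignment assumptions. */
static uint64_t read_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Nonzero if the save area records the ymm high halves as non-initial,
 * i.e. the instruction actually wrote that component. */
static int ymm_high_was_saved(const unsigned char *area)
{
    return (read_u64(area + XSAVE_HEADER_OFFSET) & XSTATE_BV_AVX) != 0;
}

/* Nonzero if the area uses the compacted format. */
static int area_is_compacted(const unsigned char *area)
{
    return (read_u64(area + XSAVE_HEADER_OFFSET + 8) & XCOMP_BV_COMPACTED) != 0;
}
```

A cleared AVX bit in XSTATE_BV after a save is what indicates that the ymm high part was in its initial state.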
What seems to be more interesting is xsaveopt, which I think could make a difference. That would, however, need the xsave area to be at a constant address. I've been thinking about storing it next to the TEB, but we can't do that as long as winsock is called on the signal stack, so I left experimenting with it for the future.
xsavec also performs an optimization (it doesn't save xstate that is in its initial state), and I think it should not depend on the save address.