I'm trying to work out which registers we need to preserve, and I found two relevant pages on MSDN.
Register usage: https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
Parameter passing: https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
I think we need to preserve every parameter register, we can freely use any scratch (volatile) register that is not a parameter register, and we only need to worry about the 64-bit Windows ABI.
MSDN seems to say that only XMM0 through XMM3 are used for parameter passing, and that the upper 128 bits of YMM0 through YMM3 are scratch. I think that means we can ignore XMM4, XMM5, and the AVX extensions.
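To make that reading concrete, here is a rough sketch in C of how I understand the register classes. It assumes the default 64-bit Windows calling convention (not __vectorcall, which also passes arguments in XMM4 and XMM5). The names reg_class, win64_regs, win64_lookup, and stub_must_save are made up for illustration and are not from any existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Register classes as I read them for the default 64-bit Windows
 * calling convention. Assumption: __vectorcall is not in play. */
typedef enum {
    REG_PARAM,        /* volatile and carries one of the first four arguments */
    REG_SCRATCH,      /* volatile, never carries an argument */
    REG_CALLEE_SAVED  /* nonvolatile: must be restored if we touch it */
} reg_class;

typedef struct {
    const char *name;
    reg_class   cls;
} reg_info;

/* Hypothetical table, for illustration only. */
static const reg_info win64_regs[] = {
    {"rcx", REG_PARAM}, {"rdx", REG_PARAM}, {"r8", REG_PARAM}, {"r9", REG_PARAM},
    {"xmm0", REG_PARAM}, {"xmm1", REG_PARAM}, {"xmm2", REG_PARAM}, {"xmm3", REG_PARAM},

    {"rax", REG_SCRATCH}, {"r10", REG_SCRATCH}, {"r11", REG_SCRATCH},
    {"xmm4", REG_SCRATCH}, {"xmm5", REG_SCRATCH},

    {"rbx", REG_CALLEE_SAVED}, {"rbp", REG_CALLEE_SAVED}, {"rdi", REG_CALLEE_SAVED},
    {"rsi", REG_CALLEE_SAVED}, {"rsp", REG_CALLEE_SAVED}, {"r12", REG_CALLEE_SAVED},
    {"r13", REG_CALLEE_SAVED}, {"r14", REG_CALLEE_SAVED}, {"r15", REG_CALLEE_SAVED},
    {"xmm6", REG_CALLEE_SAVED}, {"xmm7", REG_CALLEE_SAVED}, {"xmm8", REG_CALLEE_SAVED},
    {"xmm9", REG_CALLEE_SAVED}, {"xmm10", REG_CALLEE_SAVED}, {"xmm11", REG_CALLEE_SAVED},
    {"xmm12", REG_CALLEE_SAVED}, {"xmm13", REG_CALLEE_SAVED}, {"xmm14", REG_CALLEE_SAVED},
    {"xmm15", REG_CALLEE_SAVED},
};

/* Look a register up by lowercase name; returns NULL if it is not listed. */
static const reg_info *win64_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof win64_regs / sizeof win64_regs[0]; i++) {
        if (strcmp(win64_regs[i].name, name) == 0)
            return &win64_regs[i];
    }
    return NULL;
}

/* True if code that runs before the real callee has to save and restore
 * this register no matter what, because it may hold an argument. */
static bool stub_must_save(const char *name)
{
    const reg_info *r = win64_lookup(name);
    return r != NULL && r->cls == REG_PARAM;
}
```

With this table, stub_must_save("xmm3") is true while stub_must_save("xmm4") and stub_must_save("r11") are false. The callee-saved registers only need saving if we actually modify them, so the simplest approach is to not touch them at all.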
The requirements may differ for non-Windows ABIs, but I hope we can trust the compiler to handle that as long as we're not calling such a function directly.
I'm not sure whether we can use __attribute__((force_align_arg_pointer)) directly in this way, so I'll defer to others on that question.