 
            So, I'm trying to understand which registers we need to preserve, and I found some pages on MSDN.
register usage: https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
parameter passing: https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
I think we need to preserve all parameter registers and can freely use any scratch registers that are not parameter registers, and that we only need to worry about the 64-bit Windows ABI.
MSDN seems to say that only XMM0 through XMM3 are used for parameter passing, and that the extra bits in YMM0 through YMM3 are scratch registers. I think that means we can ignore XMM4, XMM5, and AVX extensions.
The requirements may be different for the non-Windows ABI, but I hope we can trust the compiler to take care of that as long as we're not calling such a function directly.
I'm not sure if we can use __attribute__((force_align_arg_pointer)) directly in this way, but I will defer to others on that question.
 
            On 01/22/2016 10:46 PM, Vincent Povirk wrote:
So, I'm trying to understand which registers we need to preserve, and I found some pages on MSDN.
register usage: https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
parameter passing: https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
I used the same pages for reference.
I think we need to preserve all parameter registers and can freely use any scratch registers that are not parameter registers, and that we only need to worry about the 64-bit Windows ABI.
That's the same as what I had in mind.
MSDN seems to say that only XMM0 through XMM3 are used for parameter passing, and that the extra bits in YMM0 through YMM3 are scratch registers. I think that means we can ignore XMM4, XMM5, and AVX extensions.
That's not quite true for the __vectorcall convention. From your first link, for XMM4/YMM4 for instance: "Must be preserved as needed by caller; *fifth vector-type argument when __vectorcall is used*". So to support all possible cases we need to preserve all of them, I suppose. That's why I added this mess with AVX detection and a second thunk version.
The requirements may be different for the non-Windows ABI, but I hope we can trust the compiler to take care of that as long as we're not calling such a function directly.
GCC must save the non-volatile registers. As a paranoid check, I confirmed this in the disassembly of my build: it does save them, including the non-volatile XMM registers. I suppose a lot of things would be broken in Wine if ms_abi in GCC were buggy in this way.
I'm not sure if we can use __attribute__((force_align_arg_pointer)) directly in this way, but I will defer to others on that question.
What problem do you see with it? I know of just one, which I pointed out in my comment: it may not work with older compilers (it works with my gcc 5.3 from Fedora 23, though). I just found a related discussion in Wine: https://bugs.winehq.org/show_bug.cgi?id=27680. Please note that if this attribute does not work with an older compiler, we will still get the crash only when a buggy Windows app makes the call (the same way some non-.NET apps already crash because of this bug with older compilers).
The attribute does work for me with gcc 5.3.1 20151207. When I remove the 8 bytes from my esp math (sub $0xE0,esp instead of sub $0xE8) for testing purposes, and also remove force_align_arg_pointer, I do get the crash on my machine on the first movaps in ReallyFixupVTable. When I leave the thunk stack without padding but add force_align_arg_pointer, there is no crash, and I see forced stack pointer alignment in the ReallyFixupVTable prolog. I also checked, of course, that removing the attribute does not cause a crash in my test case or yours (both provide a properly aligned esp).
Or do you mean that we must use some define instead? I searched all the code for uses of force_align_arg_pointer; it seems it is not wrapped in any define for x86_64 for now.
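The padding arithmetic in the test above can be checked with a small sketch. It assumes a well-behaved caller (stack pointer 16-byte aligned at the call, so 8 bytes off once the return address is pushed) and that the sub is the thunk's only stack adjustment before it calls ReallyFixupVTable. The function names and the entry address are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when sp is a multiple of 16, which movaps requires of its memory operand. */
static int is_aligned16(uint64_t sp)
{
    return (sp & 0xF) == 0;
}

/* Stack pointer after a thunk reserves frame_size bytes.
 * Assumption: the sub is the thunk's only stack adjustment before its call. */
static uint64_t sp_after_reserve(uint64_t entry_sp, uint64_t frame_size)
{
    return entry_sp - frame_size;
}

/* Hypothetical entry value: aligned before the call, so it is 8 past a
 * multiple of 16 once the 8-byte return address has been pushed. */
static void thunk_alignment_example(void)
{
    const uint64_t entry_sp = 0x7fffffffe008ULL;

    /* 0xE8 = 14 * 16 + 8: the extra 8 bytes restore 16-byte alignment. */
    assert(is_aligned16(sp_after_reserve(entry_sp, 0xE8)));

    /* 0xE0 = 14 * 16: the stack stays 8 bytes off, as in the crashing test. */
    assert(!is_aligned16(sp_after_reserve(entry_sp, 0xE0)));
}
```

Under these assumptions the 0xE8 frame leaves the stack aligned for the callee, while the 0xE0 frame leaves it misaligned unless the callee realigns it itself.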
 
            That's not quite true for the __vectorcall convention. From your first link, for XMM4/YMM4 for instance: "Must be preserved as needed by caller; *fifth vector-type argument when __vectorcall is used*". So to support all possible cases we need to preserve all of them, I suppose. That's why I added this mess with AVX detection and a second thunk version.
OK, I missed that.
But I don't think it's possible for a .NET method to use __vectorcall as its calling convention. The attribute used to specify it doesn't have a value for this: https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.unma...
It's possible MS will extend this in the future, but it's also possible they'll make up entirely new calling conventions. I don't think they will, and for now I don't think it's worth the extra complexity.
The only potential problem I see with __attribute__((force_align_arg_pointer)) is that we may want to use compilers other than GCC (such as clang or msvc), and they may not support this attribute. I did see it used in some defines on the Mac (because it has stricter alignment requirements than Windows), so I guess it works in clang.
How likely do you think it is that there's code out there that calls a .NET method with wrong alignment? Maybe we don't need to worry about this.
I only took a quick look and could be wrong, but from looking at mono_arch_emit_prolog in mini-amd64.c it seems like Mono also assumes an aligned stack on entry to the functions it generates, meaning we'd get a crash eventually anyway.
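The register question discussed in this thread can be summarized in a short sketch. It encodes only what the quoted MSDN pages state: the default Win64 convention passes floating-point arguments in XMM0 through XMM3, while __vectorcall extends vector arguments to XMM4 and XMM5. The enum and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the two conventions discussed in the thread. */
enum x64_conv { CONV_DEFAULT, CONV_VECTORCALL };

static const char *const vec_param_regs[] = {
    "XMM0", "XMM1", "XMM2", "XMM3", /* default Win64 convention */
    "XMM4", "XMM5"                  /* additionally used by __vectorcall */
};

/* Number of XMM registers that can carry arguments under a convention. */
static size_t vector_param_reg_count(enum x64_conv conv)
{
    return conv == CONV_VECTORCALL ? 6 : 4;
}

/* Name of the index-th vector argument register, or NULL when the
 * convention passes that argument some other way. */
static const char *vector_param_reg(enum x64_conv conv, size_t index)
{
    return index < vector_param_reg_count(conv) ? vec_param_regs[index] : NULL;
}

static void register_sets_example(void)
{
    /* A thunk that only has to support the default convention can skip XMM4. */
    assert(vector_param_reg(CONV_DEFAULT, 4) == NULL);
    /* With __vectorcall, XMM4 carries the fifth vector-type argument. */
    assert(vector_param_reg(CONV_VECTORCALL, 4) != NULL);
}
```

If __vectorcall callers are ruled out, a thunk would only need to preserve the first four entries of this table.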
 
            On 01/22/2016 11:45 PM, Vincent Povirk wrote:
That's not quite true for the __vectorcall convention. From your first link, for XMM4/YMM4 for instance: "Must be preserved as needed by caller; *fifth vector-type argument when __vectorcall is used*". So to support all possible cases we need to preserve all of them, I suppose. That's why I added this mess with AVX detection and a second thunk version.
OK, I missed that.
But I don't think it's possible for a .NET method to use __vectorcall as its calling convention. The attribute used to specify it doesn't have a value for this: https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.unma...
It's possible MS will extend this in the future, but it's also possible they'll make up entirely new calling conventions. I don't think they will, and for now I don't think it's worth the extra complexity.
Is it the same for managed C++? If you say __vectorcall support is not needed I will of course remove this code.
The only potential problem I see with __attribute__((force_align_arg_pointer)) is that we may want to use compilers other than GCC (such as clang or msvc), and they may not support this attribute.
If they do not support it, they should just issue a compiler warning, or am I missing something? Clang will hopefully add support. MSVC is very unlikely to compile Wine soon, I guess. The Intel compiler does not support the ms_abi attribute at all, so compiling 64-bit Wine with it is not feasible for now.
How likely do you think it is that there's code out there that calls a .NET method with wrong alignment? Maybe we don't need to worry about this.
I would guess it is nearly as likely as without .NET. See this bug: https://bugs.winehq.org/show_bug.cgi?id=27680 (there are 7-8 apps without .NET known so far). From what I gathered by brief googling, native .NET should not suffer from stack misalignment. So I thought it wouldn't hurt to add an attribute to ReallyFixupVTable (which presumably will be universally supported some day), but I considered forcing alignment right in the thunk to be too much.
I only took a quick look and could be wrong, but from looking at mono_arch_emit_prolog in mini-amd64.c it seems like Mono also assumes an aligned stack on entry to the functions it generates, meaning we'd get a crash eventually anyway.
Are you sure they are going to crash? I looked at the code produced by 'mono --aot' for my short class, and it does not routinely store XMM registers to the stack in the prolog. Our ms_abi functions do that only because they call non-ms_abi functions. This is not the case for Mono, which uses ms_abi throughout its code, so it does not need to do that unless the function modifies non-volatile XMM registers. Even if it does store them on the stack in that case, it could use unaligned instructions or fix the stack alignment in such a function.
 
            But I don't think it's possible for a .NET method to use __vectorcall as its calling convention. The attribute used to specify it doesn't have a value for this: https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.unma...
It's possible MS will extend this in the future, but it's also possible they'll make up entirely new calling conventions. I don't think they will, and for now I don't think it's worth the extra complexity.
Is it the same for managed C++? If you say __vectorcall support is not needed I will of course remove this code.
Managed C++ is limited to what the runtime can generate for vtable fixups, and I can't find any indication that the runtime can generate __vectorcall functions. So I don't think it is needed.
If they do not support it, they should just issue a compiler warning, or am I missing something? Clang will hopefully add support. MSVC is very unlikely to compile Wine soon, I guess. The Intel compiler does not support the ms_abi attribute at all, so compiling 64-bit Wine with it is not feasible for now.
I don't know what compilers will be used in practice (some people build parts of wine with msvc, but probably not mscoree). I think we want GCC extensions to at least be inside a check for whether the compiler is GCC.
I would guess it is nearly as likely as without .NET. See this bug: https://bugs.winehq.org/show_bug.cgi?id=27680 (there are 7-8 apps without .NET known so far). From what I gathered by brief googling, native .NET should not suffer from stack misalignment. So I thought it wouldn't hurt to add an attribute to ReallyFixupVTable (which presumably will be universally supported some day), but I considered forcing alignment right in the thunk to be too much.
I think it's fine if we check for gcc (and I think clang may conveniently claim to be gcc), but I also don't think it's required at this point.
Are you sure they are going to crash?
I'm not sure, I did not look carefully. But they do seem to put effort into preserving alignment, in a way that assumes the stack is aligned on entry.
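The idea of keeping the GCC extension inside a compiler check could look roughly like the sketch below. The macro name FIXUP_FORCE_ALIGN and the function are hypothetical and not taken from the Wine source. The check relies on clang also defining __GNUC__, and is limited to x86 targets because force_align_arg_pointer is an x86-specific attribute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper macro: expands to the attribute only for
 * GCC-compatible compilers targeting x86, and to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# define FIXUP_FORCE_ALIGN __attribute__((force_align_arg_pointer))
#else
# define FIXUP_FORCE_ALIGN
#endif

/* Stand-in for a function such as ReallyFixupVTable that may be entered
 * with a misaligned stack but uses 16-byte aligned locals internally. */
FIXUP_FORCE_ALIGN
static int fixup_entry_sketch(void)
{
    _Alignas(16) unsigned char scratch[16] = {0};

    /* Always true for a well-behaved caller; where the attribute is in
     * effect, the realigned prolog keeps it true for a misaligned one. */
    assert(((uintptr_t)scratch & 0xF) == 0);
    return scratch[0];
}
```

On a compiler that does not define __GNUC__ the macro is empty, so the function builds unchanged but gets no forced realignment.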
 
            So I am resending the patch with the __vectorcall support and the forced-alignment attribute removed.

