Hi,
(A small heads-up.) Because of our problem with Win64 code calling us with stacks that are not 16-byte aligned, the gcc folks committed code to gcc trunk on October 7th that allows -mincoming-stack-boundary=3.
Uros also plans a backport for gcc 5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66697
$ cat xx.c
#include <stdio.h>
int f(int a,int b) {
  printf("a=%d, b=%d\n", a, b);
}

-O2 build:

0000000000000000 <f>:
   0:   89 f2                   mov    %esi,%edx
   2:   31 c0                   xor    %eax,%eax
   4:   89 fe                   mov    %edi,%esi
   6:   bf 00 00 00 00          mov    $0x0,%edi
   b:   e9 00 00 00 00          jmpq   10 <f+0x10>

-O2 -mincoming-stack-boundary=3:

0000000000000000 <f>:
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   9:   89 f2                   mov    %esi,%edx
   b:   31 c0                   xor    %eax,%eax
   d:   89 fe                   mov    %edi,%esi
   f:   bf 00 00 00 00          mov    $0x0,%edi
  14:   41 ff 72 f8             pushq  -0x8(%r10)
  18:   55                      push   %rbp
  19:   48 89 e5                mov    %rsp,%rbp
  1c:   41 52                   push   %r10
  1e:   48 83 ec 08             sub    $0x8,%rsp
  22:   e8 00 00 00 00          callq  27 <f+0x27>
  27:   48 83 c4 08             add    $0x8,%rsp
  2b:   41 5a                   pop    %r10
  2d:   5d                      pop    %rbp
  2e:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  32:   c3                      retq
This emits more code, but at least we can fix the problem without generating our own thunks.
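
As a rough illustration of what the extra prologue does (a sketch only, not taken from gcc or Wine; the helper names are made up for this example): -mincoming-stack-boundary=3 tells gcc to assume only 2^3 = 8 byte alignment on entry, and the "and $0xfffffffffffffff0,%rsp" rounds the stack pointer down to a 16-byte boundary. The same arithmetic in C:

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to a 16-byte boundary,
   which is what "and $0xfffffffffffffff0,%rsp" does to %rsp. */
static uint64_t align_down_16(uint64_t addr)
{
    return addr & ~(uint64_t)15;
}

/* Hypothetical helper: nonzero if addr is a multiple of boundary.
   boundary must be a power of two (8 for boundary=3, 16 for boundary=4). */
static int is_aligned(uint64_t addr, uint64_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}
```

For a made-up incoming %rsp of 0x7ffc12345678, is_aligned(sp, 8) holds, is_aligned(sp, 16) does not, and align_down_16(sp) gives 0x7ffc12345670.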
Ciao, Marcus
Marcus Meissner <marcus@jet.franken.de> writes:
> Hi,
> (A small heads-up.) Because of our problem with Win64 code calling us with stacks that are not 16-byte aligned, the gcc folks committed code to gcc trunk on October 7th that allows -mincoming-stack-boundary=3.
What we really need is force_align_arg_pointer. The bug says that this is fixed too; have you verified it?
On Sun, Oct 11, 2015 at 04:09:59PM +0900, Alexandre Julliard wrote:
> Marcus Meissner <marcus@jet.franken.de> writes:
> > Hi,
> > (A small heads-up.) Because of our problem with Win64 code calling us with stacks that are not 16-byte aligned, the gcc folks committed code to gcc trunk on October 7th that allows -mincoming-stack-boundary=3.
> What we really need is force_align_arg_pointer. The bug says that this is fixed too; have you verified it?
I do not have such a test function here, so I used the one from the gcc testcase:
typedef float v4sf __attribute__((vector_size(16)));
__attribute__((force_align_arg_pointer))
v4sf test (v4sf a, v4sf b)
{
  volatile v4sf z = a + b;
  return z;
}
without attribute and -O2:
00000000000000c0 <test>:
  c0:   0f 58 c8                addps  %xmm0,%xmm1
  c3:   0f 29 4c 24 e8          movaps %xmm1,-0x18(%rsp)
  c8:   0f 28 44 24 e8          movaps -0x18(%rsp),%xmm0
  cd:   c3                      retq
with attribute and -O2:

00000000000000c0 <test>:
  c0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
  c5:   0f 58 c8                addps  %xmm0,%xmm1
  c8:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  cc:   41 ff 72 f8             pushq  -0x8(%r10)
  d0:   55                      push   %rbp
  d1:   48 89 e5                mov    %rsp,%rbp
  d4:   41 52                   push   %r10
  d6:   0f 29 4d e0             movaps %xmm1,-0x20(%rbp)
  da:   0f 28 45 e0             movaps -0x20(%rbp),%xmm0
  de:   41 5a                   pop    %r10
  e0:   5d                      pop    %rbp
  e1:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  e5:   c3                      retq
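
For a per-function use, a minimal sketch of how an entry point might be marked follows. The FORCE_ALIGN macro and the function name are invented for this example and are not Wine's. The attribute only has an effect on x86 targets with a compiler that implements it, so the macro expands to nothing elsewhere. The function just reports whether a 16-byte-aligned local, the kind of slot movaps needs, really ended up on a 16-byte boundary.

```c
#include <stdint.h>

/* Hypothetical macro: apply the attribute only where it is known to exist. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define FORCE_ALIGN __attribute__((force_align_arg_pointer))
#else
#define FORCE_ALIGN
#endif

/* Hypothetical entry point that foreign code might call with a stack
   that is only 8-byte aligned. The attribute asks the compiler to
   realign the stack in the prologue before locals are laid out. */
FORCE_ALIGN int entry_local_is_aligned(void)
{
    /* volatile keeps the array in memory instead of in registers */
    _Alignas(16) volatile float buf[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

    /* 1 if the local sits on a 16-byte boundary, 0 otherwise */
    return ((uintptr_t)buf & 15) == 0;
}
```

Called from ordinary, correctly aligned C code this returns 1 with or without the attribute; the attribute only matters when the caller's stack is misaligned.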
Ciao, Marcus