On Sun, Oct 11, 2015 at 04:09:59PM +0900, Alexandre Julliard wrote:
Marcus Meissner marcus@jet.franken.de writes:
Hi,
(as a small heads-up) Regarding the problem of Win64 code calling us with stacks that are not 16-byte aligned: on October 7th the gcc folks committed code to gcc trunk that allows -mincoming-stack-boundary=3.
What we really need is force_align_arg_pointer. The bug says that this is fixed too, have you verified it?
While I do not have such a test function here, I used the one from the gcc testcase:
typedef float v4sf __attribute__((vector_size(16)));

__attribute__((force_align_arg_pointer))
v4sf test (v4sf a, v4sf b)
{
  volatile v4sf z = a + b;
  return z;
}

without attribute and -O2:

00000000000000c0 <test>:
  c0:   0f 58 c8                addps  %xmm0,%xmm1
  c3:   0f 29 4c 24 e8          movaps %xmm1,-0x18(%rsp)
  c8:   0f 28 44 24 e8          movaps -0x18(%rsp),%xmm0
  cd:   c3                      retq

with attribute and -O2:

00000000000000c0 <test>:
  c0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
  c5:   0f 58 c8                addps  %xmm0,%xmm1
  c8:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  cc:   41 ff 72 f8             pushq  -0x8(%r10)
  d0:   55                      push   %rbp
  d1:   48 89 e5                mov    %rsp,%rbp
  d4:   41 52                   push   %r10
  d6:   0f 29 4d e0             movaps %xmm1,-0x20(%rbp)
  da:   0f 28 45 e0             movaps -0x20(%rbp),%xmm0
  de:   41 5a                   pop    %r10
  e0:   5d                      pop    %rbp
  e1:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  e5:   c3                      retq

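
For anyone reading along, the arithmetic behind the two knobs can be sketched in C. -mincoming-stack-boundary=N tells gcc to assume the incoming stack is aligned to 2^N bytes (so 3 means 8 bytes), and the `and $0xfffffffffffffff0,%rsp` in the prologue with the attribute rounds the stack pointer down to a 16-byte boundary. The helper names below are hypothetical and only illustrate the calculation; they are not part of gcc or Wine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: -mincoming-stack-boundary=N means the incoming
 * stack is assumed to be aligned to 2^N bytes. */
static inline uintptr_t stack_boundary_bytes(unsigned n)
{
    return (uintptr_t)1 << n;
}

/* Hypothetical helper: round a stack pointer value down to a power-of-two
 * alignment, the same operation as `and $-16,%rsp` for alignment == 16. */
static inline uintptr_t align_down(uintptr_t sp, uintptr_t alignment)
{
    /* The mask trick only works for non-zero powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}
```

So a stack pointer value ending in 0x38 would be moved down to one ending in 0x30 by the realignment, while an already aligned value is left unchanged.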
Ciao, Marcus