I was tuning some of my code by looking at the disassembly, and I noticed a disturbing thing: my NtReleaseSemaphore had a really ugly prologue and epilogue!
000000007bc9d150 <NtReleaseSemaphore> push %rdi
000000007bc9d151 <NtReleaseSemaphore+0x1> push %rsi
000000007bc9d152 <NtReleaseSemaphore+0x2> mov %rcx,%r9
000000007bc9d155 <NtReleaseSemaphore+0x5> push %rbx
000000007bc9d156 <NtReleaseSemaphore+0x6> mov $0x8,%ecx
000000007bc9d15b <NtReleaseSemaphore+0xb> mov %r8,%rbx
000000007bc9d15e <NtReleaseSemaphore+0xe> sub $0x150,%rsp
000000007bc9d165 <NtReleaseSemaphore+0x15> mov %fs:0x28,%rax
000000007bc9d16e <NtReleaseSemaphore+0x1e> mov %rax,0xa8(%rsp)
000000007bc9d176 <NtReleaseSemaphore+0x26> xor %eax,%eax
000000007bc9d178 <NtReleaseSemaphore+0x28> mov %rsp,%rdi
000000007bc9d17b <NtReleaseSemaphore+0x2b> movl $0x0,0x40(%rsp)
000000007bc9d183 <NtReleaseSemaphore+0x33> movaps %xmm6,0xb0(%rsp)
000000007bc9d18b <NtReleaseSemaphore+0x3b> movaps %xmm7,0xc0(%rsp)
000000007bc9d193 <NtReleaseSemaphore+0x43> movaps %xmm8,0xd0(%rsp)
000000007bc9d19c <NtReleaseSemaphore+0x4c> movaps %xmm9,0xe0(%rsp)
000000007bc9d1a5 <NtReleaseSemaphore+0x55> movaps %xmm10,0xf0(%rsp)
000000007bc9d1ae <NtReleaseSemaphore+0x5e> movaps %xmm11,0x100(%rsp)
000000007bc9d1b7 <NtReleaseSemaphore+0x67> movaps %xmm12,0x110(%rsp)
000000007bc9d1c0 <NtReleaseSemaphore+0x70> movaps %xmm13,0x120(%rsp)
000000007bc9d1c9 <NtReleaseSemaphore+0x79> movaps %xmm14,0x130(%rsp)
000000007bc9d1d2 <NtReleaseSemaphore+0x82> movaps %xmm15,0x140(%rsp)
(55 bytes of actual work)
000000007bc9d227 <NtReleaseSemaphore+0xd7> movaps 0xb0(%rsp),%xmm6
000000007bc9d22f <NtReleaseSemaphore+0xdf> movaps 0xc0(%rsp),%xmm7
000000007bc9d237 <NtReleaseSemaphore+0xe7> movaps 0xd0(%rsp),%xmm8
000000007bc9d240 <NtReleaseSemaphore+0xf0> movaps 0xe0(%rsp),%xmm9
000000007bc9d249 <NtReleaseSemaphore+0xf9> movaps 0xf0(%rsp),%xmm10
000000007bc9d252 <NtReleaseSemaphore+0x102> movaps 0x100(%rsp),%xmm11
000000007bc9d25b <NtReleaseSemaphore+0x10b> movaps 0x110(%rsp),%xmm12
000000007bc9d264 <NtReleaseSemaphore+0x114> movaps 0x120(%rsp),%xmm13
000000007bc9d26d <NtReleaseSemaphore+0x11d> movaps 0x130(%rsp),%xmm14
000000007bc9d276 <NtReleaseSemaphore+0x126> movaps 0x140(%rsp),%xmm15
000000007bc9d27f <NtReleaseSemaphore+0x12f> add $0x150,%rsp
000000007bc9d286 <NtReleaseSemaphore+0x136> pop %rbx
000000007bc9d287 <NtReleaseSemaphore+0x137> pop %rsi
000000007bc9d288 <NtReleaseSemaphore+0x138> pop %rdi
000000007bc9d289 <NtReleaseSemaphore+0x139> retq
So I looked at the rest of the file, and all of the WINAPI functions were like this. Then I looked at the rest of the build and found the same thing: all WINAPI (i.e., __attribute__((ms_abi))) functions had these horribly bloated prologues and epilogues. I rebuilt ntdll with -Dms_abi=sysv_abi and the problem went away (though of course the result wouldn't actually work).
Surely this cannot be an inherent requirement of ms_abi!? I'm hoping that this is just some bug that has slipped through the cracks because not many people use ms_abi on Linux.
Daniel
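So, just to follow this up: I have learned that this is indeed a requirement of ms_abi. When an ms_abi function calls a sysv_abi function, it must save and restore rsi, rdi and xmm6-xmm15, because the Microsoft x64 ABI treats those registers as callee-saved while the System V ABI allows the callee to destroy them. Maybe someday we'll come up with a safe way to avoid this overhead.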
Daniel