Paul Gofman (@gofman) commented about dlls/ntdll/unix/signal_x86_64.c:
"movq %r10,0xa0(%rsp)\n\t" /* frame->prev_frame */
"movq %rsp,0x378(%r13)\n\t" /* thread_data->syscall_frame */
"testl $1,0x380(%r13)\n\t" /* thread_data->syscall_trace */
"movq %r10,%r14\n\t" /* user_frame (previous thread_data->syscall_frame) */
You can probably use r14 instead of r10 from the start and thus avoid the extra register transfer (the performance of one extra mov doesn't matter, but longer and less obvious code does).
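
A minimal sketch of the idea, not the actual dispatcher code. The load of the previous thread_data->syscall_frame is not part of the quoted hunk, so the load instruction, the simplified structs, and the function names below are assumptions made for illustration; the real offsets (0xa0, 0x378) are replaced by plain pointers. It also assumes r14 is free at that point, which would need checking against the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real Wine structures and offsets differ. */
struct frame       { uint64_t prev_frame; };
struct thread_data { uint64_t syscall_frame; };

/* As in the quoted patch: previous frame goes to r10, then is copied to r14. */
static uint64_t link_frame_via_r10(struct thread_data *td, struct frame *f)
{
    uint64_t user_frame;
#if defined(__x86_64__)
    __asm__ volatile(
        "movq (%[td]),%%r10\n\t"   /* previous td->syscall_frame (assumed load) */
        "movq %%r10,(%[f])\n\t"    /* f->prev_frame */
        "movq %[f],(%[td])\n\t"    /* td->syscall_frame */
        "movq %%r10,%%r14\n\t"     /* user_frame: the extra transfer */
        "movq %%r14,%[out]\n\t"    /* hand the result back to C */
        : [out] "=&r" (user_frame)
        : [td] "r" (td), [f] "r" (f)
        : "r10", "r14", "memory");
#else
    /* Portable fallback with the same effect, for non-x86-64 builds. */
    user_frame = td->syscall_frame;
    f->prev_frame = user_frame;
    td->syscall_frame = (uint64_t)(uintptr_t)f;
#endif
    return user_frame;
}

/* Suggested variant: use r14 from the start, so no register copy is needed. */
static uint64_t link_frame_direct(struct thread_data *td, struct frame *f)
{
    uint64_t user_frame;
#if defined(__x86_64__)
    __asm__ volatile(
        "movq (%[td]),%%r14\n\t"   /* user_frame (previous td->syscall_frame) */
        "movq %%r14,(%[f])\n\t"    /* f->prev_frame */
        "movq %[f],(%[td])\n\t"    /* td->syscall_frame */
        "movq %%r14,%[out]\n\t"    /* hand the result back to C */
        : [out] "=&r" (user_frame)
        : [td] "r" (td), [f] "r" (f)
        : "r14", "memory");
#else
    /* Portable fallback with the same effect, for non-x86-64 builds. */
    user_frame = td->syscall_frame;
    f->prev_frame = user_frame;
    td->syscall_frame = (uint64_t)(uintptr_t)f;
#endif
    return user_frame;
}
```

Both variants leave the same values in memory and return the same previous frame; the second one is just one instruction shorter and has a single register to keep track of.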