Signed-off-by: Paul Gofman pgofman@codeweavers.com --- Supersedes 220540.
Changes: - leave r11 alone for now; - also test NtSetContextThread with CONTEXT_INTEGER; - don't assume same stack pointer and non-volatile registers between test function calls in the test.
dlls/ntdll/tests/exception.c | 76 +++++++++++++++++++++++++++++++++ dlls/ntdll/unix/signal_x86_64.c | 11 +++-- 2 files changed, 83 insertions(+), 4 deletions(-)
diff --git a/dlls/ntdll/tests/exception.c b/dlls/ntdll/tests/exception.c index 0362cb9f0a9..3019c1a40cb 100644 --- a/dlls/ntdll/tests/exception.c +++ b/dlls/ntdll/tests/exception.c @@ -4917,6 +4917,81 @@ static void test_unwind_from_apc(void) ok(pass == 4, "Got unexpected pass %d.\n", pass); ok(test_unwind_apc_called, "Test user APC was not called.\n"); } + +static void test_syscall_clobbered_regs(void) +{ + struct regs + { + UINT64 rcx; + }; + static const BYTE code[] = + { + 0x48, 0x8d, 0x05, 0x00, 0x10, 0x00, 0x00, + /* leaq 0x1000(%rip),%rax */ + 0x48, 0x25, 0x00, 0xf0, 0xff, 0xff, + /* andq $~0xfff,%rax */ + 0x48, 0x83, 0xe8, 0x08, /* subq $8,%rax */ + 0x48, 0x89, 0x20, /* movq %rsp,0(%rax) */ + 0x48, 0x89, 0xc4, /* movq %rax,%rsp */ + 0x41, 0x50, /* push %r8 */ + 0x53, 0x55, 0x57, 0x56, 0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57, + /* push %rbx, %rbp, %rdi, %rsi, %r12, %r13, %r14, %r15 */ + 0x41, 0xff, 0xd1, /* callq *r9 */ + 0x41, 0x5f, 0x41, 0x5e, 0x41, 0x5d, 0x41, 0x5c, 0x5e, 0x5f, 0x5d, 0x5b, + /* pop %r15, %r14, %r13, %r12, %rsi, %rdi, %rbp, %rbx */ + 0x41, 0x58, /* pop %r8 */ + 0x49, 0x89, 0x48, 0x00, /* mov %rcx,(%r8) */ + 0x5c, /* pop %rsp */ + 0xc3, /* ret */ + }; + + NTSTATUS (WINAPI *func)(void *arg1, void *arg2, struct regs *, void *call_addr); + NTSTATUS (WINAPI *pNtCancelTimer)(HANDLE, BOOLEAN *); + HMODULE hntdll = GetModuleHandleA("ntdll.dll"); + struct regs regs; + CONTEXT context; + NTSTATUS status; + + pNtCancelTimer = (void *)GetProcAddress(hntdll, "NtCancelTimer"); + ok(!!pNtCancelTimer, "NtCancelTimer not found.\n"); + memcpy(code_mem, code, sizeof(code)); + func = code_mem; + memset(®s, 0, sizeof(regs)); + status = func((HANDLE)0xdeadbeef, NULL, ®s, pNtCancelTimer); + ok(status == STATUS_INVALID_HANDLE, "Got unexpected status %#x.\n", status); + + /* After the syscall instruction rcx contains the address of the instruction next after syscall. */ + ok((BYTE *)regs.rcx > (BYTE *)pNtCancelTimer && (BYTE *)regs.rcx < (BYTE *)pNtCancelTimer + 0x20, + "Got unexpected rcx %s, pNtCancelTimer %p.\n", wine_dbgstr_longlong(regs.rcx), pNtCancelTimer); + + status = func((HANDLE)0xdeadbeef, (BOOLEAN *)0xdeadbeef, ®s, pNtCancelTimer); + ok(status == STATUS_ACCESS_VIOLATION, "Got unexpected status %#x.\n", status); + ok((BYTE *)regs.rcx > (BYTE *)pNtCancelTimer && (BYTE *)regs.rcx < (BYTE *)pNtCancelTimer + 0x20, + "Got unexpected rcx %s, pNtCancelTimer %p.\n", wine_dbgstr_longlong(regs.rcx), pNtCancelTimer); + + context.ContextFlags = CONTEXT_CONTROL; + status = func(GetCurrentThread(), &context, ®s, pNtGetContextThread); + ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status); + ok((BYTE *)regs.rcx > (BYTE *)pNtGetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtGetContextThread + 0x20, + "Got unexpected rcx %s, pNtGetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtGetContextThread); + + status = func(GetCurrentThread(), &context, ®s, pNtSetContextThread); + ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status); + ok((BYTE *)regs.rcx > (BYTE *)pNtGetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtGetContextThread + 0x20, + "Got unexpected rcx %s, pNtGetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtGetContextThread); + + context.ContextFlags = CONTEXT_INTEGER; + status = func(GetCurrentThread(), &context, ®s, pNtGetContextThread); + ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status); + ok((BYTE *)regs.rcx > (BYTE *)pNtGetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtGetContextThread + 0x20, + "Got unexpected rcx %s, pNtGetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtGetContextThread); + + status = func(GetCurrentThread(), &context, ®s, pNtSetContextThread); + ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status); + ok((BYTE *)regs.rcx > (BYTE *)pNtSetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtSetContextThread + 0x20, + "Got unexpected rcx %s, pNtSetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtSetContextThread); + +} #elif defined(__arm__)
#define UNW_FLAG_NHANDLER 0 @@ -10737,6 +10812,7 @@ START_TEST(exception) test_extended_context(); test_copy_context(); test_unwind_from_apc(); + test_syscall_clobbered_regs();
#elif defined(__aarch64__)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 4310c1c6a2f..69625af1e24 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -3236,18 +3236,21 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "testl $0x3,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ "jnz 1f\n\t" "movq 0x88(%rcx),%rsp\n\t" - "jmpq *0x70(%rcx)\n" /* frame->rip */ + "movq 0x70(%rcx),%rcx\n\t" /* frame->rip */ + "jmpq *%rcx\n\t" "1:\tleaq 0x70(%rcx),%rsp\n\t" "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */ - "jz 1f\n\t" - "movq 0x00(%rcx),%rax\n\t" + "jnz 1f\n\t" + "movq (%rsp),%rcx\n\t" /* frame->rip */ + "iretq\n" + "1:\tmovq 0x00(%rcx),%rax\n\t" "movq 0x18(%rcx),%rdx\n\t" "movq 0x30(%rcx),%r8\n\t" "movq 0x38(%rcx),%r9\n\t" "movq 0x40(%rcx),%r10\n\t" "movq 0x48(%rcx),%r11\n\t" "movq 0x10(%rcx),%rcx\n" - "1:\tiretq\n" + "iretq\n" "5:\tmovl $0xc000000d,%edx\n\t" /* STATUS_INVALID_PARAMETER */ "movq %rsp,%rcx\n" __ASM_NAME("__wine_syscall_dispatcher_return") ":\n\t"
Thanks a lot for taking time to read and consider my feedback! Overall the patch looks good. I saw a few things that you might have missed. Many of them are nitpicks, so feel free to ignore them if you wish.
On 12/1/21 21:38, Paul Gofman wrote:
Signed-off-by: Paul Gofman pgofman@codeweavers.com
Supersedes 220540. Changes: - leave r11 alone for now; - also test NtSetContextThread with CONTEXT_INTEGER; - don't assume same stack pointer and non-volatile registers between test function calls in the test.
dlls/ntdll/tests/exception.c | 76 +++++++++++++++++++++++++++++++++ dlls/ntdll/unix/signal_x86_64.c | 11 +++-- 2 files changed, 83 insertions(+), 4 deletions(-)
diff --git a/dlls/ntdll/tests/exception.c b/dlls/ntdll/tests/exception.c index 0362cb9f0a9..3019c1a40cb 100644 --- a/dlls/ntdll/tests/exception.c +++ b/dlls/ntdll/tests/exception.c @@ -4917,6 +4917,81 @@ static void test_unwind_from_apc(void) ok(pass == 4, "Got unexpected pass %d.\n", pass); ok(test_unwind_apc_called, "Test user APC was not called.\n"); }
+static void test_syscall_clobbered_regs(void) +{
- struct regs
- {
UINT64 rcx;
- };
- static const BYTE code[] =
- {
0x48, 0x8d, 0x05, 0x00, 0x10, 0x00, 0x00,
/* leaq 0x1000(%rip),%rax */
0x48, 0x25, 0x00, 0xf0, 0xff, 0xff,
/* andq $~0xfff,%rax */
code_mem is already page-aligned, so &~0xfff should be unnecessary. Even if this wasn't case, it might lead to the stack overlapping the code if e.g. (rax & 0xfff) == 0xfff.
0x48, 0x83, 0xe8, 0x08, /* subq $8,%rax */
0x48, 0x89, 0x20, /* movq %rsp,0(%rax) */
0x48, 0x89, 0xc4, /* movq %rax,%rsp */
I think "push %rsp" would make it clear that we're saving RSP. It's well known that PUSH [ER]SP computes the source operand *before* decrementing the stack pointer. Even if it's not immediately obvious to someone who reads the code, they can still deduce from "POP RSP" at the epilogue that it's saving/restoring the stack pointer.
All in all, my suggestion goes:
0x48, 0x8d, 0x25, 0xf9, 0x0f, 0x00, 0x00, /* leaq 0xff9(%rip),%rsp */ 0x54, /* push %rsp */
0x41, 0x50, /* push %r8 */
0x53, 0x55, 0x57, 0x56, 0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57,
/* push %rbx, %rbp, %rdi, %rsi, %r12, %r13, %r14, %r15 */
0x41, 0xff, 0xd1, /* callq *r9 */
0x41, 0x5f, 0x41, 0x5e, 0x41, 0x5d, 0x41, 0x5c, 0x5e, 0x5f, 0x5d, 0x5b,
/* pop %r15, %r14, %r13, %r12, %rsi, %rdi, %rbp, %rbx */
I think we should reload RSP beforehand, since we're assuming the stack pointer isn't reliable at all after the system call.
0x41, 0x58, /* pop %r8 */
0x49, 0x89, 0x48, 0x00, /* mov %rcx,(%r8) */
0x5c, /* pop %rsp */
(maybe swap the above two to keep the symmetry? I don't think it matters much, though.)
0xc3, /* ret */
- };
- NTSTATUS (WINAPI *func)(void *arg1, void *arg2, struct regs *, void *call_addr);
- NTSTATUS (WINAPI *pNtCancelTimer)(HANDLE, BOOLEAN *);
- HMODULE hntdll = GetModuleHandleA("ntdll.dll");
- struct regs regs;
- CONTEXT context;
- NTSTATUS status;
- pNtCancelTimer = (void *)GetProcAddress(hntdll, "NtCancelTimer");
- ok(!!pNtCancelTimer, "NtCancelTimer not found.\n");
- memcpy(code_mem, code, sizeof(code));
- func = code_mem;
- memset(®s, 0, sizeof(regs));
- status = func((HANDLE)0xdeadbeef, NULL, ®s, pNtCancelTimer);
- ok(status == STATUS_INVALID_HANDLE, "Got unexpected status %#x.\n", status);
- /* After the syscall instruction rcx contains the address of the instruction next after syscall. */
- ok((BYTE *)regs.rcx > (BYTE *)pNtCancelTimer && (BYTE *)regs.rcx < (BYTE *)pNtCancelTimer + 0x20,
"Got unexpected rcx %s, pNtCancelTimer %p.\n", wine_dbgstr_longlong(regs.rcx), pNtCancelTimer);
- status = func((HANDLE)0xdeadbeef, (BOOLEAN *)0xdeadbeef, ®s, pNtCancelTimer);
- ok(status == STATUS_ACCESS_VIOLATION, "Got unexpected status %#x.\n", status);
- ok((BYTE *)regs.rcx > (BYTE *)pNtCancelTimer && (BYTE *)regs.rcx < (BYTE *)pNtCancelTimer + 0x20,
"Got unexpected rcx %s, pNtCancelTimer %p.\n", wine_dbgstr_longlong(regs.rcx), pNtCancelTimer);
- context.ContextFlags = CONTEXT_CONTROL;
- status = func(GetCurrentThread(), &context, ®s, pNtGetContextThread);
- ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status);
- ok((BYTE *)regs.rcx > (BYTE *)pNtGetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtGetContextThread + 0x20,
"Got unexpected rcx %s, pNtGetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtGetContextThread);
- status = func(GetCurrentThread(), &context, ®s, pNtSetContextThread);
- ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status);
- ok((BYTE *)regs.rcx > (BYTE *)pNtGetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtGetContextThread + 0x20,
"Got unexpected rcx %s, pNtGetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtGetContextThread);
- context.ContextFlags = CONTEXT_INTEGER;
- status = func(GetCurrentThread(), &context, ®s, pNtGetContextThread);
- ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status);
- ok((BYTE *)regs.rcx > (BYTE *)pNtGetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtGetContextThread + 0x20,
"Got unexpected rcx %s, pNtGetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtGetContextThread);
- status = func(GetCurrentThread(), &context, ®s, pNtSetContextThread);
- ok(status == STATUS_SUCCESS, "Got unexpected status %#x.\n", status);
- ok((BYTE *)regs.rcx > (BYTE *)pNtSetContextThread && (BYTE *)regs.rcx < (BYTE *)pNtSetContextThread + 0x20,
"Got unexpected rcx %s, pNtSetContextThread %p.\n", wine_dbgstr_longlong(regs.rcx), pNtSetContextThread);
+} #elif defined(__arm__)
#define UNW_FLAG_NHANDLER 0 @@ -10737,6 +10812,7 @@ START_TEST(exception) test_extended_context(); test_copy_context(); test_unwind_from_apc();
- test_syscall_clobbered_regs();
#elif defined(__aarch64__)
diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index 4310c1c6a2f..69625af1e24 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -3236,18 +3236,21 @@ __ASM_GLOBAL_FUNC( __wine_syscall_dispatcher, "testl $0x3,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ "jnz 1f\n\t" "movq 0x88(%rcx),%rsp\n\t"
"jmpq *0x70(%rcx)\n" /* frame->rip */
"movq 0x70(%rcx),%rcx\n\t" /* frame->rip */
"jmpq *%rcx\n\t" "1:\tleaq 0x70(%rcx),%rsp\n\t" "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */
"jz 1f\n\t"
"movq 0x00(%rcx),%rax\n\t"
"jnz 1f\n\t"
"movq (%rsp),%rcx\n\t" /* frame->rip */
CONTEXT_CONTROL means we should restore the CS segment register as well. Since SYSRET cannot restore CS, Windows would have to use plain IRETQ instead of SYSRET. In this case I suspect there should be no reason to clobber RCX at all.
Note that this claim is unconfirmed; I might need some testing.
"iretq\n"
"1:\tmovq 0x00(%rcx),%rax\n\t" "movq 0x18(%rcx),%rdx\n\t" "movq 0x30(%rcx),%r8\n\t" "movq 0x38(%rcx),%r9\n\t" "movq 0x40(%rcx),%r10\n\t" "movq 0x48(%rcx),%r11\n\t" "movq 0x10(%rcx),%rcx\n"
"1:\tiretq\n"
"iretq\n" "5:\tmovl $0xc000000d,%edx\n\t" /* STATUS_INVALID_PARAMETER */ "movq %rsp,%rcx\n" __ASM_NAME("__wine_syscall_dispatcher_return") ":\n\t"
On 12/1/21 19:15, Jinoh Kang wrote:
+static void test_syscall_clobbered_regs(void) +{
- struct regs
- {
UINT64 rcx;
- };
- static const BYTE code[] =
- {
0x48, 0x8d, 0x05, 0x00, 0x10, 0x00, 0x00,
/* leaq 0x1000(%rip),%rax */
0x48, 0x25, 0x00, 0xf0, 0xff, 0xff,
/* andq $~0xfff,%rax */
code_mem is already page-aligned, so &~0xfff should be unnecessary. Even if this wasn't case, it might lead to the stack overlapping the code if e.g. (rax & 0xfff) == 0xfff.
But I am adding 0x1000 to rip beforehand, how would it overlap the code?
0x48, 0x83, 0xe8, 0x08, /* subq $8,%rax */
0x48, 0x89, 0x20, /* movq %rsp,0(%rax) */
0x48, 0x89, 0xc4, /* movq %rax,%rsp */
I think "push %rsp" would make it clear that we're saving RSP.
But it will push that on the current stack while I want to save it to the new temporary stack, before clobbering rsp. I could first push the rsp on the initial stack and then copy but it is not obvious to me that these extra operations are an improvement.
All in all, my suggestion goes:
0x48, 0x8d, 0x25, 0xf9, 0x0f, 0x00, 0x00, /* leaq 0xff9(%rip),%rsp */ 0x54, /* push %rsp */
0x41, 0x50, /* push %r8 */
0x53, 0x55, 0x57, 0x56, 0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57,
/* push %rbx, %rbp, %rdi, %rsi, %r12, %r13, %r14, %r15 */
0x41, 0xff, 0xd1, /* callq *r9 */
0x41, 0x5f, 0x41, 0x5e, 0x41, 0x5d, 0x41, 0x5c, 0x5e, 0x5f, 0x5d, 0x5b,
/* pop %r15, %r14, %r13, %r12, %rsi, %rdi, %rbp, %rbx */
I think we should reload RSP beforehand, since we're assuming the stack pointer isn't reliable at all after the system call.
The stack pointer is reliable after the system call, it is always (((code_mem + 0x1000) ~0xfff) - saved_regs_size). This is not a universal procedure, only a part of specific test case and frankly I don't find the motivation to make it universal at the cost of complication. The reason I introduced this at all in the second patch version is that previously the test implicitly relied on the compiler calling the asm block with the same %rsp. Now it doesn't, the %rsp is set to the same known value before NtGetContextThread().
"testl $0x3,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ "jnz 1f\n\t" "movq 0x88(%rcx),%rsp\n\t"
"jmpq *0x70(%rcx)\n" /* frame->rip */
"movq 0x70(%rcx),%rcx\n\t" /* frame->rip */
"jmpq *%rcx\n\t" "1:\tleaq 0x70(%rcx),%rsp\n\t" "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */
"jz 1f\n\t"
"movq 0x00(%rcx),%rax\n\t"
"jnz 1f\n\t"
"movq (%rsp),%rcx\n\t" /* frame->rip */
CONTEXT_CONTROL means we should restore the CS segment register as well. Since SYSRET cannot restore CS, Windows would have to use plain IRETQ instead of SYSRET. In this case I suspect there should be no reason to clobber RCX at all.
Note that this claim is unconfirmed; I might need some testing.
Does my patch change anything in this regard? I hope it doesn't, if it does that is an oversight. If there is something looking wrong with how it works now fixing that should not go in the same patch. But I think we currently restore CS (both before and after my patch) by using iretq in case of CONTEXT_CONTROL, isn't that the case?
CONTEXT_CONTROL
On 12/1/21 19:29, Paul Gofman wrote:
"testl $0x3,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ "jnz 1f\n\t" "movq 0x88(%rcx),%rsp\n\t" - "jmpq *0x70(%rcx)\n" /* frame->rip */ + "movq 0x70(%rcx),%rcx\n\t" /* frame->rip */ + "jmpq *%rcx\n\t" "1:\tleaq 0x70(%rcx),%rsp\n\t" "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */ - "jz 1f\n\t" - "movq 0x00(%rcx),%rax\n\t" + "jnz 1f\n\t" + "movq (%rsp),%rcx\n\t" /* frame->rip */ CONTEXT_CONTROL means we should restore the CS segment register as well. Since SYSRET cannot restore CS, Windows would have to use plain IRETQ instead of SYSRET. In this case I suspect there should be no reason to clobber RCX at all.
Note that this claim is unconfirmed; I might need some testing.
Does my patch change anything in this regard? I hope it doesn't, if it does that is an oversight. If there is something looking wrong with how it works now fixing that should not go in the same patch. But I think we currently restore CS (both before and after my patch) by using iretq in case of CONTEXT_CONTROL, isn't that the case?
If you mean that rcx should not be set this way in case of CONTEXT_CONTROL, I think my test with NtSetContextThread plainly shows that it should?
On 12/2/21 01:29, Paul Gofman wrote:
On 12/1/21 19:15, Jinoh Kang wrote:
+static void test_syscall_clobbered_regs(void) +{ + struct regs + { + UINT64 rcx; + }; + static const BYTE code[] = + { + 0x48, 0x8d, 0x05, 0x00, 0x10, 0x00, 0x00, + /* leaq 0x1000(%rip),%rax */ + 0x48, 0x25, 0x00, 0xf0, 0xff, 0xff, + /* andq $~0xfff,%rax */
code_mem is already page-aligned, so &~0xfff should be unnecessary. Even if this wasn't case, it might lead to the stack overlapping the code if e.g. (rax & 0xfff) == 0xfff.
But I am adding 0x1000 to rip beforehand, how would it overlap the code?
Example scenario, if code_mem == 0x123FF0: - LEAQ RAX, [RIP + 0x1000] | RAX = 0x124FF7 (0x123FF0 + 7 + 0x1000) - ANDQ RAX, ~0xFFF | RAX = 0x124000 (0x124FF7 & ~0xFFF) - SUBQ RAX, 8 | RAX = 0x123FF8 (0x124FF0 - 8) - MOVQ [RAX], RSP - MOVQ RSP, RAX | RSP = 0x123FF8 - PUSH R8 | RSP = 0x123FF0 - PUSH RBX | RSP = 0x123FE8 <--- [[overflow]]
code_mem is always page-aligned though, so the overflow won't happen. But in that case I suppose andq gives a false impression that the address might not be page-aligned.
+ 0x48, 0x83, 0xe8, 0x08, /* subq $8,%rax */ + 0x48, 0x89, 0x20, /* movq %rsp,0(%rax) */ + 0x48, 0x89, 0xc4, /* movq %rax,%rsp */
I think "push %rsp" would make it clear that we're saving RSP.
But it will push that on the current stack while I want to save it to the new temporary stack, before clobbering rsp. I could first push the rsp on the initial stack and then copy but it is not obvious to me that these extra operations are an improvement.
You're right. My apologies for wasting your time, I need to get some sleep...
All in all, my suggestion goes:
0x48, 0x8d, 0x25, 0xf9, 0x0f, 0x00, 0x00, /* leaq 0xff9(%rip),%rsp */ 0x54, /* push %rsp */ + 0x41, 0x50, /* push %r8 */ + 0x53, 0x55, 0x57, 0x56, 0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57, + /* push %rbx, %rbp, %rdi, %rsi, %r12, %r13, %r14, %r15 */ + 0x41, 0xff, 0xd1, /* callq *r9 */ + 0x41, 0x5f, 0x41, 0x5e, 0x41, 0x5d, 0x41, 0x5c, 0x5e, 0x5f, 0x5d, 0x5b, + /* pop %r15, %r14, %r13, %r12, %rsi, %rdi, %rbp, %rbx */
I think we should reload RSP beforehand, since we're assuming the stack pointer isn't reliable at all after the system call.
The stack pointer is reliable after the system call,
Perhaps I misinterpreted when you said
- don't assume same stack pointer and non-volatile registers between test function calls in the test.
In that case, it's totally fine.
it is always (((code_mem + 0x1000) ~0xfff) - saved_regs_size). This is not a universal procedure, only a part of specific test case and frankly I don't find the motivation to make it universal at the cost of complication. The reason I introduced this at all in the second patch version is that previously the test implicitly relied on the compiler calling the asm block with the same %rsp. Now it doesn't, the %rsp is set to the same known value before NtGetContextThread().
"testl $0x3,%edx\n\t" /* CONTEXT_CONTROL | CONTEXT_INTEGER */ "jnz 1f\n\t" "movq 0x88(%rcx),%rsp\n\t" - "jmpq *0x70(%rcx)\n" /* frame->rip */ + "movq 0x70(%rcx),%rcx\n\t" /* frame->rip */ + "jmpq *%rcx\n\t" "1:\tleaq 0x70(%rcx),%rsp\n\t" "testl $0x2,%edx\n\t" /* CONTEXT_INTEGER */ - "jz 1f\n\t" - "movq 0x00(%rcx),%rax\n\t" + "jnz 1f\n\t" + "movq (%rsp),%rcx\n\t" /* frame->rip */ CONTEXT_CONTROL means we should restore the CS segment register as well. Since SYSRET cannot restore CS, Windows would have to use plain IRETQ instead of SYSRET. In this case I suspect there should be no reason to clobber RCX at all.
Note that this claim is unconfirmed; I might need some testing.
Does my patch change anything in this regard? I hope it doesn't, if it does that is an oversight. If there is something looking wrong with how it works now fixing that should not go in the same patch. But I think we currently restore CS (both before and after my patch) by using iretq in case of CONTEXT_CONTROL, isn't that the case?
Apparantly I was confused here, too. Please disregard my previous comment. Your tests show that it's clear I was wrong -- CONTEXT_CONTROL by itself doesn't really disable SYSRET.
Perhaps it ignores CS being set to any other value (e.g. 32-bit compat segment?) or maybe selectively use IRETQ. I hope no usermode anti-debugging/anti-cheat would rely on this...
CONTEXT_CONTROL
On 12/1/21 19:49, Jinoh Kang wrote:
Example scenario, if code_mem == 0x123FF0:
- LEAQ RAX, [RIP + 0x1000] | RAX = 0x124FF7 (0x123FF0 + 7 + 0x1000)
- ANDQ RAX, ~0xFFF | RAX = 0x124000 (0x124FF7 & ~0xFFF)
- SUBQ RAX, 8 | RAX = 0x123FF8 (0x124FF0 - 8)
- MOVQ [RAX], RSP
- MOVQ RSP, RAX | RSP = 0x123FF8
- PUSH R8 | RSP = 0x123FF0
- PUSH RBX | RSP = 0x123FE8 <--- [[overflow]]
code_mem is always page-aligned though, so the overflow won't happen. But in that case I suppose andq gives a false impression that the address might not be page-aligned.
I didn't want to rely that the rip is exactly at the code_mem start and not a few bytes ahead.
Perhaps it ignores CS being set to any other value (e.g. 32-bit compat segment?) or maybe selectively use IRETQ. I hope no usermode anti-debugging/anti-cheat would rely on this...
Some might, in theory, or maybe rely on one of the numerous other neat things around this stuff. I am just not sure we should (or can) mind every bit of special cases we can think of before we know that anything relies on that.
On 12/2/21 01:54, Paul Gofman wrote:
On 12/1/21 19:49, Jinoh Kang wrote:
Example scenario, if code_mem == 0x123FF0:
- LEAQ RAX, [RIP + 0x1000] | RAX = 0x124FF7 (0x123FF0 + 7 + 0x1000)
- ANDQ RAX, ~0xFFF | RAX = 0x124000 (0x124FF7 & ~0xFFF)
- SUBQ RAX, 8 | RAX = 0x123FF8 (0x124FF0 - 8)
- MOVQ [RAX], RSP
- MOVQ RSP, RAX | RSP = 0x123FF8
- PUSH R8 | RSP = 0x123FF0
- PUSH RBX | RSP = 0x123FE8 <--- [[overflow]]
code_mem is always page-aligned though, so the overflow won't happen. But in that case I suppose andq gives a false impression that the address might not be page-aligned.
I didn't want to rely that the rip is exactly at the code_mem start and not a few bytes ahead.
I agree that hard-coding the offset would be fragile. Also, code_mem's alignment might change in the future.
In that case, how about relaxing the masking out (e.g. ~0xF instead of ~0xFFF) so as to avoid the supposed underflow? Again, this is merely a suggestion; the code would work as-is due to currently guaranteed properties on code_mem. It's not that I'm a Wine maintainer either...
Perhaps it ignores CS being set to any other value (e.g. 32-bit compat segment?) or maybe selectively use IRETQ. I hope no usermode anti-debugging/anti-cheat would rely on this...
Some might, in theory, or maybe rely on one of the numerous other neat things around this stuff. I am just not sure we should (or can) mind every bit of special cases we can think of before we know that anything relies on that.
You're right. Also Windows could break those special cases anytime in the future.
Even though implementation of exact emulation might prove unnecessary, I suppose some thorough unit tests (with todo_wine) might aid in diagnosing future incompatibility issues and pinpoint which behaviour was left unimplemented. But that's for a separate patch set, and I could, well, just submit it myself ;)