Btw, does wine ever _use_ PTRACE_SINGLESTEP for any of the things it does?
If it does, then that woulc certainly explain why my "fix" made no difference: my fix _only_ handles the case where the ptracer never actually asks for single-stepping, and single-stepping was started entirely by the program being run (ie by setting TF in eflags from within the program itself).
But if wine ends up using PTRACE_SINGESTEP because wine actually wants to single-step over some instructions, then the kernel will set the PT_DTRACE bit, and start tracing through signal handlers too. The way Wine doesn't want..
wine mixes both approches, we have (to control what's generated inside the various exception) to ptrace from our NT-kernel-like process (the ptracer) to get the context of the exception. Restart from the ptracer is done with PTRACE_SINGLESTEP.
(BTW: I also CC:ed wine-devel ML, that might be of interest to them too)
A+
On Fri, 19 Nov 2004, Eric Pouech wrote:
wine mixes both approches, we have (to control what's generated inside the various exception) to ptrace from our NT-kernel-like process (the ptracer) to get the context of the exception. Restart from the ptracer is done with PTRACE_SINGLESTEP.
Here's a new patch to try. Totally untested.
It is more careful about clearing PT_DTRACED (which by now should probably be renamed PT_PRACE_SINGLESTEP or something on x86, since we should never be lazy about this thing any more), and it may or may not help.
Pls test _together_ with the previous patch (which is already applied in the current top-of-tree for anybody with really recent kernels).
Linus
----- ===== arch/i386/kernel/ptrace.c 1.27 vs edited ===== --- 1.27/arch/i386/kernel/ptrace.c 2004-11-07 18:10:34 -08:00 +++ edited/arch/i386/kernel/ptrace.c 2004-11-19 13:18:56 -08:00 @@ -138,6 +138,26 @@ return retval; }
+static void set_singlestep(struct task_struct *child) +{ + long eflags; + + set_tsk_thread_flag(child, TIF_SINGLESTEP); + eflags = get_stack_long(child, EFL_OFFSET); + put_stack_long(child, EFL_OFFSET, eflags | TRAP_FLAG); + child->ptrace |= PT_DTRACE; +} + +static void clear_singlestep(struct task_struct *child) +{ + long eflags; + + clear_tsk_thread_flag(child, TIF_SINGLESTEP); + eflags = get_stack_long(child, EFL_OFFSET); + put_stack_long(child, EFL_OFFSET, eflags & ~TRAP_FLAG); + child->ptrace &= ~PT_DTRACE; +} + /* * Called by kernel/ptrace.c when detaching.. * @@ -145,11 +165,7 @@ */ void ptrace_disable(struct task_struct *child) { - long tmp; - - clear_tsk_thread_flag(child, TIF_SINGLESTEP); - tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG; - put_stack_long(child, EFL_OFFSET, tmp); + clear_singlestep(child); }
/* @@ -388,10 +404,8 @@ } break;
- case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */ - case PTRACE_CONT: { /* restart after signal. */ - long tmp; - + case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */ + case PTRACE_CONT: /* restart after signal. */ ret = -EIO; if ((unsigned long) data > _NSIG) break; @@ -401,56 +415,39 @@ else { clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); } - clear_tsk_thread_flag(child, TIF_SINGLESTEP); child->exit_code = data; - /* make sure the single step bit is not set. */ - tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG; - put_stack_long(child, EFL_OFFSET,tmp); + /* make sure the single step bit is not set. */ + clear_singlestep(child); wake_up_process(child); ret = 0; break; - }
/* * make the child exit. Best I can do is send it a sigkill. * perhaps it should be put in the status that it wants to * exit. */ - case PTRACE_KILL: { - long tmp; - + case PTRACE_KILL: ret = 0; if (child->exit_state == EXIT_ZOMBIE) /* already dead */ break; child->exit_code = SIGKILL; - clear_tsk_thread_flag(child, TIF_SINGLESTEP); /* make sure the single step bit is not set. */ - tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG; - put_stack_long(child, EFL_OFFSET, tmp); + clear_singlestep(child); wake_up_process(child); break; - } - - case PTRACE_SINGLESTEP: { /* set the trap flag. */ - long tmp;
+ case PTRACE_SINGLESTEP: /* set the trap flag. */ ret = -EIO; if ((unsigned long) data > _NSIG) break; clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); - if ((child->ptrace & PT_DTRACE) == 0) { - /* Spurious delayed TF traps may occur */ - child->ptrace |= PT_DTRACE; - } - tmp = get_stack_long(child, EFL_OFFSET) | TRAP_FLAG; - put_stack_long(child, EFL_OFFSET, tmp); - set_tsk_thread_flag(child, TIF_SINGLESTEP); + set_singlestep(child); child->exit_code = data; /* give it a chance to run. */ wake_up_process(child); ret = 0; break; - }
case PTRACE_DETACH: /* detach a process that was attached. */
On Fri, Nov 19, 2004 at 09:41:44PM +0100, Eric Pouech wrote:
Btw, does wine ever _use_ PTRACE_SINGLESTEP for any of the things it does?
If it does, then that woulc certainly explain why my "fix" made no difference: my fix _only_ handles the case where the ptracer never actually asks for single-stepping, and single-stepping was started entirely by the program being run (ie by setting TF in eflags from within the program itself).
But if wine ends up using PTRACE_SINGESTEP because wine actually wants to single-step over some instructions, then the kernel will set the PT_DTRACE bit, and start tracing through signal handlers too. The way Wine doesn't want..
wine mixes both approches, we have (to control what's generated inside the various exception) to ptrace from our NT-kernel-like process (the ptracer) to get the context of the exception. Restart from the ptracer is done with PTRACE_SINGLESTEP.
I'm getting the feeling that the question of whether to step into signal handlers is orthogonal to single-stepping; maybe it should be a separate ptrace operation.
Platforms which don't implement PTRACE_SINGLESTEP would probably appreciate this. A "single step" which stops you after setting up the signal trampoline and adjusting the PC, before executing any instructions in the handler.
On Fri, 19 Nov 2004, Daniel Jacobowitz wrote:
I'm getting the feeling that the question of whether to step into signal handlers is orthogonal to single-stepping; maybe it should be a separate ptrace operation.
I really don't see why. If a controlling process is asking for single-stepping, then it damn well should get it. It it doesn't want to single-step through a signal handler, then it could decide to just put a breakpoint on the return point (possibly by modifying the signal handler save area).
It's not like single-stepping into the signal handler in any way removes any information (while _not_ single-stepping into it clearly does).
With the patch I just posted (assuming it works for people), Wine should at least have the choice. The behaviour now should be:
- if the app sets TF on its own, it will cause a SIGTRAP which it can catch. - if the debugger sets TF with SINGLESTEP, it will single-step into a signal handler. - it the app sets TF _and_ you ptrace it, you the ptracer will see the debug event and catch it. However, doing a "continue" at that point will remove the TF flag (and always has), the app will normally then never see the trap. You can do a "signal SIGTRAP" to actually force the trap handler to tun, but that one won't actually single-step (it's a "continue" in all other senses).
It sounds like the third case is what wine wants.
Linus
I'm getting the feeling that the question of whether to step into signal handlers is orthogonal to single-stepping;
No, it's not. You only ever step into a handler when you ask to. That's already the interface.
Platforms which don't implement PTRACE_SINGLESTEP would probably appreciate this. A "single step" which stops you after setting up the signal trampoline and adjusting the PC, before executing any instructions in the handler.
That's what PTRACE_SINGLESTEP with a nonzero signal number does since it was fixed.
On Fri, Nov 19, 2004 at 01:53:38PM -0800, Linus Torvalds wrote:
On Fri, 19 Nov 2004, Daniel Jacobowitz wrote:
I'm getting the feeling that the question of whether to step into signal handlers is orthogonal to single-stepping; maybe it should be a separate ptrace operation.
I really don't see why. If a controlling process is asking for single-stepping, then it damn well should get it. It it doesn't want to single-step through a signal handler, then it could decide to just put a breakpoint on the return point (possibly by modifying the signal handler save area).
It's not like single-stepping into the signal handler in any way removes any information (while _not_ single-stepping into it clearly does).
With the patch I just posted (assuming it works for people), Wine should at least have the choice. The behaviour now should be:
- if the app sets TF on its own, it will cause a SIGTRAP which it can catch.
- if the debugger sets TF with SINGLESTEP, it will single-step into a signal handler.
- it the app sets TF _and_ you ptrace it, you the ptracer will see the debug event and catch it. However, doing a "continue" at that point will remove the TF flag (and always has), the app will normally then never see the trap. You can do a "signal SIGTRAP" to actually force the trap handler to tun, but that one won't actually single-step (it's a "continue" in all other senses).
It sounds like the third case is what wine wants.
Linus
So an app running through wine could set TF on it's own? It would be a good idea to find out what it is doing in the first place. There has to be a reason why War3 is so picky after the original change was introduced and a reason why the latest changes don't seem to fix it.
I've built a kernel 2.6.10-rc2 with the new ptrace patch. The program still says "please insert disc". I haven't been able to get winedbg to tell me anything useful -- the program never crashes anyways. So I went ahead and I captured a debug log.
the command: WINEDEBUG=+all wine war3/Warcraft\ III.exe -opengl >& war3_and_2.6.10-rc2.log
I scanned for the part right before it calls up to display the error. I found that after loading advapi32.dll, the thread 000c creates a mutex and wakes up 000a. Then 000c gets killed and right after that starts calling up the resources for the "insert disc" message box. I put the log up on my ftp, and the interesting part in a seperate file: ftp://resnet.dnip.net/
Any clue on what may be happening here? Or maybe another idea on where else to search?
Jesse
On Sat, Nov 20, 2004 at 02:49:15PM -0700, Jesse Allen wrote:
the interesting part in a seperate file: ftp://resnet.dnip.net/
I took another look at the section I found. It doesn't describe much, but it shows "000c: *signal* signal=5" for example, which are probably SIGTRAP's.
I decided to capture a log running under a kernel before the change, and see if I could find the same section again, based on the mutex name. Well I did, and found alot more tracing. The thread 000c didn't get killed either so it shows something is different. Of course under the old kernels I don't get the "insert disc" message. I put up the working version log on the ftp.
Jesse
On Fri, Nov 19, 2004 at 01:53:38PM -0800, Linus Torvalds wrote:
On Fri, 19 Nov 2004, Daniel Jacobowitz wrote:
I'm getting the feeling that the question of whether to step into signal handlers is orthogonal to single-stepping; maybe it should be a separate ptrace operation.
I really don't see why. If a controlling process is asking for single-stepping, then it damn well should get it. It it doesn't want to single-step through a signal handler, then it could decide to just put a breakpoint on the return point (possibly by modifying the signal handler save area).
I'd agree with Linus here. A signal handler is part of the application, so it should be single stepped in the same way other application code does. My original patch simply reenabled the flag before returning to userspace, and this had the consequence to single step into signal handlers too.
PS: I still cannot find the beginning this thread on lkml.
- Davide
On Sun, 21 Nov 2004, Davide Libenzi wrote:
I'd agree with Linus here. A signal handler is part of the application, so it should be single stepped in the same way other application code does. My original patch simply reenabled the flag before returning to userspace, and this had the consequence to single step into signal handlers too.
Hmmm.. I think I may have a test-case for the problem.
Lookie here:
#include <signal.h> #include <sys/mman.h>
void function(void) { printf("Copy protected: ok\n"); }
void handler(int signo) { extern char smc; smc++; }
#define TF 0x100
int main(int argc, char **argv) { void (*fnp)(void);
signal(SIGTRAP, handler); mprotect((void *)(0xfffff000 & (unsigned long)main), 4096, PROT_READ | PROT_WRITE); asm volatile("pushfl ; orl %0,(%%esp) ; popfl" : :"i" (TF):"memory"); asm volatile("pushfl ; andl %0,(%%esp) ; popfl" : :"i" (~TF):"memory"); asm volatile("\nsmc:\n\t" ".byte 0xb7\n\t" ".long function" :"=d" (fnp)); fnp(); exit(1); }
Compile it, run it, and it should say
Copy protected: ok
Now, try to "strace" it, or debug it with gdb, and see if you can repeat the behaviour.
Roland? Think of it as a challenge,
Linus
On Sun, 21 Nov 2004, Linus Torvalds wrote:
void handler(int signo) { extern char smc; smc++; }
asm volatile("\nsmc:\n\t" ".byte 0xb7\n\t" ".long function" :"=d" (fnp)); fnp();
You know you're sick, don't you? Making traps inc's to get you in the correct opcode to move function in edx->fnp, is indeed fruit of a sick mind :=)
- Davide
Linus Torvalds [email protected] writes:
Now, try to "strace" it, or debug it with gdb, and see if you can repeat the behaviour.
You'll always have hard time repeating that under strace or gdb, since a debugger uses SIGTRAP for it's own purpose and does not pass it to the program.
Andreas.
On Mon, 22 Nov 2004, Andreas Schwab wrote:
Linus Torvalds [email protected] writes:
Now, try to "strace" it, or debug it with gdb, and see if you can repeat the behaviour.
You'll always have hard time repeating that under strace or gdb, since a debugger uses SIGTRAP for it's own purpose and does not pass it to the program.
Sure. But "hard time" and "impossible" are two different things.
It _should_ be perfectly easy to debug this, by using
signal SIGTRAP
instead of "continue" when you get a SIGTRAP that wasn't due to anything you did.
But try it. It doesn't work. Why? Because the kernel will have cleared TF on the signal stack, so even when you do a "signal SIGTRAP", it will only run the trap handler _once_, even though it should run it three times (once for each instruction in between the "popfl"s.
I do think this is actually a bug, although whether it's the bug that causes problems for Wine is not clear at all. I'mm too lazy to build and boot an older kernel, but I bet that on an older kernel you _can_ do "signal SIGTRAP" three times, and it will work correctly. That would indeed make this a regression.
Linus
On Sun, 21 Nov 2004, Davide Libenzi wrote:
You know you're sick, don't you? Making traps inc's to get you in the correct opcode to move function in edx->fnp, is indeed fruit of a sick mind :=)
I prefer "creative" over "sick". It's so much less judgemental.
I thought it was a fun way to illustrate the issue, and I'm sure somebody had fun trying to figure out what the thing did.
I could have _obfuscated_ the thing if I had wanted to make it hard to follow, but instead I tried to make it as obvious as possible so that it's quite clear from reading the code what it actually does.
But imagine hitting that thing without seeing the source code. NOT a lot of fun to debug, even if the debugger _worked_ on it.
Linus
On Sun, 21 Nov 2004, Linus Torvalds wrote:
On Mon, 22 Nov 2004, Andreas Schwab wrote:
Linus Torvalds [email protected] writes:
Now, try to "strace" it, or debug it with gdb, and see if you can repeat the behaviour.
You'll always have hard time repeating that under strace or gdb, since a debugger uses SIGTRAP for it's own purpose and does not pass it to the program.
Sure. But "hard time" and "impossible" are two different things.
It _should_ be perfectly easy to debug this, by using
signal SIGTRAP
instead of "continue" when you get a SIGTRAP that wasn't due to anything you did.
But try it. It doesn't work. Why? Because the kernel will have cleared TF on the signal stack, so even when you do a "signal SIGTRAP", it will only run the trap handler _once_, even though it should run it three times (once for each instruction in between the "popfl"s.
I do think this is actually a bug, although whether it's the bug that causes problems for Wine is not clear at all. I'mm too lazy to build and boot an older kernel, but I bet that on an older kernel you _can_ do "signal SIGTRAP" three times, and it will work correctly. That would indeed make this a regression.
Hmmm ..., this is 2.4.27:
[davide@bigblue davide]$ gcc -g -o zzzzzzzz zzzzzzzz.c [davide@bigblue davide]$ ./zzzzzzzz Copy protected: ok [davide@bigblue davide]$ gdb ./zzzzzzzz GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh) Copyright 2003 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r Starting program: /home/davide/zzzzzzzz
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048454 in main (argc=1, argv=0xbffff9c4) at zzzzzzzz.c:26 26 asm volatile("pushfl ; andl %0,(%%esp) ; popfl" (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGSEGV, Segmentation fault. 0x00000003 in ?? () (gdb) bt #0 0x00000003 in ?? () #1 0x0804846b in smc () at zzzzzzzz.c:32 #2 0x46649b7f in __libc_start_main () from /lib/i686/libc.so.6 #3 0x08048359 in _start ()
So it seems it did not work even before, the gdb-SIGTRAP stepping. In 2.6.8 I get a straight segfault just for running it.
- Davide
On Sun, 21 Nov 2004, Davide Libenzi wrote:
So it seems it did not work even before, the gdb-SIGTRAP stepping. In 2.6.8 I get a straight segfault just for running it.
Ok, that at least means it's not a regression, although it may be that the altered behaviour is enough to make some program work/not-work depending on exactly what it is testing. My example is certainly not the only way to try to mess up a debugger.
I'm by no means 100% sure that we should encourage the kind of programming "skills" I showed with that example program, so in that sense this may not be worth worrying about. That said, I do hate the notion of having programs that are basically undebuggable, so from a QoI standpoint I'd really like to say that you can run my horrid little program under the debugger and see it work...
Linus
On Sun, 21 Nov 2004, Linus Torvalds wrote:
I'm by no means 100% sure that we should encourage the kind of programming "skills" I showed with that example program, so in that sense this may not be worth worrying about. That said, I do hate the notion of having programs that are basically undebuggable, so from a QoI standpoint I'd really like to say that you can run my horrid little program under the debugger and see it work...
Ok, how about this patch?
It does basically two things:
- it makes the x86 version of ptrace be a lot more careful about the TF bit in eflags, and in particular it never touches it _unless_ the tracer has explicitly asked for it (ie we set TF only when doing a PTRACE_SINGESTEP, and we clear it only when it has been set by us, not if it has been set by the program itself).
This patch also cleans up the codepaths by doing all the common stuff in set_singlestep()/clear_singlestep().
- It clarifies signal handling, and makes it clear that we always push the full eflags onto the signal stack, _except_ if the TF bit was set by an external ptrace user, in which case we hide it so that the tracee doesn't see it when it looks at its stack contents.
It also adds a few comments, and makes it clear that the signal handler itself is always set up with TF _clear_. But if we were single-stepped into it, we will have notified the debugger, so the debugger obviously can (and often will) decide to continue single-stepping.
IMHO, this is a nice cleanup, and it also means that I can actually debug my "program from hell":
[torvalds@evo ~]$ gdb ./a.out GNU gdb Red Hat Linux (6.1post-1.20040607.41rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run Starting program: /home/torvalds/a.out Reading symbols from shared object read from target memory...(no debugging symbols found)...done. Loaded system supplied DSO at 0xffffe000 (no debugging symbols found)...(no debugging symbols found)... Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048480 in main () (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048487 in main () (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048488 in smc () (gdb) signal SIGTRAP Continuing with signal SIGTRAP. Copy protected: ok
Program exited with code 01. (gdb)
which I think is a sign that this patch actually fixes ptrace.
Does this help with wine? I dunno. Maybe some wine people can comment..
Roland, mind take a look? I assume you have some gdb test-suite that you use to test the things?
Linus
----
===== arch/i386/kernel/ptrace.c 1.27 vs edited ===== --- 1.27/arch/i386/kernel/ptrace.c 2004-11-07 18:10:34 -08:00 +++ edited/arch/i386/kernel/ptrace.c 2004-11-21 21:34:58 -08:00 @@ -138,6 +138,28 @@ return retval; }
+static void set_singlestep(struct task_struct *child) +{ + long eflags; + + set_tsk_thread_flag(child, TIF_SINGLESTEP); + eflags = get_stack_long(child, EFL_OFFSET); + put_stack_long(child, EFL_OFFSET, eflags | TRAP_FLAG); + child->ptrace |= PT_DTRACE; +} + +static void clear_singlestep(struct task_struct *child) +{ + if (child->ptrace & PT_DTRACE) { + long eflags; + + clear_tsk_thread_flag(child, TIF_SINGLESTEP); + eflags = get_stack_long(child, EFL_OFFSET); + put_stack_long(child, EFL_OFFSET, eflags & ~TRAP_FLAG); + child->ptrace &= ~PT_DTRACE; + } +} + /* * Called by kernel/ptrace.c when detaching.. * @@ -145,11 +167,7 @@ */ void ptrace_disable(struct task_struct *child) { - long tmp; - - clear_tsk_thread_flag(child, TIF_SINGLESTEP); - tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG; - put_stack_long(child, EFL_OFFSET, tmp); + clear_singlestep(child); }
/* @@ -388,10 +406,8 @@ } break;
- case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */ - case PTRACE_CONT: { /* restart after signal. */ - long tmp; - + case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */ + case PTRACE_CONT: /* restart after signal. */ ret = -EIO; if ((unsigned long) data > _NSIG) break; @@ -401,56 +417,39 @@ else { clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); } - clear_tsk_thread_flag(child, TIF_SINGLESTEP); child->exit_code = data; - /* make sure the single step bit is not set. */ - tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG; - put_stack_long(child, EFL_OFFSET,tmp); + /* make sure the single step bit is not set. */ + clear_singlestep(child); wake_up_process(child); ret = 0; break; - }
/* * make the child exit. Best I can do is send it a sigkill. * perhaps it should be put in the status that it wants to * exit. */ - case PTRACE_KILL: { - long tmp; - + case PTRACE_KILL: ret = 0; if (child->exit_state == EXIT_ZOMBIE) /* already dead */ break; child->exit_code = SIGKILL; - clear_tsk_thread_flag(child, TIF_SINGLESTEP); /* make sure the single step bit is not set. */ - tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG; - put_stack_long(child, EFL_OFFSET, tmp); + clear_singlestep(child); wake_up_process(child); break; - } - - case PTRACE_SINGLESTEP: { /* set the trap flag. */ - long tmp;
+ case PTRACE_SINGLESTEP: /* set the trap flag. */ ret = -EIO; if ((unsigned long) data > _NSIG) break; clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); - if ((child->ptrace & PT_DTRACE) == 0) { - /* Spurious delayed TF traps may occur */ - child->ptrace |= PT_DTRACE; - } - tmp = get_stack_long(child, EFL_OFFSET) | TRAP_FLAG; - put_stack_long(child, EFL_OFFSET, tmp); - set_tsk_thread_flag(child, TIF_SINGLESTEP); + set_singlestep(child); child->exit_code = data; /* give it a chance to run. */ wake_up_process(child); ret = 0; break; - }
case PTRACE_DETACH: /* detach a process that was attached. */ ===== arch/i386/kernel/signal.c 1.48 vs edited ===== --- 1.48/arch/i386/kernel/signal.c 2004-11-15 00:56:24 -08:00 +++ edited/arch/i386/kernel/signal.c 2004-11-21 21:33:21 -08:00 @@ -292,10 +292,15 @@ err |= __put_user(current->thread.error_code, &sc->err); err |= __put_user(regs->eip, &sc->eip); err |= __put_user(regs->xcs, (unsigned int __user *)&sc->cs); + + /* + * Iff TF was set because the program is being single-stepped by a + * debugger, don't save that information on the signal stack.. We + * don't want debugging to change state. + */ eflags = regs->eflags; - if (current->ptrace & PT_PTRACED) { + if (current->ptrace & PT_DTRACE) eflags &= ~TF_MASK; - } err |= __put_user(eflags, &sc->eflags); err |= __put_user(regs->esp, &sc->esp_at_signal); err |= __put_user(regs->xss, (unsigned int __user *)&sc->ss); @@ -412,12 +417,17 @@ regs->xes = __USER_DS; regs->xss = __USER_DS; regs->xcs = __USER_CS; + + /* + * Clear TF when entering the signal handler, but + * notify any tracer that was single-stepping it. + * The tracer may want to single-step inside the + * handler too. + */ if (regs->eflags & TF_MASK) { - if ((current->ptrace & (PT_PTRACED | PT_DTRACE)) == (PT_PTRACED | PT_DTRACE)) { + regs->eflags &= ~TF_MASK; + if (current->ptrace & PT_DTRACE) ptrace_notify(SIGTRAP); - } else { - regs->eflags &= ~TF_MASK; - } }
#if DEBUG_SIG @@ -502,12 +512,17 @@ regs->xes = __USER_DS; regs->xss = __USER_DS; regs->xcs = __USER_CS; + + /* + * Clear TF when entering the signal handler, but + * notify any tracer that was single-stepping it. + * The tracer may want to single-step inside the + * handler too. + */ if (regs->eflags & TF_MASK) { - if (current->ptrace & PT_PTRACED) { + regs->eflags &= ~TF_MASK; + if (current->ptrace & PT_DTRACE) ptrace_notify(SIGTRAP); - } else { - regs->eflags &= ~TF_MASK; - } }
#if DEBUG_SIG
Linus Torvalds [email protected] writes:
IMHO, this is a nice cleanup, and it also means that I can actually debug my "program from hell":
Does it also work when trying to single step over it? I guess all bets are off then.
Andreas.
On Sun, 21 Nov 2004, Linus Torvalds wrote:
Ok, how about this patch?
It does basically two things:
it makes the x86 version of ptrace be a lot more careful about the TF bit in eflags, and in particular it never touches it _unless_ the tracer has explicitly asked for it (ie we set TF only when doing a PTRACE_SINGESTEP, and we clear it only when it has been set by us, not if it has been set by the program itself).
This patch also cleans up the codepaths by doing all the common stuff in set_singlestep()/clear_singlestep().
It clarifies signal handling, and makes it clear that we always push the full eflags onto the signal stack, _except_ if the TF bit was set by an external ptrace user, in which case we hide it so that the tracee doesn't see it when it looks at its stack contents.
It also adds a few comments, and makes it clear that the signal handler itself is always set up with TF _clear_. But if we were single-stepped into it, we will have notified the debugger, so the debugger obviously can (and often will) decide to continue single-stepping.
Looks like a nice cleanup. What does the test program below print for you?
- Davide
#include <stdio.h> #include <stdlib.h> #include <signal.h> #include <unistd.h> #include <errno.h> #include <sys/types.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <linux/user.h> #include <linux/unistd.h>
int main(int ac, char **av) { int i, status, res; long start, end; pid_t cpid, pid; struct user_regs_struct ur; struct sigaction sa;
sigemptyset(&sa.sa_mask); sa.sa_flags = 0; sa.sa_handler = SIG_DFL; sigaction(SIGCHLD, &sa, NULL);
printf("nargs=%d\n", ac); if (ac == 1) goto tracer;
printf("arg=%s\n", av[1]); loop: __asm__ volatile ("int $0x80" : "=a" (res) : "0" (__NR_getpid)); goto loop; endloop: exit(0);
tracer: if ((cpid = fork()) != 0) goto parent;
printf("child=%d\n", getpid()); ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execl(av[0], av[0], "child", NULL);
exit(0);
parent: start = (long) &&loop; end = (long) &&endloop;
printf("pchild=%d\n", cpid);
for (;;) { pid = wait(&status); if (pid != cpid) continue; res = WSTOPSIG(status); if (ptrace(PTRACE_GETREGS, pid, NULL, &ur)) { printf("[%d] error: ptrace(PTRACE_GETREGS, %d)\n", pid, pid); return 1; }
if (ptrace(PTRACE_SINGLESTEP, pid, NULL, res != SIGTRAP ? res: 0)) { perror("ptrace(PTRACE_SINGLESTEP)"); return 1; }
if (ur.eip >= start && ur.eip <= end) break; }
for (i = 0; i < 15; i++) { printf("waiting ...\n"); pid = wait(&status); printf("done: pid=%d status=%d\n", pid, status); if (pid != cpid) continue; res = WSTOPSIG(status); printf("sig=%d\n", res); if (ptrace(PTRACE_GETREGS, pid, NULL, &ur)) { printf("[%d] error: ptrace(PTRACE_GETREGS, %d)\n", pid, pid); return 1; }
printf("EIP=0x%08x\n", ur.eip);
if (ptrace(PTRACE_SINGLESTEP, pid, NULL, res != SIGTRAP ? res: 0)) { perror("ptrace(PTRACE_SINGLESTEP)"); return 1; } }
if (ptrace(PTRACE_CONT, cpid, NULL, SIGKILL)) { perror("ptrace(PTRACE_SINGLESTEP)"); return 1; }
return 0; }
On Mon, 22 Nov 2004, Andreas Schwab wrote:
Linus Torvalds [email protected] writes:
IMHO, this is a nice cleanup, and it also means that I can actually debug my "program from hell":
Does it also work when trying to single step over it? I guess all bets are off then.
If you single-step over the "popfl", then you need to generate the SIGTRAP's by hand too. IOW, it's _possible_ to emulate the behaviour from within the debugger, but it gets really really nasty very quickly.
I think the nastyness in that case is at least acceptable, since if you single-step, you actually _see_ what is happening, and thus you have a chance in hell of figuring it out. Practical? No. But debuggable at least in theory, which it really wasn't before.
Linus
Jesse Allen a écrit :
On Fri, Nov 19, 2004 at 01:53:38PM -0800, Linus Torvalds wrote:
On Fri, 19 Nov 2004, Daniel Jacobowitz wrote:
I'm getting the feeling that the question of whether to step into signal handlers is orthogonal to single-stepping; maybe it should be a separate ptrace operation.
I really don't see why. If a controlling process is asking for single-stepping, then it damn well should get it. It it doesn't want to single-step through a signal handler, then it could decide to just put a breakpoint on the return point (possibly by modifying the signal handler save area).
It's not like single-stepping into the signal handler in any way removes any information (while _not_ single-stepping into it clearly does).
With the patch I just posted (assuming it works for people), Wine should at least have the choice. The behaviour now should be:
- if the app sets TF on its own, it will cause a SIGTRAP which it can catch.
- if the debugger sets TF with SINGLESTEP, it will single-step into a signal handler.
- it the app sets TF _and_ you ptrace it, you the ptracer will see the debug event and catch it. However, doing a "continue" at that point will remove the TF flag (and always has), the app will normally then never see the trap. You can do a "signal SIGTRAP" to actually force the trap handler to tun, but that one won't actually single-step (it's a "continue" in all other senses).
It sounds like the third case is what wine wants.
Linus
So an app running through wine could set TF on it's own? It would be a good idea to find out what it is doing in the first place. There has to be a reason why War3 is so picky after the original change was introduced and a reason why the latest changes don't seem to fix it.
I've built a kernel 2.6.10-rc2 with the new ptrace patch. The program still says "please insert disc". I haven't been able to get winedbg to tell me anything useful -- the program never crashes anyways. So I went ahead and I captured a debug log.
the command: WINEDEBUG=+all wine war3/Warcraft\ III.exe -opengl >& war3_and_2.6.10-rc2.log
I scanned for the part right before it calls up to display the error. I found that after loading advapi32.dll, the thread 000c creates a mutex and wakes up 000a. Then 000c gets killed and right after that starts calling up the resources for the "insert disc" message box. I put the log up on my ftp, and the interesting part in a seperate file: ftp://resnet.dnip.net/
Any clue on what may be happening here? Or maybe another idea on where else to search?
Jesse
For the linux folks, here a small comparison of what happens in the working (old) case and in the non-working (new) case:
In both case
- Wine gets a first SIGTRAP (in it's sig_trap handler) + Wine converts it into a Windows exception (w-exception in short). This includes creating a context for the generic CPU registers + This w-exception is sent to the w-exception handler the program installed (this one can modifiy the all registers) o this handler touches one of the i386 debug registers + as we need to update the debug registers values (and we don't do in the signal handler return), this task is delegated to the Wine server (our central process, which is in charge of synchronisation...) > the Wine server ptrace-attach:es to the process which handled the SIGTRAP. > Wine server wait4:s on the SIGSTOP (after ptrace:attach) > modify (with ptrace) the debug registers > and resumes excution (ptrace: cont) + wine terminates the sig trap handler and resumes the execution with the modified basic registers (from the saved context), and the modified debug registers (from the Wine server round trip) - a second sig trap is generated > since the wine server is still ptrace:attached, it gets the signal.
What differs then in both execution: - in the working case, the sig trap handler is called on the client side, whereas it's never called in the non-working case. We do have a couple of protection (to avoid some misbehaving apps), but none of them get triggered. So it seems like the trap handler is not called (ugh).
A couple of notes: - as the program tested is copy protected, and as it seems that the copy protection is what causes the harm, it can be interesting to know that the programe seems to set the TF flag (some copy protection schemes directly do an "int 1", but given the exception code we get, this isn't the case). - in Windows trap handling, the TF is explictly reset before calling the windows exception handler (which is what Wine does, before calling the window exception handler). Of course the handler can set it back if it wants to continue single stepping.
HTH A+
On Mon, 22 Nov 2004, Eric Pouech wrote:
For the linux folks, here a small comparison of what happens in the working (old) case and in the non-working (new) case:
In both case
- Wine gets a first SIGTRAP (in it's sig_trap handler)
o this handler touches one of the i386 debug registers
- Wine converts it into a Windows exception (w-exception in short). This includes creating a context for the generic CPU registers
- This w-exception is sent to the w-exception handler the program installed (this one can modifiy the all registers)
- as we need to update the debug registers values (and we don't do in the signal handler return), this task is delegated to the Wine server (our central process, which is in charge of synchronisation...)
the Wine server ptrace-attach:es to the process which handledthe SIGTRAP.
Wine server wait4:s on the SIGSTOP (after ptrace:attach) modify (with ptrace) the debug registers and resumes excution (ptrace: cont)
- wine terminates the sig trap handler and resumes the execution with the modified basic registers (from the saved context), and the modified debug registers (from the Wine server round trip)
- a second sig trap is generated
since the wine server is still ptrace:attached, it gets the signal.What differs then in both execution:
- in the working case, the sig trap handler is called on the client side,
whereas it's never called in the non-working case. We do have a couple of protection (to avoid some misbehaving apps), but none of them get triggered. So it seems like the trap handler is not called (ugh).
This actually implies that the current -bk tree with my latest patch may actually fix it.
One of the things that 2.6.9 did wrong was exactly that it cleared TF too much in the ptrace interface. The current development tree is a lot more careful about that, and it fixes the horrid test-case that I used to debug it. The test-case (when run under gdb) actually does something similar to what Wine appears to do.
- in Windows trap handling, the TF is explictly reset before calling the windows
exception handler (which is what Wine does, before calling the window exception handler). Of course the handler can set it back if it wants to continue single stepping.
TF will be still set in Linux when ptrace gets access, but the ptracer can choose to clear it with PTRACE_PEEKUSR/PTRACE_POKEUSR (or with PTRACE_GETREGS/SETREGS). I assume you already do that, since I think that has been true forever (although maybe you don't: PTRACE_CONTINUE used to unconditionally clear TF, so it may be that Wine may need some minor modification to not do that - but the good news is that mod should be backwards-compatible, so it should be pretty easy).
I actually broke down and am downloading the latest source tree of wine, let's see if I can find the place you do this.
Linus
On Mon, 2004-11-22 at 13:10 -0800, Linus Torvalds wrote:
I actually broke down and am downloading the latest source tree of wine, let's see if I can find the place you do this.
The Wine source tree is organised in the same way Windows is, which is bizarre and unintuitive but we don't really have much choice. So here are the files you'd be looking for.
this is where signals are converted to SEH exceptions (w-exceptions as Eric called them):
dlls/ntdll/signal_i386.c
this is where the wineserver does context related things:
server/context_i386.c
this is where the server does ptracing:
server/ptrace.c
There may be one or two other places that are related, I only ever looked at this code to deal with some other copy protection system that wasn't happy (not kernels fault though).
thanks -mike
On Mon, 22 Nov 2004, Mike Hearn wrote:
this is where signals are converted to SEH exceptions (w-exceptions as Eric called them):
dlls/ntdll/signal_i386.c
Looks like it clears TF there already:
if (context->EFlags & 0x100) { context->EFlags &= ~0x100; /* clear single-step flag */ }
so I'll just assume it's ok.
Linus
On Sun, Nov 21, 2004 at 10:23:41PM -0800, Linus Torvalds wrote:
Ok, how about this patch?
It does basically two things:
it makes the x86 version of ptrace be a lot more careful about the TF bit in eflags, and in particular it never touches it _unless_ the tracer has explicitly asked for it (ie we set TF only when doing a PTRACE_SINGESTEP, and we clear it only when it has been set by us, not if it has been set by the program itself).
This patch also cleans up the codepaths by doing all the common stuff in set_singlestep()/clear_singlestep().
It clarifies signal handling, and makes it clear that we always push the full eflags onto the signal stack, _except_ if the TF bit was set by an external ptrace user, in which case we hide it so that the tracee doesn't see it when it looks at its stack contents.
It also adds a few comments, and makes it clear that the signal handler itself is always set up with TF _clear_. But if we were single-stepped into it, we will have notified the debugger, so the debugger obviously can (and often will) decide to continue single-stepping.
IMHO, this is a nice cleanup, and it also means that I can actually debug my "program from hell":
[torvalds@evo ~]$ gdb ./a.out GNU gdb Red Hat Linux (6.1post-1.20040607.41rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run Starting program: /home/torvalds/a.out Reading symbols from shared object read from target memory...(no debugging symbols found)...done. Loaded system supplied DSO at 0xffffe000 (no debugging symbols found)...(no debugging symbols found)... Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048480 in main () (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048487 in main () (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048488 in smc () (gdb) signal SIGTRAP Continuing with signal SIGTRAP. Copy protected: ok
Program exited with code 01. (gdb)
which I think is a sign that this patch actually fixes ptrace.
Does this help with wine? I dunno. Maybe some wine people can comment..
For the wine app in question, it does make a difference. However, there is a new problem. The program gets stuck at the loading screen at 100% CPU. It's making a call to select, timing out, and then tries select again, repeats. It's waiting for something that seems to never happen.
I've captured a log, "loop.log.bz2", and shortened to have only the relevent information after the CMS32_MUTEX is created. Look for occurances of "select()" -- I think the second one is where it starts. It's on my ftp if anyone wants to take a look. It probably can be compared to the working- version log where this doesn't occur, but it might be a pain to spot the working particular instance.
Jesse
On Mon, Nov 22, 2004 at 04:15:21PM -0700, Jesse Allen wrote:
For the wine app in question, it does make a difference. However, there is a new problem. The program gets stuck at the loading screen at 100% CPU. It's making a call to select, timing out, and then tries select again, repeats. It's waiting for something that seems to never happen.
I've captured a log, "loop.log.bz2", and shortened to have only the relevent information after the CMS32_MUTEX is created. Look for occurances of "select()" -- I think the second one is where it starts. It's on my ftp if anyone wants to take a look. It probably can be compared to the working- version log where this doesn't occur, but it might be a pain to spot the working particular instance.
Actually it's pretty obvious. In the working version, it's supposed to be getting SIGTRAPs while it's calling select(). So something's up there. But it's doing part of what it should be doing now.
Jesse Allen a écrit :
On Sun, Nov 21, 2004 at 10:23:41PM -0800, Linus Torvalds wrote:
Ok, how about this patch?
It does basically two things:
it makes the x86 version of ptrace be a lot more careful about the TF bit in eflags, and in particular it never touches it _unless_ the tracer has explicitly asked for it (ie we set TF only when doing a PTRACE_SINGESTEP, and we clear it only when it has been set by us, not if it has been set by the program itself).
This patch also cleans up the codepaths by doing all the common stuff in set_singlestep()/clear_singlestep().
It clarifies signal handling, and makes it clear that we always push the full eflags onto the signal stack, _except_ if the TF bit was set by an external ptrace user, in which case we hide it so that the tracee doesn't see it when it looks at its stack contents.
It also adds a few comments, and makes it clear that the signal handler itself is always set up with TF _clear_. But if we were single-stepped into it, we will have notified the debugger, so the debugger obviously can (and often will) decide to continue single-stepping.
IMHO, this is a nice cleanup, and it also means that I can actually debug my "program from hell":
[torvalds@evo ~]$ gdb ./a.out GNU gdb Red Hat Linux (6.1post-1.20040607.41rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run Starting program: /home/torvalds/a.out Reading symbols from shared object read from target memory...(no debugging symbols found)...done. Loaded system supplied DSO at 0xffffe000 (no debugging symbols found)...(no debugging symbols found)... Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048480 in main () (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048487 in main () (gdb) signal SIGTRAP Continuing with signal SIGTRAP.
Program received signal SIGTRAP, Trace/breakpoint trap. 0x08048488 in smc () (gdb) signal SIGTRAP Continuing with signal SIGTRAP. Copy protected: ok
Program exited with code 01. (gdb)
which I think is a sign that this patch actually fixes ptrace.
Does this help with wine? I dunno. Maybe some wine people can comment..
For the wine app in question, it does make a difference. However, there is a new problem. The program gets stuck at the loading screen at 100% CPU. It's making a call to select, timing out, and then tries select again, repeats. It's waiting for something that seems to never happen.
I've captured a log, "loop.log.bz2", and shortened to have only the relevent information after the CMS32_MUTEX is created. Look for occurances of "select()" -- I think the second one is where it starts. It's on my ftp if anyone wants to take a look. It probably can be compared to the working- version log where this doesn't occur, but it might be a pain to spot the working particular instance.
Hi Jesse Any network issue on your side? I tried to traceroute resnet.disp.net, but no avail. So I cannot have a look to you newest trace.
A+
Any news about the ptrace single-stepping breakage of wine?
The application that stopped working for me is xst, the Xilinx HDL synthesizer (there's a free as in beer version at http://www.xilinx.com/xlnx/xebiz/designResources/ip_product_details.jsp?sGlo... ). I'm currently at kernel 2.6.10-ac1 (as packaged by Arjan van de Ven), and wine-20041201-1fc3winehq.
Compiling vhdl file H:/sailer/src/vhdl/xxx/vprim.vhd in Library synwork. FATAL_ERROR:Xst:Portability/export/Port_Main.h:127:1.13 - This application has discovered an exceptional condition from which it cannot recover. Process will terminate. To resolve this error, please consult the Answers Database and other online resources at http://support.xilinx.com. If you need further assistance, please open a Webcase by clicking on the "WebCase" link at http://support.xilinx.com
Thanks, Tom
On Wed, 2004-12-29 at 03:14 +0100, Thomas Sailer wrote:
Any news about the ptrace single-stepping breakage of wine?
I can't see if he CCd anybody from the archives but Jesse Allen posted a nice analysis of the remaining problem here:
http://www.winehq.org/hypermail/wine-devel/2004/12/0691.html
Here is one interesting portion of his email:
#3 signal.c - Clearing the trap flag if being traced by debugger in setup_sigcontext()
For some reason, Warcraft III doesn't work if it is cleared here. I have no idea if this TF clear is really necessary. However, everything I've read on this seems to indicate to me that this change is important, so I need to find out what this causes to the game.
The other changes Linus made are apparently good and do not cause Warcraft to regress.
thanks -mike
On Wed, 29 Dec 2004, Mike Hearn wrote:
I can't see if he CCd anybody from the archives but Jesse Allen posted a nice analysis of the remaining problem here:
http://www.winehq.org/hypermail/wine-devel/2004/12/0691.html
Ok, I don't remember the context from the Wine lists (and it's not clear from the older emails I was cc'd on), so the "#3 signal.c" change description is a bit too vague. Jesse, willing to just point to the exact diff that you need to make Warcraft work for you (and then maybe Thomas Sailer can verify whether that part is indeed the one that causes him problems).
The code in question now does
/* * Iff TF was set because the program is being single-stepped by a * debugger, don't save that information on the signal stack.. We * don't want debugging to change state. */ eflags = regs->eflags; if (current->ptrace & PT_DTRACE) eflags &= ~TF_MASK; err |= __put_user(eflags, &sc->eflags);
and I guess it originally never cleared it. True?
So does removing the conditional TF clear make everything work again?
Linus
On Wed, 2004-12-29 at 15:02 +0000, Mike Hearn wrote:
Mike,
thanks a lot for your mail. I've just tried Jesse Allen's Patch with 2.6.10-ac1, but unfortunately it doesn't seem to be enough to get xst working again. Let me know if (and how :)) I can help gather evidence.
Tom
On Wed, 29 Dec 2004 10:53:54 -0800 (PST), Linus Torvalds [email protected] wrote:
Ok, I don't remember the context from the Wine lists (and it's not clear from the older emails I was cc'd on), so the "#3 signal.c" change description is a bit too vague. Jesse, willing to just point to the exact diff that you need to make Warcraft work for you (and then maybe Thomas Sailer can verify whether that part is indeed the one that causes him problems).
I have attached the diff attached in this message for the lkml.
The code in question now does
/* * Iff TF was set because the program is being single-stepped by a * debugger, don't save that information on the signal stack.. We * don't want debugging to change state. */ eflags = regs->eflags; if (current->ptrace & PT_DTRACE) eflags &= ~TF_MASK; err |= __put_user(eflags, &sc->eflags);
and I guess it originally never cleared it. True?
Yes.
So does removing the conditional TF clear make everything work again?
Yes, as long as TIF_SINGLESTEP is not set in set_singlestep(). set_singlestep also sets PT_DTRACE, so as it now is, a call to the set_singlestep function will make this condition true clearing TF when run. So both the conditional TF clear and setting TIF_SINGLESTEP needs to be removed, like I show in the diff. Making these changes returns the code to a 2.6.8-ish resemblence.
For the wine people, I will try to upload the seh debug channel logs as soon as possible.
Jesse
On Wed, 29 Dec 2004, Jesse Allen wrote:
So does removing the conditional TF clear make everything work again?
Yes, as long as TIF_SINGLESTEP is not set in set_singlestep().
That may be a clue, if only because that makes absolutely _zero_ sense.
Setting TIF_SINGLESTEP shouldn't actually matter in this case, since we set the TRAP_FLAG in eflags by hand anyway (and that's what TIF_SINGESTEP will just re-do when returning to user space).
What TIF_SINGLESTEP _does_ do, however, is change how some other issues are reported to user space. In particular, it causes system call tracing (see arch/i386/kernel/ptrace.c: do_syscall_trace), and maybe it is _that_ that messes up Wine.
So instead of removing the setting of TIF_SINGLESTEP in set_singlestep(), can you test whether removing the _testing_ of it in do_syscall_trace() makes things happier for you? Hmm?
(Also, looking at the code, I get the feeling that set_singlestep() should _only_ set TIF_SINGLESTEP, and not set the TRAP_FLAG by hand at all, since TIF_SINGESTEP should take care of that detail regardless).
Linus
On Wed, 29 Dec 2004 20:35:44 +0100, Thomas Sailer [email protected] wrote:
Mike,
thanks a lot for your mail. I've just tried Jesse Allen's Patch with 2.6.10-ac1, but unfortunately it doesn't seem to be enough to get xst working again. Let me know if (and how :)) I can help gather evidence.
Tom
If the single-stepping changes truly broke your program, and you know < 2.6.9-rc1 works, try this attached diff against 2.6.10 in addition to the other one I made. This should effectively reverse change #2, but like I said, the current code for change #2 actually works for me now. But maybe this is not enough. You can also look here at my original report:
http://www.winehq.org/hypermail/wine-devel/2004/11/0310.html
and reverse the original patches against 2.6.9, and see if that kernel works then. Patch links:
http://linux.bkbits.net:8080/linux-2.6/[email protected] http://linux.bkbits.net:8080/linux-2.6/[email protected] http://linux.bkbits.net:8080/linux-2.6/[email protected]
Jesse
On Wed, 29 Dec 2004 12:40:53 -0700, Jesse Allen [email protected] wrote:
For the wine people, I will try to upload the seh debug channel logs as soon as possible.
I have a page with the latest logs.
On Wed, 29 Dec 2004 12:04:57 -0800 (PST), Linus Torvalds [email protected] wrote:
On Wed, 29 Dec 2004, Jesse Allen wrote:
So does removing the conditional TF clear make everything work again?
Yes, as long as TIF_SINGLESTEP is not set in set_singlestep().
That may be a clue, if only because that makes absolutely _zero_ sense.
Setting TIF_SINGLESTEP shouldn't actually matter in this case, since we set the TRAP_FLAG in eflags by hand anyway (and that's what TIF_SINGESTEP will just re-do when returning to user space).
What TIF_SINGLESTEP _does_ do, however, is change how some other issues are reported to user space. In particular, it causes system call tracing (see arch/i386/kernel/ptrace.c: do_syscall_trace), and maybe it is _that_ that messes up Wine.
So instead of removing the setting of TIF_SINGLESTEP in set_singlestep(), can you test whether removing the _testing_ of it in do_syscall_trace() makes things happier for you? Hmm?
Yes, doing that does work. But I still have to remove the conditional TF clear. Here's the diff now to show you.
Jesse
On Wed, 29 Dec 2004, Jesse Allen wrote:
So instead of removing the setting of TIF_SINGLESTEP in set_singlestep(), can you test whether removing the _testing_ of it in do_syscall_trace() makes things happier for you? Hmm?
Yes, doing that does work. But I still have to remove the conditional TF clear. Here's the diff now to show you.
Ok. That's a good piece of information.
I don't actually understand why do_syscall_trace() does that TIF_SINGLESTEP test in the first place, since the ptrace_notify() should happen as part of the _normal_ TIF_SINGLESTEP handling, afaiks. No apparent need to do it in syscall tracing, and do_syscall_trace() does that bogus fake signal sending.
Hmm.. That TF_SINGLESTEP test was added by Davide Libenzi in August, and I think Davide had some test for exactly this. Davide? Can you recall why you did this in the system call tracing code, rather than depend on the regular TIF_SINGLESTEP handling in arch/i386/kernel/signal.c: do_notify_resume()?
(Jesse's patch re-appended here in-line for Davide's "reading pleasure").
That still leaves the problem of the clearing of TF_MASK. I _appears_ that the problem is that TF was set both by Wine doing a PTRACE_SINGLESTEP (since PT_DTRACE is set) _and_ the application having TF enabled in eflags from before (since it seems to want to keep it enabled).
How about something like the attachment for the TF_MASK issue (replacing your "don't clear" code)? The comments should make the intention pretty obvious, but I have equally obviously not actually _tested_ any of this..
Linus
--- --- linux/arch/i386/kernel/ptrace.c 2004-12-29 14:10:34.000000000 -0700 +++ linux-mod/arch/i386/kernel/ptrace.c 2004-12-29 14:22:33.000000000 -0700 @@ -568,8 +568,7 @@ audit_syscall_exit(current, regs->eax); }
- if (!test_thread_flag(TIF_SYSCALL_TRACE) && - !test_thread_flag(TIF_SINGLESTEP)) + if (!test_thread_flag(TIF_SYSCALL_TRACE)) return; if (!(current->ptrace & PT_PTRACED)) return; --- linux/arch/i386/kernel/signal.c 2004-12-29 14:10:34.000000000 -0700 +++ linux-mod/arch/i386/kernel/signal.c 2004-12-29 14:23:04.000000000 -0700 @@ -299,8 +299,8 @@ * don't want debugging to change state. */ eflags = regs->eflags; - if (current->ptrace & PT_DTRACE) - eflags &= ~TF_MASK; +// if (current->ptrace & PT_DTRACE) +// eflags &= ~TF_MASK; err |= __put_user(eflags, &sc->eflags); err |= __put_user(regs->esp, &sc->esp_at_signal); err |= __put_user(regs->xss, (unsigned int __user *)&sc->ss);
On Wed, 29 Dec 2004, Linus Torvalds wrote:
On Wed, 29 Dec 2004, Jesse Allen wrote:
So instead of removing the setting of TIF_SINGLESTEP in set_singlestep(), can you test whether removing the _testing_ of it in do_syscall_trace() makes things happier for you? Hmm?
Yes, doing that does work. But I still have to remove the conditional TF clear. Here's the diff now to show you.
Ok. That's a good piece of information.
I don't actually understand why do_syscall_trace() does that TIF_SINGLESTEP test in the first place, since the ptrace_notify() should happen as part of the _normal_ TIF_SINGLESTEP handling, afaiks. No apparent need to do it in syscall tracing, and do_syscall_trace() does that bogus fake signal sending.
Hmm.. That TF_SINGLESTEP test was added by Davide Libenzi in August, and I think Davide had some test for exactly this. Davide? Can you recall why you did this in the system call tracing code, rather than depend on the regular TIF_SINGLESTEP handling in arch/i386/kernel/signal.c: do_notify_resume()?
That test went in to be able to have ptrace single step, to see even the instruction following the #int instruction (this was the target of the patch itself). I just verified that, in 2.6.8 that does not have such test anymore, the single-step-after-int capability is lost. Below I inlining a simple test program (that I already sent you time ago) to test this. In my 2.6.8 I get this output:
waiting ... done: pid=21398 status=1407 sig=5 EIP=0x080485d5 waiting ... done: pid=21398 status=1407 sig=5 EIP=0x080485da waiting ... done: pid=21398 status=1407 sig=5 EIP=0x080485df waiting ... done: pid=21398 status=1407 sig=5 EIP=0x080485d5 waiting ... done: pid=21398 status=1407 sig=5 EIP=0x080485da waiting ... done: pid=21398 status=1407 sig=5 EIP=0x080485df
in front of this code:
80485d5: b8 14 00 00 00 mov $0x14,%eax 80485da: cd 80 int $0x80 80485dc: 89 45 ec mov %eax,0xffffffec(%ebp) 80485df: eb f4 jmp 80485d5 <main+0x85>
You can see that the "mov %eax,0xffffffec(%ebp)" instruction at 0x80485dc is not seen by ptrace single step. With the test in place, it will show up again in your traces.
- Davide
#include <stdio.h> #include <stdlib.h> #include <signal.h> #include <unistd.h> #include <errno.h> #include <sys/types.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <linux/user.h> #include <linux/unistd.h>
int main(int ac, char **av) { int i, status, res; long start, end; pid_t cpid, pid; struct user_regs_struct ur; struct sigaction sa;
sigemptyset(&sa.sa_mask); sa.sa_flags = 0; sa.sa_handler = SIG_DFL; sigaction(SIGCHLD, &sa, NULL);
printf("nargs=%d\n", ac); if (ac == 1) goto tracer;
printf("arg=%s\n", av[1]); loop: __asm__ volatile ("int $0x80" : "=a" (res) : "0" (__NR_getpid)); goto loop; endloop: exit(0);
tracer: if ((cpid = fork()) != 0) goto parent;
printf("child=%d\n", getpid()); ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execl(av[0], av[0], "child", NULL);
exit(0);
parent: start = (long) &&loop; end = (long) &&endloop;
printf("pchild=%d\n", cpid);
for (;;) { pid = wait(&status); if (pid != cpid) continue; res = WSTOPSIG(status); if (ptrace(PTRACE_GETREGS, pid, NULL, &ur)) { printf("[%d] error: ptrace(PTRACE_GETREGS, %d)\n", pid, pid); return 1; }
if (ptrace(PTRACE_SINGLESTEP, pid, NULL, res != SIGTRAP ? res: 0)) { perror("ptrace(PTRACE_SINGLESTEP)"); return 1; }
if (ur.eip >= start && ur.eip <= end) break; }
for (i = 0; i < 15; i++) { printf("waiting ...\n"); pid = wait(&status); printf("done: pid=%d status=%d\n", pid, status); if (pid != cpid) continue; res = WSTOPSIG(status); printf("sig=%d\n", res); if (ptrace(PTRACE_GETREGS, pid, NULL, &ur)) { printf("[%d] error: ptrace(PTRACE_GETREGS, %d)\n", pid, pid); return 1; }
printf("EIP=0x%08x\n", ur.eip);
if (ptrace(PTRACE_SINGLESTEP, pid, NULL, res != SIGTRAP ? res: 0)) { perror("ptrace(PTRACE_SINGLESTEP)"); return 1; } }
if (ptrace(PTRACE_CONT, cpid, NULL, SIGKILL)) { perror("ptrace(PTRACE_SINGLESTEP)"); return 1; }
return 0; }
No joy with linux-2.6.10 patch-2.6.10-ac1 01-ptrace-reverse.diff sigtrap-reverse.diff
Below is the seh trace output. In the working case (2.6.8) there is no trace:seh: output at this point.
Tom
Compiling vhdl file U:/home/sailer/src/vhdl/dvbc_pcseng/vprim.vhd in Library synwork. trace:seh:EXC_RtlRaiseException code=c0000005 flags=0 addr=0x770e6151 trace:seh:EXC_RtlRaiseException info[0]=00000000 trace:seh:EXC_RtlRaiseException info[1]=72772078 trace:seh:EXC_RtlRaiseException eax=72772074 ebx=b7a01d9c ecx=7f7daa0c edx=fffffffa esi=7f7499b0 edi=7f7daa0c trace:seh:EXC_RtlRaiseException ebp=77058a70 esp=77ad2150 cs=0073 ds=007b es=007b fs=003b gs=0033 flags=00210206 trace:seh:EXC_CallHandler calling handler at 0x77134e9f code=c0000005 flags=0 trace:seh:EXC_CallHandler handler returned 1 trace:seh:EXC_CallHandler calling handler at 0x103662d2 code=c0000005 flags=0 trace:seh:EXC_CallHandler handler returned 1 trace:seh:EXC_CallHandler calling handler at 0x1034ec8e code=c0000005 flags=0 trace:seh:EXC_CallHandler handler returned 1 trace:seh:EXC_CallHandler calling handler at 0x4024a0 code=c0000005 flags=0 trace:seh:EXC_RtlUnwind code=c0000005 flags=2 trace:seh:EXC_CallHandler calling handler at 0x77ebe2e0 code=c0000005 flags=2 trace:seh:EXC_CallHandler handler returned 1 trace:seh:EXC_CallHandler calling handler at 0x77134e9f code=c0000005 flags=2 trace:seh:EXC_CallHandler handler returned 1 trace:seh:EXC_CallHandler calling handler at 0x103662d2 code=c0000005 flags=2 trace:seh:EXC_CallHandler handler returned 1 trace:seh:EXC_CallHandler calling handler at 0x1034ec8e code=c0000005 flags=2 trace:seh:EXC_CallHandler handler returned 1 FATAL_ERROR:Xst:Portability/export/Port_Main.h:127:1.13 - This application has discovered an exceptional condition from which it cannot recover. Process will terminate. To resolve this error, please consult the Answers Database and other online resources at http://support.xilinx.com. If you need further assistance, please open a Webcase by clicking on the "WebCase" link at http://support.xilinx.com
On Wed, 29 Dec 2004, Davide Libenzi wrote:
That test went in to be able to have ptrace single step, to see even the instruction following the #int instruction (this was the target of the patch itself). I just verified that, in 2.6.8 that does not have such test anymore, the single-step-after-int capability is lost.
Ahh. That's because of a separate bug: we have this silly separation of "_TIF_WORK_MASK" (everything but tracing) and "_TIF_ALLWORK_MASK" (everything), and the system call stuff takes over _TIF_SINGLESTEP for some very non-obvious reasons.
I don't see why the system-call code thinks _TIF_SINGLESTEP is special, but it certainly explains why it doesn't get handled normally.
So the updated patch would look something like the appended.
Will test whether it cleanly handles your test-case. Davide - you also added the TIF_SINGLESTEP flag to that _TIF_WORK_MASK, can you explain that?
(And yes, I know you'd sent me the test-program before, but I'm about as organized as a Performing Seal with Alzheimers..)
Linus
--- 1.23/include/asm-i386/thread_info.h 2004-11-18 23:03:11 -08:00 +++ edited/include/asm-i386/thread_info.h 2004-12-29 17:52:16 -08:00 @@ -153,7 +153,7 @@
/* work to do on interrupt/exception return */ #define _TIF_WORK_MASK \ - (0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP)) + (0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT)) #define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */
/* --- 1.28/arch/i386/kernel/ptrace.c 2004-11-22 09:44:52 -08:00 +++ edited/arch/i386/kernel/ptrace.c 2004-12-29 17:36:41 -08:00 @@ -568,15 +568,13 @@ audit_syscall_exit(current, regs->eax); }
- if (!test_thread_flag(TIF_SYSCALL_TRACE) && - !test_thread_flag(TIF_SINGLESTEP)) + if (!test_thread_flag(TIF_SYSCALL_TRACE)) return; if (!(current->ptrace & PT_PTRACED)) return; /* the 0x80 provides a way for the tracing parent to distinguish between a syscall stop and SIGTRAP delivery */ - ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && - !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0)); + ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
/* * this isn't the same as continuing with a signal, but it will do
On Thu, 30 Dec 2004, Thomas Sailer wrote:
No joy with linux-2.6.10 patch-2.6.10-ac1 01-ptrace-reverse.diff sigtrap-reverse.diff
Below is the seh trace output. In the working case (2.6.8) there is no trace:seh: output at this point.
I have no idea what "seh" is in wine-speak, but it appears that your problem is something totally different, especially as none of the eflags- changes seem to matter for you. Also, in your "seh" exception register dump, you don't actually seem to have TF set in %eflags (TF is 0x0100).
Some wine person would need to inform us about what the seh exception thing means.. "code c0000005"?
Linus
On Wed, 2004-12-29 at 18:10 -0800, Linus Torvalds wrote:
I have no idea what "seh" is in wine-speak, but it appears that your
seh means structured exception handling in microsoft-speak.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/... http://www.jorgon.freeserve.co.uk/ExceptFrame.htm
Some wine person would need to inform us about what the seh exception thing means.. "code c0000005"?
c0000005 apparently means memory access violation. Looks like xst is getting confused about its memory allocations...
Tom
On Wed, 2004-12-29 at 18:10 -0800, Linus Torvalds wrote:
Some wine person would need to inform us about what the seh exception thing means.. "code c0000005"?
There's an interesting thing. Fedora Kernel 2.6.7 works for me, Fedora Kernel 2.6.8 breaks wine/xst. Interestingly, Linus 2.6.8 works. Now the biggest changes between the Fedora Kernels 2.6.7-1.494.2.2 and 2.6.8-1.521, besides the rebasing, are some execshield changes and flexible mmap. execshield cannot be the culprit as it's not in 2.6.10- ac1, but flexible mmap seems to be in 2.6.10-ac1. To me, a candidate for further scrutiny is flexmmap, although I do not claim to have a clue about linux-mm...
Tom
Another pointer towards flexible mmap is that ulimit -s unlimited makes it work under 2.6.10-ac1 too.
Tom
Thomas Sailer [email protected] wrote:
Another pointer towards flexible mmap is that ulimit -s unlimited makes it work under 2.6.10-ac1 too.
You can globally disable flex-mmap with
echo 1 > /proc/sys/vm/legacy_va_layout
Does it fix things?
On Wed, 29 Dec 2004 16:44:05 -0800 (PST), Linus Torvalds [email protected] wrote:
That still leaves the problem of the clearing of TF_MASK. I _appears_ that the problem is that TF was set both by Wine doing a PTRACE_SINGLESTEP (since PT_DTRACE is set) _and_ the application having TF enabled in eflags from before (since it seems to want to keep it enabled).
How about something like the attachment for the TF_MASK issue (replacing your "don't clear" code)? The comments should make the intention pretty obvious, but I have equally obviously not actually _tested_ any of this..
The following patch works for me. I still have to remove TIF_SINGLESTEP test. I also went ahead and tried adding the other fix on the ptrace_notify call and _TIF_WORK_MASK, but for some reason, changing _TIF_WORK_MASK seems to break the program now. This patch does fix the TF clearing problem.
Jesse
--- linux/arch/i386/kernel/ptrace.c 2004-12-29 14:10:34.000000000 -0700 +++ linux-mod/arch/i386/kernel/ptrace.c 2004-12-29 20:50:00.000000000 -0700 @@ -142,18 +142,31 @@ { long eflags;
+ /* + * Always set TIF_SINGLESTEP - this guarantees that + * we single-step system calls etc.. + */ set_tsk_thread_flag(child, TIF_SINGLESTEP); + + /* + * If TF was already set, don't do anything else + */ eflags = get_stack_long(child, EFL_OFFSET); + if (eflags & TRAP_FLAG) + return; put_stack_long(child, EFL_OFFSET, eflags | TRAP_FLAG); child->ptrace |= PT_DTRACE; }
static void clear_singlestep(struct task_struct *child) { + /* Always clear TIF_SINGLESTEP... */ + clear_tsk_thread_flag(child, TIF_SINGLESTEP); + + /* But touch TF only if it was set by us.. */ if (child->ptrace & PT_DTRACE) { long eflags;
- clear_tsk_thread_flag(child, TIF_SINGLESTEP); eflags = get_stack_long(child, EFL_OFFSET); put_stack_long(child, EFL_OFFSET, eflags & ~TRAP_FLAG); child->ptrace &= ~PT_DTRACE; @@ -568,15 +581,13 @@ audit_syscall_exit(current, regs->eax); }
- if (!test_thread_flag(TIF_SYSCALL_TRACE) && - !test_thread_flag(TIF_SINGLESTEP)) + if (!test_thread_flag(TIF_SYSCALL_TRACE)) return; if (!(current->ptrace & PT_PTRACED)) return; /* the 0x80 provides a way for the tracing parent to distinguish between a syscall stop and SIGTRAP delivery */ - ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && - !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0)); + ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
/* * this isn't the same as continuing with a signal, but it will do
On Wed, 29 Dec 2004, Linus Torvalds wrote:
.. no, I see what's up. System call returns _are_ special for single-stepping. I'll think about it..
Ok, I think I know what's up.
It's literally the bogus fake signal that do_syscall_trace() sends. I think the TIF_SINGLESTEP case in do_syscall_trace() should only do the ptrace_notify() and return..
Linus
On Wed, 29 Dec 2004, Linus Torvalds wrote:
Will test whether it cleanly handles your test-case. Davide - you also added the TIF_SINGLESTEP flag to that _TIF_WORK_MASK, can you explain that?
I honestly do not remember, but I think is wrong and can be removed. That's not the problem though ...
- Davide
On Wed, 29 Dec 2004, Linus Torvalds wrote:
On Wed, 29 Dec 2004, Linus Torvalds wrote:
.. no, I see what's up. System call returns _are_ special for single-stepping. I'll think about it..
Ok, I think I know what's up.
It's literally the bogus fake signal that do_syscall_trace() sends. I think the TIF_SINGLESTEP case in do_syscall_trace() should only do the ptrace_notify() and return..
I think same. My test simply let the function processing to let thru and reach the fake signal sending point.
- Davide
On Wed, 29 Dec 2004, Davide Libenzi wrote:
I think same. My test simply let the function processing to let thru and reach the fake signal sending point.
No, your test-case doesn't even send a signal at all, because your test-program just uses a PTRACE_SINGLESTEP without any "send signal" - so "exit_code" will be zero after the debuggee gets released from the "ptrace_notify()", and the suspect code will never be executed..
The problem I think I see (and which the comment alludes to) is that if the controlling process continues the debuggee with a signal, we'll be doing the wrong thing: the code in do_syscall_trace() will take that signal (in "current->exit_code") and send it as a real signal to the debuggee, but since it is debugged, it will be caught (again) by the controlling process, which apparently in the case of Wine gets very confused.
So I _think_ the problem happens for the following schenario: - wine for some reason does a PTRACE_SINGLESTEP over a system call - after the single-step, wine ends up trying to get the sub-process to enter a signal handler with ptrace( PTRACE_CONT, ... sig) - the normal ptrace path (where we trap a signal - see kernel/signal.c and the "ptrace_stop()" logic) handles this correctly, but do_syscall_trace() will do a "send_sig()" - we'll try to handle the signal when returning to the traced process - now "get_signal_to_deliver()" will invoce the ptrace logic AGAIN, and call ptrace_stop() for this fake signal, and we'll report a new event to the controlling process.
Does this make sense?
If so, we have a few options:
- very hacky one: teach all ptrace users that sometimes the signal you continue with can be reflected back at you, and you should just "cont signr" _again_.
This is a really bad option, partly because it's hard to tell when it's just a spurious reflected signal, partly because there are many ptrace users, and to a large part just because it's clearly too hacky. But it might be a valid approach for a Wine person who is used to Wine, and wants to verify whether this is indeed the schenario that triggers the problem..
- Stupid approach: mark the siginfo that we send as the fake one as being reflected, and have get_signal_to_deliver() not apply the ptrace stopping to that.
- Possibly cleaner approach: make system call tracing just use a proper SIGTRAP in the first place, and always handle all the ptrace_stop() etc cruft in kernel/signal.c like it handles all other calls.
I dunno. The final one looks fairly simple and clean, something like the following, but I'm most likely overlooking some reason why this won't work..
(And as usual, this patch has not been tested in any shape, way or form. In fact, it hasn't even seen an x86 machine, since I edited it out as a fake on my ppc64.. Somebody with more brains than me should actually try to get it to work)
Linus
---- ===== arch/i386/kernel/ptrace.c 1.28 vs edited ===== --- 1.28/arch/i386/kernel/ptrace.c 2004-11-22 09:44:52 -08:00 +++ edited/arch/i386/kernel/ptrace.c 2004-12-29 23:21:58 -08:00 @@ -559,6 +559,8 @@ __attribute__((regparm(3))) void do_syscall_trace(struct pt_regs *regs, int entryexit) { + struct siginfo info; + if (unlikely(current->audit_context)) { if (!entryexit) audit_syscall_entry(current, regs->orig_eax, @@ -573,18 +575,18 @@ return; if (!(current->ptrace & PT_PTRACED)) return; + + memset(&info, 0, sizeof(info)); + info.si_signo = SIGTRAP; + /* the 0x80 provides a way for the tracing parent to distinguish between a syscall stop and SIGTRAP delivery */ - ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && - !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0)); + info.si_code = SIGTRAP; + if ((current->ptrace & PT_TRACESYSGOOD) && !test_thread_flag(TIF_SINGLESTEP)) + info.si_code = SIGTRAP | 0x80; + info.si_pid = current->tgid; + info.si_uid = current->uid;
- /* - * this isn't the same as continuing with a signal, but it will do - * for normal use. strace only continues with a signal if the - * stopping signal is not SIGTRAP. -brl - */ - if (current->exit_code) { - send_sig(current->exit_code, current, 1); - current->exit_code = 0; - } + /* Send us the fakey SIGTRAP */ + send_sig_info(SIGTRAP, &info, current); }
On Wed, 2004-12-29 at 18:10 -0800, Linus Torvalds wrote:
Some wine person would need to inform us about what the seh exception thing means.. "code c0000005"?
STATUS_ACCESS_VIOLATION, or the Win32 equivalent of a segfault. It would appear that the ptrace changes are not responsible here, though if changing the kernel really does change this crash then maybe there is another issue we haven't uncovered yet.
thanks -mike
On Thu, 2004-12-30 at 11:09 +0100, Thomas Sailer wrote:
On Wed, 2004-12-29 at 20:15 -0800, Andrew Morton wrote:
You can globally disable flex-mmap with
echo 1 > /proc/sys/vm/legacy_va_layout
Does it fix things?
Haven't tried. But setarch i386 -L /usr/bin/wine ... did fix it.
Tom
Tom, does this patch against Wine help? It should do the same thing as the setarch program, so if that fixes it then this should also (if I've understood how this mechanism works of course).
Mike Hearn [email protected] Set the Linux ABI personality to get a legacy VM layout
On Wed, 29 Dec 2004, Linus Torvalds wrote:
On Wed, 29 Dec 2004, Davide Libenzi wrote:
I think same. My test simply let the function processing to let thru and reach the fake signal sending point.
No, your test-case doesn't even send a signal at all, because your test-program just uses a PTRACE_SINGLESTEP without any "send signal" - so "exit_code" will be zero after the debuggee gets released from the "ptrace_notify()", and the suspect code will never be executed..
No no, my test case has nothing to do with the extra signal sent in do_syscall_trace(). But the test I put at the time:
- if (!test_thread_flag(TIF_SYSCALL_TRACE)) + if (!test_thread_flag(TIF_SYSCALL_TRACE) && + !test_thread_flag(TIF_SINGLESTEP)) return;
will make the code to not execute the "return" (in the single step case) and to fall through the point where the signal is sent.
The problem I think I see (and which the comment alludes to) is that if the controlling process continues the debuggee with a signal, we'll be doing the wrong thing: the code in do_syscall_trace() will take that signal (in "current->exit_code") and send it as a real signal to the debuggee, but since it is debugged, it will be caught (again) by the controlling process, which apparently in the case of Wine gets very confused.
So I _think_ the problem happens for the following schenario:
- wine for some reason does a PTRACE_SINGLESTEP over a system call
- after the single-step, wine ends up trying to get the sub-process to enter a signal handler with ptrace( PTRACE_CONT, ... sig)
- the normal ptrace path (where we trap a signal - see kernel/signal.c and the "ptrace_stop()" logic) handles this correctly, but do_syscall_trace() will do a "send_sig()"
- we'll try to handle the signal when returning to the traced process
- now "get_signal_to_deliver()" will invoce the ptrace logic AGAIN, and call ptrace_stop() for this fake signal, and we'll report a new event to the controlling process.
Does this make sense?
This might explain what they were seeing, but OTOH it seems that the real cause of their problems is related to something else (according to other emails on this thread).
- Davide
On Thu, 30 Dec 2004, Davide Libenzi wrote:
This might explain what they were seeing, but OTOH it seems that the real cause of their problems is related to something else (according to other emails on this thread).
There's two different problems: the one seen by Thomas (the Xilinx FPGA synthesizer), which is apparently just due to Wine (or, more likely, the Windows app itself) depending on a certain memory layout for the stack and/or other allocations. That one I think we can consider solved, and indeed had nothing to do with TF.
The other one is the copy-protection code breaking for some game (Warcraft) for Jesse Allen, and that one is definitely TF-related.. Jesse can fix it with patches, but those patches aren't acceptable for other uses, so that's why I'm trying to find something that DTRT both for Wine and for a regular debugging session..
Linus
On Wed, 29 Dec 2004, Linus Torvalds wrote:
On Wed, 29 Dec 2004, Davide Libenzi wrote:
I think same. My test simply let the function processing to let thru and reach the fake signal sending point.
No, your test-case doesn't even send a signal at all, because your test-program just uses a PTRACE_SINGLESTEP without any "send signal" - so "exit_code" will be zero after the debuggee gets released from the "ptrace_notify()", and the suspect code will never be executed..
The problem I think I see (and which the comment alludes to) is that if the controlling process continues the debuggee with a signal, we'll be doing the wrong thing: the code in do_syscall_trace() will take that signal (in "current->exit_code") and send it as a real signal to the debuggee, but since it is debugged, it will be caught (again) by the controlling process, which apparently in the case of Wine gets very confused.
So I _think_ the problem happens for the following schenario:
- wine for some reason does a PTRACE_SINGLESTEP over a system call
- after the single-step, wine ends up trying to get the sub-process to enter a signal handler with ptrace( PTRACE_CONT, ... sig)
- the normal ptrace path (where we trap a signal - see kernel/signal.c and the "ptrace_stop()" logic) handles this correctly, but do_syscall_trace() will do a "send_sig()"
- we'll try to handle the signal when returning to the traced process
- now "get_signal_to_deliver()" will invoce the ptrace logic AGAIN, and call ptrace_stop() for this fake signal, and we'll report a new event to the controlling process.
Make sense to me. What about the one below? The problem is though, that we have two different events here. One is the single step trap and the other one is the signal. And according to the comment, strace want something different from SIGTRAP to continue with signal. So it wants the double event we are actually sending (ptrace_notify + send_sig). OTOH Wine does not seem to like the double event thingy. Hmm ...?
- Davide
--- ptrace.c.orig 2004-12-30 10:29:11.000000000 -0800 +++ ptrace.c 2004-12-30 11:11:23.000000000 -0800 @@ -573,18 +573,14 @@ return; if (!(current->ptrace & PT_PTRACED)) return; - /* the 0x80 provides a way for the tracing parent to distinguish - between a syscall stop and SIGTRAP delivery */ - ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && - !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0));
- /* - * this isn't the same as continuing with a signal, but it will do - * for normal use. strace only continues with a signal if the - * stopping signal is not SIGTRAP. -brl - */ if (current->exit_code) { - send_sig(current->exit_code, current, 1); + ptrace_notify(current->exit_code); current->exit_code = 0; + } else { + /* the 0x80 provides a way for the tracing parent to distinguish + between a syscall stop and SIGTRAP delivery */ + ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && + !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0)); } }
On Thu, 30 Dec 2004 09:59:27 -0800 (PST), Davide Libenzi [email protected] wrote:
On Wed, 29 Dec 2004, Linus Torvalds wrote:
On Wed, 29 Dec 2004, Davide Libenzi wrote:
I think same. My test simply let the function processing to let thru and reach the fake signal sending point.
No, your test-case doesn't even send a signal at all, because your test-program just uses a PTRACE_SINGLESTEP without any "send signal" - so "exit_code" will be zero after the debuggee gets released from the "ptrace_notify()", and the suspect code will never be executed..
No no, my test case has nothing to do with the extra signal sent in do_syscall_trace(). But the test I put at the time:
if (!test_thread_flag(TIF_SYSCALL_TRACE))
if (!test_thread_flag(TIF_SYSCALL_TRACE) &&
!test_thread_flag(TIF_SINGLESTEP)) return;
will make the code to not execute the "return" (in the single step case) and to fall through the point where the signal is sent.
Using the latest version of the patch on do_syscall_trace(), it still doesn't work unless I remove this test. If indeed it's supposed to fall through to receive the proper signal, (ie to single step properly after an int op), then removing it is wrong, and I won't consider it anymore. I also have to use the patch shown below, with a typo-fixed, to fix the other problem. I broke it apart from the other because we might want to consider it seperately right now.
I spent some time speaking to my brother about this. He has done his own kernel before for an embedded system. He was rather frustrated with me trying to find a reason for why a rather simple change broke my program. He told me I couldn't have it both ways. However I believe that I don't understand the linux code well enough to make that conclusion.
The problem I think I see (and which the comment alludes to) is that if the controlling process continues the debuggee with a signal, we'll be doing the wrong thing: the code in do_syscall_trace() will take that signal (in "current->exit_code") and send it as a real signal to the debuggee, but since it is debugged, it will be caught (again) by the controlling process, which apparently in the case of Wine gets very confused.
So I _think_ the problem happens for the following schenario:
- wine for some reason does a PTRACE_SINGLESTEP over a system call
- after the single-step, wine ends up trying to get the sub-process to enter a signal handler with ptrace( PTRACE_CONT, ... sig)
- the normal ptrace path (where we trap a signal - see kernel/signal.c and the "ptrace_stop()" logic) handles this correctly, but do_syscall_trace() will do a "send_sig()"
- we'll try to handle the signal when returning to the traced process
- now "get_signal_to_deliver()" will invoce the ptrace logic AGAIN, and call ptrace_stop() for this fake signal, and we'll report a new event to the controlling process.
Does this make sense?
This might explain what they were seeing, but OTOH it seems that the real cause of their problems is related to something else (according to other emails on this thread).
Actually, I don't think the vanilla kernel has the code that breaks those other wine programs. I just learned of this yesterday and it's not related. I believe it's only in fedora core 3 or -ac kernels and I use vanilla kernels.
Jesse
--- linux/arch/i386/kernel/ptrace.c 2004-12-29 14:10:34.000000000 -0700 +++ linux-mod/arch/i386/kernel/ptrace.c 2004-12-29 20:50:00.000000000 -0700 @@ -142,18 +142,31 @@ { long eflags;
+ /* + * Always set TIF_SINGLESTEP - this guarantees that + * we single-step system calls etc.. + */ set_tsk_thread_flag(child, TIF_SINGLESTEP); + + /* + * If TF was already set, don't do anything else + */ eflags = get_stack_long(child, EFL_OFFSET); + if (eflags & TRAP_FLAG) + return; put_stack_long(child, EFL_OFFSET, eflags | TRAP_FLAG); child->ptrace |= PT_DTRACE; }
static void clear_singlestep(struct task_struct *child) { + /* Always clear TIF_SINGLESTEP... */ + clear_tsk_thread_flag(child, TIF_SINGLESTEP); + + /* But touch TF only if it was set by us.. */ if (child->ptrace & PT_DTRACE) { long eflags;
- clear_tsk_thread_flag(child, TIF_SINGLESTEP); eflags = get_stack_long(child, EFL_OFFSET); put_stack_long(child, EFL_OFFSET, eflags & ~TRAP_FLAG); child->ptrace &= ~PT_DTRACE;
On Thu, 30 Dec 2004, Jesse Allen wrote:
Using the latest version of the patch on do_syscall_trace(), it still doesn't work unless I remove this test. If indeed it's supposed to fall through to receive the proper signal, (ie to single step properly after an int op), then removing it is wrong, and I won't consider it anymore. I also have to use the patch shown below, with a typo-fixed, to fix the other problem. I broke it apart from the other because we might want to consider it seperately right now.
I'm actually able to now re-create some problems with TF handling with a simple non-wine test, and I think I'm zeroing in on the reason for Wine breaking. And I think it just happened to work by luck before.
Not only do we have problems single-stepping over a system call, we _also_ have problems single-stepping over a "popf" - we'll set TF (to single-step), and then afterwards we'll _clear_ TF, even if the "popf" also was supposed to set it.
Working on a patch for this right now, I'll send something out soonish (and I'll test it on my test-case before sending it, so that it at least has some chance of working).
Linus
On Thu, 30 Dec 2004, Linus Torvalds wrote:
Working on a patch for this right now, I'll send something out soonish (and I'll test it on my test-case before sending it, so that it at least has some chance of working).
Ok, here's a patch that may or may not make Wine happier. It's a _lot_ more careful about TF handling, and in particular it's trying really really hard to make sure that a controlling process does not change the trap flag as it is modified or used by the process.
This hopefully fixes:
- single-step over "popf" should work - we won't clear TF after the popf, but instead let the popf results remain.
NOTE! I tried to make sure that it does the right thing for segments with non-zero bases, but I never actually tested that code. It's fairly obvious, but it's also fairly likely to have some silly problems. Wine may well show effects of this, although I don't know how common non-zero bases are with any kind of half-modern windows binaries.
- ptrace reporting of "eflags" register after a single-step (we used to report TF as being set because the debugger set it - even though the "native state" without debugging had it cleared).
This also hopefully means that all the conditional TF clearing games etc aren't necessary, since arch/i386/kernel/traps.c should now be taking care of hiding the debugger for most cases ("pushf" still remains, and is hard. See comment in ptrace.c part of the patch)
It's a bit more involved than I'd like, since especially the "popf" case just is pretty complex, but I'd love to hear whether it works.
NOTE NOTE NOTE! I've tested it, but only on one small test-case, so it might be totally broken in many ways. I'd love to have people who are x86 and ptrace-aware give this a second look, and I'm confident Jesse will find that it won't work, but can hopefully try to debug it a bit with this..
Linus
----- ===== arch/i386/kernel/signal.c 1.49 vs edited ===== --- 1.49/arch/i386/kernel/signal.c 2004-11-22 09:47:16 -08:00 +++ edited/arch/i386/kernel/signal.c 2004-12-30 11:40:35 -08:00 @@ -270,7 +270,6 @@ struct pt_regs *regs, unsigned long mask) { int tmp, err = 0; - unsigned long eflags;
tmp = 0; __asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp)); @@ -292,16 +291,7 @@ err |= __put_user(current->thread.error_code, &sc->err); err |= __put_user(regs->eip, &sc->eip); err |= __put_user(regs->xcs, (unsigned int __user *)&sc->cs); - - /* - * Iff TF was set because the program is being single-stepped by a - * debugger, don't save that information on the signal stack.. We - * don't want debugging to change state. - */ - eflags = regs->eflags; - if (current->ptrace & PT_DTRACE) - eflags &= ~TF_MASK; - err |= __put_user(eflags, &sc->eflags); + err |= __put_user(regs->eflags, &sc->eflags); err |= __put_user(regs->esp, &sc->esp_at_signal); err |= __put_user(regs->xss, (unsigned int __user *)&sc->ss);
@@ -424,11 +414,9 @@ * The tracer may want to single-step inside the * handler too. */ - if (regs->eflags & TF_MASK) { - regs->eflags &= ~TF_MASK; - if (current->ptrace & PT_DTRACE) - ptrace_notify(SIGTRAP); - } + regs->eflags &= ~TF_MASK; + if (test_thread_flag(TIF_SINGLESTEP)) + ptrace_notify(SIGTRAP);
#if DEBUG_SIG printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n", @@ -519,11 +507,9 @@ * The tracer may want to single-step inside the * handler too. */ - if (regs->eflags & TF_MASK) { - regs->eflags &= ~TF_MASK; - if (current->ptrace & PT_DTRACE) - ptrace_notify(SIGTRAP); - } + regs->eflags &= ~TF_MASK; + if (test_thread_flag(TIF_SINGLESTEP)) + ptrace_notify(SIGTRAP);
#if DEBUG_SIG printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n", ===== arch/i386/kernel/traps.c 1.92 vs edited ===== --- 1.92/arch/i386/kernel/traps.c 2004-12-28 11:07:48 -08:00 +++ edited/arch/i386/kernel/traps.c 2004-12-30 12:36:30 -08:00 @@ -718,8 +718,17 @@ */ if ((regs->xcs & 3) == 0) goto clear_TF_reenable; - if ((tsk->ptrace & (PT_DTRACE|PT_PTRACED)) == PT_DTRACE) - goto clear_TF; + + /* + * Was the TF flag set by a debugger? If so, clear it now, + * so that register information is correct. + */ + if (tsk->ptrace & PT_DTRACE) { + regs->eflags &= ~TF_MASK; + tsk->ptrace &= ~PT_DTRACE; + if (!tsk->ptrace & PT_DTRACE) + goto clear_TF; + } }
/* Ok, finally something we can handle */ ===== arch/i386/kernel/ptrace.c 1.28 vs edited ===== --- 1.28/arch/i386/kernel/ptrace.c 2004-11-22 09:44:52 -08:00 +++ edited/arch/i386/kernel/ptrace.c 2004-12-30 14:35:45 -08:00 @@ -42,6 +42,12 @@ */ #define EFL_OFFSET ((EFL-2)*4-sizeof(struct pt_regs))
+static inline struct pt_regs *get_child_regs(struct task_struct *task) +{ + void *stack_top = (void *)task->thread.esp0; + return stack_top - sizeof(struct pt_regs); +} + /* * this routine will get a word off of the processes privileged stack. * the offset is how far from the base addr as stored in the TSS. @@ -138,24 +144,119 @@ return retval; }
+#define LDT_SEGMENT 4 + +static unsigned long convert_eip_to_linear(struct task_struct *child, struct pt_regs *regs) +{ + unsigned long addr, seg; + + addr = regs->eip; + seg = regs->xcs & 0xffff; + if (regs->eflags & VM_MASK) { + addr = (addr & 0xffff) + (seg << 4); + return addr; + } + + /* + * We'll assume that the code segments in the GDT + * are all zero-based. That is largely true: the + * TLS segments are used for data, and the PNPBIOS + * and APM bios ones we just ignore here. + */ + if (seg & LDT_SEGMENT) { + u32 *desc; + unsigned long base; + + down(&child->mm->context.sem); + desc = child->mm->context.ldt + (seg & ~7); + base = (desc[0] >> 16) | ((desc[1] & 0xff) << 16) | (desc[1] & 0xff000000); + + /* 16-bit code segment? */ + if (!((desc[1] >> 22) & 1)) + addr &= 0xffff; + addr += base; + up(&child->mm->context.sem); + } + return addr; +} + +static inline int is_at_popf(struct task_struct *child, struct pt_regs *regs) +{ + int i, copied; + unsigned char opcode[16]; + unsigned long addr = convert_eip_to_linear(child, regs); + + copied = access_process_vm(child, addr, opcode, sizeof(opcode), 0); + for (i = 0; i < copied; i++) { + switch (opcode[i]) { + /* popf */ + case 0x9d: + return 1; + /* opcode and address size prefixes */ + case 0x66: case 0x67: + continue; + /* irrelevant prefixes (segment overrides and repeats) */ + case 0x26: case 0x2e: + case 0x36: case 0x3e: + case 0x64: case 0x65: + case 0xf0: case 0xf2: case 0xf3: + continue; + + /* + * pushf: NOTE! We should probably not let + * the user see the TF bit being set. But + * it's more pain than it's worth to avoid + * it, and a debugger could emulate this + * all in user space if it _really_ cares. + */ + case 0x9c: + default: + return 0; + } + } + return 0; +} + static void set_singlestep(struct task_struct *child) { - long eflags; + struct pt_regs *regs = get_child_regs(child);
+ /* + * Always set TIF_SINGLESTEP - this guarantees that + * we single-step system calls etc.. This will also + * cause us to set TF when returning to user mode. + */ set_tsk_thread_flag(child, TIF_SINGLESTEP); - eflags = get_stack_long(child, EFL_OFFSET); - put_stack_long(child, EFL_OFFSET, eflags | TRAP_FLAG); + + /* + * If TF was already set, don't do anything else + */ + if (regs->eflags & TRAP_FLAG) + return; + + /* Set TF on the kernel stack.. */ + regs->eflags |= TRAP_FLAG; + + /* + * ..but if TF is changed by the instruction we will trace, + * don't mark it as being "us" that set it, so that we + * won't clear it by hand later. + */ + if (is_at_popf(child, regs)) + return; + child->ptrace |= PT_DTRACE; }
static void clear_singlestep(struct task_struct *child) { - if (child->ptrace & PT_DTRACE) { - long eflags; + /* Always clear TIF_SINGLESTEP... */ + clear_tsk_thread_flag(child, TIF_SINGLESTEP);
- clear_tsk_thread_flag(child, TIF_SINGLESTEP); - eflags = get_stack_long(child, EFL_OFFSET); - put_stack_long(child, EFL_OFFSET, eflags & ~TRAP_FLAG); + /* But touch TF only if it was set by us.. */ + if (child->ptrace & PT_DTRACE) { + struct pt_regs *regs = get_child_regs(child); + regs->eflags &= ~TRAP_FLAG; child->ptrace &= ~PT_DTRACE; } } @@ -559,6 +660,8 @@ __attribute__((regparm(3))) void do_syscall_trace(struct pt_regs *regs, int entryexit) { + struct siginfo info; + if (unlikely(current->audit_context)) { if (!entryexit) audit_syscall_entry(current, regs->orig_eax, @@ -573,18 +676,18 @@ return; if (!(current->ptrace & PT_PTRACED)) return; + + memset(&info, 0, sizeof(info)); + info.si_signo = SIGTRAP; + /* the 0x80 provides a way for the tracing parent to distinguish between a syscall stop and SIGTRAP delivery */ - ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && - !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0)); + info.si_code = SIGTRAP; + if ((current->ptrace & PT_TRACESYSGOOD) && !test_thread_flag(TIF_SINGLESTEP)) + info.si_code = SIGTRAP | 0x80; + info.si_pid = current->tgid; + info.si_uid = current->uid;
- /* - * this isn't the same as continuing with a signal, but it will do - * for normal use. strace only continues with a signal if the - * stopping signal is not SIGTRAP. -brl - */ - if (current->exit_code) { - send_sig(current->exit_code, current, 1); - current->exit_code = 0; - } + /* Send us the fakey SIGTRAP */ + send_sig_info(SIGTRAP, &info, current); }
i386 architecture details are really not my thing, so I'm going to trust you on most of this, but this bit:
On Thu, Dec 30, 2004 at 02:46:17PM -0800, Linus Torvalds wrote:
/* the 0x80 provides a way for the tracing parent to distinguish between a syscall stop and SIGTRAP delivery */
- ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) &&
!test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0));
- info.si_code = SIGTRAP;
- if ((current->ptrace & PT_TRACESYSGOOD) && !test_thread_flag(TIF_SINGLESTEP))
info.si_code = SIGTRAP | 0x80;
- info.si_pid = current->tgid;
- info.si_uid = current->uid;
- /*
* this isn't the same as continuing with a signal, but it will do
* for normal use. strace only continues with a signal if the
* stopping signal is not SIGTRAP. -brl
*/
- if (current->exit_code) {
send_sig(current->exit_code, current, 1);
current->exit_code = 0;
- }
- /* Send us the fakey SIGTRAP */
- send_sig_info(SIGTRAP, &info, current);
}
does not look right to me. Before, we'd get an 0x80|SIGTRAP result from wait. Now, you've moved the 0x80 to live only inside the siginfo. This is accessible to the debugger via ptrace, but only very recently (late 2.5.x). So this will probably break users of PT_TRACESYSGOOD.
Linus Torvalds [email protected] writes:
It's a bit more involved than I'd like, since especially the "popf" case just is pretty complex, but I'd love to hear whether it works.
NOTE NOTE NOTE! I've tested it, but only on one small test-case, so it might be totally broken in many ways. I'd love to have people who are x86 and ptrace-aware give this a second look, and I'm confident Jesse will find that it won't work, but can hopefully try to debug it a bit with this..
Just looking at all this complexiy and thinking about making it work on x86-64 too doesn't exactly give a good feeling in my spine.
Not to belittle your archivement Linus but it all looks very overengineered to me.
I think such complex instruction emulation games will be hard to maintain and there are very surely bugs in so much subtle code.
Can someone repeat again what was wrong with the old ptrace semantics before the initial change that caused all these complex changes? It seemed to work well for years. How about we just go back to the old state, revert all the recent ptrace changes and skip all that?
e.g. reporting TF after popf in ptrace doesnt really seem like a big issue to me that is worth fixing with that much code. It is more an unimportant corner case, isnt it? Same thing with forcing TF after popf. I bet most debuggers in existence get this special case wrong and so far the world hasn't collapsed because of that.
I would love to skip this all on x86-64, but I would prefer to not make the behaviour incompatible to i386.
-Andi
On Thu, 30 Dec 2004, Daniel Jacobowitz wrote:
does not look right to me. Before, we'd get an 0x80|SIGTRAP result from wait. Now, you've moved the 0x80 to live only inside the siginfo. This is accessible to the debugger via ptrace, but only very recently (late 2.5.x). So this will probably break users of PT_TRACESYSGOOD.
I don't see how to easily emulate the old behaviour 100% - see ptrace_notify. We always sent the signal SIGTRAP to the process, and then set "exit_code" tp have the 0x80 marker by calling ptrace_stop() by hand. However, if we make it a real signal (which we need to do to get the continue semantics right), we no longer have that out-of-band info available to us.
We could make "get_signal_to_deliver()" pass in some other exit_code to ptrace_stop() than just signr. It would have to pick it up from the siginfo structure, but then we'd have to make sure that _other_ signal users do that properly..
Patches/suggestions welcome.
Linus
On Fri, 31 Dec 2004, Andi Kleen wrote:
Just looking at all this complexiy and thinking about making it work on x86-64 too doesn't exactly give a good feeling in my spine.
Not to belittle your archivement Linus but it all looks very overengineered to me.
Ehh, do you have any _alternatives_?
I think such complex instruction emulation games will be hard to maintain and there are very surely bugs in so much subtle code.
There is no complexity anywhere, and we don't actually emulate any instructions at all. The only thing we do is to check _whether_ the instruction is a "popf" - we let the CPU do all the work, we just say "ok, the instruction will set TF, so we should not touch it afterwards.
Can someone repeat again what was wrong with the old ptrace semantics before the initial change that caused all these complex changes? It seemed to work well for years. How about we just go back to the old state, revert all the recent ptrace changes and skip all that?
Let me count the ways that were wrong before the changes: - you couldn't debug any code that set TF. Really. ptrace would totally destroy the TF state in the controlled process, so it would do something totally different when debugged. - you couldn't even debug signal handlers, because they were _really_ hard to get into unless you knew where they were and put a breakpoint on them. - you couldn't see the instruction after a system call. - ptrace returned bogus TF state after a single-step
I would love to skip this all on x86-64, but I would prefer to not make the behaviour incompatible to i386.
I suspect all the code can be shared. In fact, the change to send a SIGTRAP directly rather than play around with "ptrace_notify()" etc is likely totally architecture-independent apart from the calling convention magic, so all of "do_syscall_trace()" could probably be moved into kernel/ptrace.c.
The _only_ real complexity is actually following the silly LDT descriptors, and we actually do that (badly) in another place: the AMD "prefetch" check does exactly the same thing except it seems to get a few details wrong (looks like it cannot handle 16-bit code), and only works for the current process.
I assume you have that same prefetch thing on x86-64 already, so if anything, you could look at my replacement and see if it would be workable to do the prefetch thing too..
IOW, none of the issues involved are new.
Linus
On Thu, 30 Dec 2004 14:46:17 -0800 (PST), Linus Torvalds [email protected] wrote:
Ok, here's a patch that may or may not make Wine happier. It's a _lot_ more careful about TF handling, and in particular it's trying really really hard to make sure that a controlling process does not change the trap flag as it is modified or used by the process.
This hopefully fixes:
single-step over "popf" should work - we won't clear TF after the popf, but instead let the popf results remain.
NOTE! I tried to make sure that it does the right thing for segments with non-zero bases, but I never actually tested that code. It's fairly obvious, but it's also fairly likely to have some silly problems. Wine may well show effects of this, although I don't know how common non-zero bases are with any kind of half-modern windows binaries.
ptrace reporting of "eflags" register after a single-step (we used to report TF as being set because the debugger set it - even though the "native state" without debugging had it cleared).
This also hopefully means that all the conditional TF clearing games etc aren't necessary, since arch/i386/kernel/traps.c should now be taking care of hiding the debugger for most cases ("pushf" still remains, and is hard. See comment in ptrace.c part of the patch)
It's a bit more involved than I'd like, since especially the "popf" case just is pretty complex, but I'd love to hear whether it works.
NOTE NOTE NOTE! I've tested it, but only on one small test-case, so it might be totally broken in many ways. I'd love to have people who are x86 and ptrace-aware give this a second look, and I'm confident Jesse will find that it won't work, but can hopefully try to debug it a bit with this..
Well I tried this patch and it works. I captured a log showing the signals and eflags again when running the program. I compared it to the working version. There are differences, but none seem to be the scenario TF was not set when it should have been. Both log files are just about the same size too. I captured a second log in a row, and compared the previous. Again there are differences, so there is some unavoidable randomness.
Since I cannot spot any issue, the patch looks good. Are there any other test cases?
Jesse
On Thu, 30 Dec 2004, Jesse Allen wrote:
Well I tried this patch and it works.
Goodie. Are there other known problems with silly copy-protection schemes? It migth be worth testing.
However:
Since I cannot spot any issue, the patch looks good. Are there any other test cases?
Yes. It seems I broke "strace" with it. Probably the difference in system call trace reporting that Dan Jacobowitz already pointed out.
Now, that should be easily handled by just separating out the cases of system call tracing and debug trap handling, and using the old silly code for system calls. I'd prefer a cleaner approach, but that seems to be the sane thing to do for now.
Linus
On Thu, Dec 30, 2004 at 03:17:01PM -0800, Linus Torvalds wrote:
On Thu, 30 Dec 2004, Daniel Jacobowitz wrote:
does not look right to me. Before, we'd get an 0x80|SIGTRAP result from wait. Now, you've moved the 0x80 to live only inside the siginfo. This is accessible to the debugger via ptrace, but only very recently (late 2.5.x). So this will probably break users of PT_TRACESYSGOOD.
I don't see how to easily emulate the old behaviour 100% - see ptrace_notify. We always sent the signal SIGTRAP to the process, and then set "exit_code" tp have the 0x80 marker by calling ptrace_stop() by hand. However, if we make it a real signal (which we need to do to get the continue semantics right), we no longer have that out-of-band info available to us.
We could make "get_signal_to_deliver()" pass in some other exit_code to ptrace_stop() than just signr. It would have to pick it up from the siginfo structure, but then we'd have to make sure that _other_ signal users do that properly..
Patches/suggestions welcome.
Well, you put SIGTRAP|0x80 in si_code. Coincidentally, 0x80 is SI_KERNEL. So testing for SI_KERNEL | 0x80 is probably OK in the signal path, since most of its other arbitrary values would be either negative or not include SI_KERNEL.
On Thu, Dec 30, 2004 at 09:05:01PM -0800, Linus Torvalds wrote:
On Thu, 30 Dec 2004, Jesse Allen wrote:
Well I tried this patch and it works.
Goodie. Are there other known problems with silly copy-protection schemes? It migth be worth testing.
However:
Since I cannot spot any issue, the patch looks good. Are there any other test cases?
Yes. It seems I broke "strace" with it. Probably the difference in system call trace reporting that Dan Jacobowitz already pointed out.
Now, that should be easily handled by just separating out the cases of system call tracing and debug trap handling, and using the old silly code for system calls. I'd prefer a cleaner approach, but that seems to be the sane thing to do for now.
Strace doesn't use PTRACE_SETOPTIONS as far as I can tell... so it must be something different.
On Fri, 31 Dec 2004, Daniel Jacobowitz wrote:
Well, you put SIGTRAP|0x80 in si_code. Coincidentally, 0x80 is SI_KERNEL. So testing for SI_KERNEL | 0x80 is probably OK in the signal path, since most of its other arbitrary values would be either negative or not include SI_KERNEL.
Somebody would need to validate that no user can fool the kernel into doing something stupid...
That said, I think I solved the problem a different way: the TIF_SINGLESTEP code really fundamentally doesn't want to have _any_ of that SIGTRAP | 0x80 nonsense in its path, it actually wants to look like a debug trap - which sets si_code to TRAP_BRKPT. Which makes more sense anyway.
So I looked at just sharing the code with the debug trap handler, and the result is appended. strace works, as does all the TF tests I've thrown at it, and the code actually looks better anyway (the old do_debug code looks like it got the EIP wrong in VM86 mode, for example, this just cleans that up too). Just use a common "send_sigtrap()" routine.
Does this look saner?
Linus
---- ===== include/asm-i386/ptrace.h 1.7 vs edited ===== --- 1.7/include/asm-i386/ptrace.h 2004-09-03 16:55:27 -07:00 +++ edited/include/asm-i386/ptrace.h 2004-12-30 21:27:39 -08:00 @@ -55,6 +55,7 @@ #define PTRACE_SET_THREAD_AREA 26
#ifdef __KERNEL__ +extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs, int error_code); #define user_mode(regs) ((VM_MASK & (regs)->eflags) || (3 & (regs)->xcs)) #define instruction_pointer(regs) ((regs)->eip) #if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER) ===== arch/i386/kernel/signal.c 1.49 vs edited ===== --- 1.49/arch/i386/kernel/signal.c 2004-11-22 09:47:16 -08:00 +++ edited/arch/i386/kernel/signal.c 2004-12-30 11:40:35 -08:00 @@ -270,7 +270,6 @@ struct pt_regs *regs, unsigned long mask) { int tmp, err = 0; - unsigned long eflags;
tmp = 0; __asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp)); @@ -292,16 +291,7 @@ err |= __put_user(current->thread.error_code, &sc->err); err |= __put_user(regs->eip, &sc->eip); err |= __put_user(regs->xcs, (unsigned int __user *)&sc->cs); - - /* - * Iff TF was set because the program is being single-stepped by a - * debugger, don't save that information on the signal stack.. We - * don't want debugging to change state. - */ - eflags = regs->eflags; - if (current->ptrace & PT_DTRACE) - eflags &= ~TF_MASK; - err |= __put_user(eflags, &sc->eflags); + err |= __put_user(regs->eflags, &sc->eflags); err |= __put_user(regs->esp, &sc->esp_at_signal); err |= __put_user(regs->xss, (unsigned int __user *)&sc->ss);
@@ -424,11 +414,9 @@ * The tracer may want to single-step inside the * handler too. */ - if (regs->eflags & TF_MASK) { - regs->eflags &= ~TF_MASK; - if (current->ptrace & PT_DTRACE) - ptrace_notify(SIGTRAP); - } + regs->eflags &= ~TF_MASK; + if (test_thread_flag(TIF_SINGLESTEP)) + ptrace_notify(SIGTRAP);
#if DEBUG_SIG printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n", @@ -519,11 +507,9 @@ * The tracer may want to single-step inside the * handler too. */ - if (regs->eflags & TF_MASK) { - regs->eflags &= ~TF_MASK; - if (current->ptrace & PT_DTRACE) - ptrace_notify(SIGTRAP); - } + regs->eflags &= ~TF_MASK; + if (test_thread_flag(TIF_SINGLESTEP)) + ptrace_notify(SIGTRAP);
#if DEBUG_SIG printk("SIG deliver (%s:%d): sp=%p pc=%p ra=%p\n", ===== arch/i386/kernel/traps.c 1.92 vs edited ===== --- 1.92/arch/i386/kernel/traps.c 2004-12-28 11:07:48 -08:00 +++ edited/arch/i386/kernel/traps.c 2004-12-30 21:26:14 -08:00 @@ -718,23 +718,21 @@ */ if ((regs->xcs & 3) == 0) goto clear_TF_reenable; - if ((tsk->ptrace & (PT_DTRACE|PT_PTRACED)) == PT_DTRACE) - goto clear_TF; + + /* + * Was the TF flag set by a debugger? If so, clear it now, + * so that register information is correct. + */ + if (tsk->ptrace & PT_DTRACE) { + regs->eflags &= ~TF_MASK; + tsk->ptrace &= ~PT_DTRACE; + if (!tsk->ptrace & PT_DTRACE) + goto clear_TF; + } }
/* Ok, finally something we can handle */ - tsk->thread.trap_no = 1; - tsk->thread.error_code = error_code; - info.si_signo = SIGTRAP; - info.si_errno = 0; - info.si_code = TRAP_BRKPT; - - /* If this is a kernel mode trap, save the user PC on entry to - * the kernel, that's what the debugger can make sense of. - */ - info.si_addr = ((regs->xcs & 3) == 0) ? (void __user *)tsk->thread.eip - : (void __user *)regs->eip; - force_sig_info(SIGTRAP, &info, tsk); + send_sigtrap(tsk, regs, error_code);
/* Disable additional traps. They'll be re-enabled when * the signal is delivered. ===== arch/i386/kernel/ptrace.c 1.28 vs edited ===== --- 1.28/arch/i386/kernel/ptrace.c 2004-11-22 09:44:52 -08:00 +++ edited/arch/i386/kernel/ptrace.c 2004-12-30 21:26:16 -08:00 @@ -42,6 +42,12 @@ */ #define EFL_OFFSET ((EFL-2)*4-sizeof(struct pt_regs))
+static inline struct pt_regs *get_child_regs(struct task_struct *task) +{ + void *stack_top = (void *)task->thread.esp0; + return stack_top - sizeof(struct pt_regs); +} + /* * this routine will get a word off of the processes privileged stack. * the offset is how far from the base addr as stored in the TSS. @@ -138,24 +144,119 @@ return retval; }
+#define LDT_SEGMENT 4 + +static unsigned long convert_eip_to_linear(struct task_struct *child, struct pt_regs *regs) +{ + unsigned long addr, seg; + + addr = regs->eip; + seg = regs->xcs & 0xffff; + if (regs->eflags & VM_MASK) { + addr = (addr & 0xffff) + (seg << 4); + return addr; + } + + /* + * We'll assume that the code segments in the GDT + * are all zero-based. That is largely true: the + * TLS segments are used for data, and the PNPBIOS + * and APM bios ones we just ignore here. + */ + if (seg & LDT_SEGMENT) { + u32 *desc; + unsigned long base; + + down(&child->mm->context.sem); + desc = child->mm->context.ldt + (seg & ~7); + base = (desc[0] >> 16) | ((desc[1] & 0xff) << 16) | (desc[1] & 0xff000000); + + /* 16-bit code segment? */ + if (!((desc[1] >> 22) & 1)) + addr &= 0xffff; + addr += base; + up(&child->mm->context.sem); + } + return addr; +} + +static inline int is_at_popf(struct task_struct *child, struct pt_regs *regs) +{ + int i, copied; + unsigned char opcode[16]; + unsigned long addr = convert_eip_to_linear(child, regs); + + copied = access_process_vm(child, addr, opcode, sizeof(opcode), 0); + for (i = 0; i < copied; i++) { + switch (opcode[i]) { + /* popf */ + case 0x9d: + return 1; + /* opcode and address size prefixes */ + case 0x66: case 0x67: + continue; + /* irrelevant prefixes (segment overrides and repeats) */ + case 0x26: case 0x2e: + case 0x36: case 0x3e: + case 0x64: case 0x65: + case 0xf0: case 0xf2: case 0xf3: + continue; + + /* + * pushf: NOTE! We should probably not let + * the user see the TF bit being set. But + * it's more pain than it's worth to avoid + * it, and a debugger could emulate this + * all in user space if it _really_ cares. + */ + case 0x9c: + default: + return 0; + } + } + return 0; +} + static void set_singlestep(struct task_struct *child) { - long eflags; + struct pt_regs *regs = get_child_regs(child);
+ /* + * Always set TIF_SINGLESTEP - this guarantees that + * we single-step system calls etc.. This will also + * cause us to set TF when returning to user mode. + */ set_tsk_thread_flag(child, TIF_SINGLESTEP); - eflags = get_stack_long(child, EFL_OFFSET); - put_stack_long(child, EFL_OFFSET, eflags | TRAP_FLAG); + + /* + * If TF was already set, don't do anything else + */ + if (regs->eflags & TRAP_FLAG) + return; + + /* Set TF on the kernel stack.. */ + regs->eflags |= TRAP_FLAG; + + /* + * ..but if TF is changed by the instruction we will trace, + * don't mark it as being "us" that set it, so that we + * won't clear it by hand later. + */ + if (is_at_popf(child, regs)) + return; + child->ptrace |= PT_DTRACE; }
static void clear_singlestep(struct task_struct *child) { - if (child->ptrace & PT_DTRACE) { - long eflags; + /* Always clear TIF_SINGLESTEP... */ + clear_tsk_thread_flag(child, TIF_SINGLESTEP);
- clear_tsk_thread_flag(child, TIF_SINGLESTEP); - eflags = get_stack_long(child, EFL_OFFSET); - put_stack_long(child, EFL_OFFSET, eflags & ~TRAP_FLAG); + /* But touch TF only if it was set by us.. */ + if (child->ptrace & PT_DTRACE) { + struct pt_regs *regs = get_child_regs(child); + regs->eflags &= ~TRAP_FLAG; child->ptrace &= ~PT_DTRACE; } } @@ -553,6 +654,29 @@ return ret; }
+void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs, int error_code) +{ + struct siginfo info; + + tsk->thread.trap_no = 1; + tsk->thread.error_code = error_code; + + memset(&info, 0, sizeof(info)); + info.si_signo = SIGTRAP; + info.si_code = TRAP_BRKPT; + + /* + * If this is a kernel mode trap, save the user PC on entry to + * the kernel, that's what the debugger can make sense of. + */ + info.si_addr = user_mode(regs) ? + (void __user *) regs->eip : + (void __user *)tsk->thread.eip; + + /* Send us the fakey SIGTRAP */ + force_sig_info(SIGTRAP, &info, tsk); +} + /* notification of system call entry/exit * - triggered by current->work.syscall_trace */ @@ -568,15 +692,16 @@ audit_syscall_exit(current, regs->eax); }
- if (!test_thread_flag(TIF_SYSCALL_TRACE) && - !test_thread_flag(TIF_SINGLESTEP)) - return; if (!(current->ptrace & PT_PTRACED)) return; - /* the 0x80 provides a way for the tracing parent to distinguish - between a syscall stop and SIGTRAP delivery */ - ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) && - !test_thread_flag(TIF_SINGLESTEP) ? 0x80 : 0)); + + /* Fake a debug trap */ + if (test_thread_flag(TIF_SINGLESTEP)) + send_sigtrap(current, regs, 0); + + if (!test_thread_flag(TIF_SYSCALL_TRACE)) + return; + ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
/* * this isn't the same as continuing with a signal, but it will do
On Thu, 30 Dec 2004 21:47:42 -0800 (PST), Linus Torvalds [email protected] wrote:
So I looked at just sharing the code with the debug trap handler, and the result is appended. strace works, as does all the TF tests I've thrown at it, and the code actually looks better anyway (the old do_debug code looks like it got the EIP wrong in VM86 mode, for example, this just cleans that up too). Just use a common "send_sigtrap()" routine.
Does this look saner?
Yeah. I've tested and this one works. I don't have any other copy protection schemes that are broken. Of the cd-rom based, older safedisc still worked, and the newer one needs a special device driver that wine cannot load properly yet. It's more likely if there is a problem with one that it's a problem with wine at this point.
Jesse
On Thu, Dec 30, 2004 at 04:38:21PM -0800, Linus Torvalds wrote:
Can someone repeat again what was wrong with the old ptrace semantics before the initial change that caused all these complex changes? It seemed to work well for years. How about we just go back to the old state, revert all the recent ptrace changes and skip all that?
Let me count the ways that were wrong before the changes:
- you couldn't debug any code that set TF. Really. ptrace would totally destroy the TF state in the controlled process, so it would do something totally different when debugged.
Well, tough. I assume people who play with TF themselves know how to handle it without debuggers. Really adding instruction parsing for such a corner case seems like extreme overkill to me.
I agree it is not nice, but is it really worth all that complexity to hide it?
- you couldn't even debug signal handlers, because they were _really_ hard to get into unless you knew where they were and put a breakpoint on them.
Ok I see this as being a problem. But I bet it could be fixed much simpler without doing all this complicated and likely-to-be-buggy popf parsing you added.
- you couldn't see the instruction after a system call.
Are you sure?
- ptrace returned bogus TF state after a single-step
I am sure all debuggers in existence deal with that (and they will need to continue doing so because there are so many old kernels around)
descriptors, and we actually do that (badly) in another place: the AMD "prefetch" check does exactly the same thing except it seems to get a few details wrong (looks like it cannot handle 16-bit code), and only works for the current process.
Yes, it was intentional to simplify it. 16bit code is not supposed to use prefetches (and even if they do the probability of faulting is pretty small) Also we needed several iterations to get all the subtle bugs out (and I bet there are some issues left)
-Andi
On Donnerstag 30 Dezember 2004 14.06, you wrote:
Tom, does this patch against Wine help? It should do the same thing as the setarch program, so if that fixes it then this should also (if I've understood how this mechanism works of course).
No this doesn't work. The decision which address space layout to use is done in arch/i386/mm/mmap.c:arch_pick_mmap_layout, and this function is called by the elf loader in fs/binfmt_elf.c:load_elf_binary, i.e. the decision which address space layout to use for the current wine process is already done long time before your personality syscall takes effect.
I hoped there was some ELF section magic to turn this off (like execshield), but there doesn't seem to be.
Tom
On Fri, 2004-12-31 at 14:13 +0100, Thomas Sailer wrote:
No this doesn't work. The decision which address space layout to use is done in arch/i386/mm/mmap.c:arch_pick_mmap_layout, and this function is called by the elf loader in fs/binfmt_elf.c:load_elf_binary, i.e. the decision which address space layout to use for the current wine process is already done long time before your personality syscall takes effect.
I hoped there was some ELF section magic to turn this off (like execshield), but there doesn't seem to be.
So it has to be done before an exec then? I had assumed changing the personality would affect all mmaps from that point onwards, too bad.
We do some execs as part of the Wine startup code so it shouldn't be too much of an issue. Anyway, all the setarch program does is do this syscall then exec the program so it shouldn't be too hard to do the equivalent in Wine.
What about this patch?
--- libs/wine/config.c (revision 14) +++ libs/wine/config.c (local) @@ -35,6 +35,10 @@ #endif #include "wine/library.h"
+#ifdef HAVE_SYSCALL_H +#include <syscall.h> +#endif + static const char server_config_dir[] = "/.wine"; /* config dir relative to $HOME */ static const char server_root_prefix[] = "/tmp/.wine-"; /* prefix for server root dir */ static const char server_dir_prefix[] = "/server-"; /* prefix for server dir */ @@ -289,8 +293,13 @@ static void preloader_exec( char **argv, new_argv = xmalloc( (last_arg - argv + 2) * sizeof(*argv) ); memcpy( new_argv + 1, argv, (last_arg - argv + 1) * sizeof(*argv) ); new_argv[0] = full_name; + + /* set the abi personality */ + syscall(136, 0x8 /* PER_LINUX32 */ | 0x0200000 /* ADDR_COMPAT_LAYOUT */); + if (envp) execve( full_name, new_argv, envp ); else execv( full_name, new_argv ); + free( new_argv ); free( full_name ); return;
On Thu, Dec 30, 2004 at 09:47:42PM -0800, Linus Torvalds wrote:
So I looked at just sharing the code with the debug trap handler, and the result is appended. strace works, as does all the TF tests I've thrown at it, and the code actually looks better anyway (the old do_debug code looks like it got the EIP wrong in VM86 mode, for example, this just cleans that up too). Just use a common "send_sigtrap()" routine.
Does this look saner?
Lots, I like it. The syscall trap will always be delivered before the single-step trap, right, because signal delivery won't run until we return to userspace?
On Fri, 31 Dec 2004, Andi Kleen wrote:
- you couldn't even debug signal handlers, because they were _really_ hard to get into unless you knew where they were and put a breakpoint on them.
Ok I see this as being a problem. But I bet it could be fixed much simpler without doing all this complicated and likely-to-be-buggy popf parsing you added.
I don't think that the Wine problem resolution is due to the POPF instruction handling. Basically Linus patch does a nice cleanup plus POPF handling, so maybe the patch can be split.
- you couldn't see the instruction after a system call.
Are you sure?
Yes, this was true with 2.4. Than it has been fixed some time ago. But handling that revealed a fragile handling of ptrace event delivery we had in do_syscall_trace(). Part of the Linus patch tries to solve the fact that on one side strace wants things to happen in a certain way, way that seem to break Wine. I think Linus cleanup of the ptrace event delivery can get strace, Wine and single-step-after-syscall right, w/out POPF handling. Then you guys can flame each other over the POPF handling ;)
- Davide
On Fri, 31 Dec 2004, Andi Kleen wrote:
- you couldn't even debug signal handlers, because they were _really_ hard to get into unless you knew where they were and put a breakpoint on them.
Ok I see this as being a problem. But I bet it could be fixed much simpler without doing all this complicated and likely-to-be-buggy popf parsing you added.
And that is _exactly_ what we did.
Guess what broke Wine?
Uhhuh.
Linus
On Fri, 31 Dec 2004, Daniel Jacobowitz wrote:
Lots, I like it. The syscall trap will always be delivered before the single-step trap, right, because signal delivery won't run until we return to userspace?
Yes. Although I've not actually tested it.
Before, it used to show up as one event, and basically the "0x80" marker got lost, so effectively the "system call exit" part would have got lost. Now, it _may_ DTRT, with the caveat that the system call ptrace_notify() thing still has the same problem with restarting-with-a-signal.
That's basically a "don't do that then", and is the status quo, of course, so this is at least not a regression. It's still pretty ugly, but apparently nobody really cares ;)
Linus
On Fri, 31 Dec 2004, Davide Libenzi wrote:
I don't think that the Wine problem resolution is due to the POPF instruction handling. Basically Linus patch does a nice cleanup plus POPF handling, so maybe the patch can be split.
The popf part is very nice in that it allows you to single-step and debug this thing.
The fact is, I can't debug Wine. The code-base is just too alien for me, so to debug this thing I needed to come up with a silly example of TF usage, and try to debug _that_ instead as if it were something unknown (ie debugging by knowing what the program does is a no-no, since that would have defeated the whole exercise).
And handlign "popf" correctly really was the only sane way to debug it, anything else would never have worked in a real-life debugging situation. It's easy enough to say "well, just do it by hand", but that's not practical when you debug with "si 1000" to try to pinpoint the behaviour a bit.
And clearly my debuggability exercise seem to have worked, since the end result apparently ended up doing the right thing for Wine.
This is why debuggability is important. I realize that people may think I'm inconsistent (since I abhor kernel debuggers), but while _I_ abhor debuggers, I also think that the primary objective of an operating system is to make easy things easy, and hard things possible, so while I don't much like debuggers, I consider it a fundamental failure if the kernel doesn't have proper support for them.
So I think it's worth splitting out the "popf" part of the patch, but even if that one doesn't actually matter for Warcraft, I'd put it in just so that we have a state where we're _supposed_ to be able to debug things with TF in them. Just having the mental expectation that things like that should work is important - otherwise we'll eventually end up having some other subtle problem with TF handling.
Linus
On Fri, 31 Dec 2004 09:30:04 -0800 (PST), Linus Torvalds [email protected] wrote:
On Fri, 31 Dec 2004, Davide Libenzi wrote:
I don't think that the Wine problem resolution is due to the POPF instruction handling. Basically Linus patch does a nice cleanup plus POPF handling, so maybe the patch can be split.
The popf part is very nice in that it allows you to single-step and debug this thing.
The fact is, I can't debug Wine. The code-base is just too alien for me, so to debug this thing I needed to come up with a silly example of TF usage, and try to debug _that_ instead as if it were something unknown (ie debugging by knowing what the program does is a no-no, since that would have defeated the whole exercise).
And handlign "popf" correctly really was the only sane way to debug it, anything else would never have worked in a real-life debugging situation. It's easy enough to say "well, just do it by hand", but that's not practical when you debug with "si 1000" to try to pinpoint the behaviour a bit.
And clearly my debuggability exercise seem to have worked, since the end result apparently ended up doing the right thing for Wine.
I honestly think there may have been no other way. There was a reason why I was very vague at first. I could not pin-point an exact location of failure. Just grepping a good log shows over 500,000 calls to RaiseException. trap_handler ran over 300,000 times and as many TF clears. My brother told me to set watches or break points to see if I could find something wrong, but I simply told him there is too much going on. The only thing I could think that can be done is to mentally find debugging scenarios and hope to find where it could be breaking. But the only people that could have thought of a scenario are people that know linux well enough what the kernel is doing in the first place.
This is why debuggability is important. I realize that people may think I'm inconsistent (since I abhor kernel debuggers), but while _I_ abhor debuggers, I also think that the primary objective of an operating system is to make easy things easy, and hard things possible, so while I don't much like debuggers, I consider it a fundamental failure if the kernel doesn't have proper support for them.
I think that copy protection routines are particually nasty. They purposely make debugging hard (again see above, no sane program would be like that). And the program's reason for failing is not the reason at all -- "please insert disc" -- the disc is already in there! So they don't say the real reason why it failed, leaving the user totally hopelessly lost on what they should do. It's really hard to file a bug report on that alone! Had I not placed my guess on ptrace early on, this problem may have gone undiscovered for a long time. I have checked transgaming and they seem to be not aware about this, but it would have been a rude awakening for them when they find that when most distros had updated to 2.6.9, that all the SecuRom protected games silently broke, and they would have had a heck of a time debugging them to find the reason, I'm sure, even with the specs on it! Though who knows if cedega is affected because their code-base is diverging, and I'm sure their copy protection support is very different.
So I think it's worth splitting out the "popf" part of the patch, but even if that one doesn't actually matter for Warcraft, I'd put it in just so that we have a state where we're _supposed_ to be able to debug things with TF in them. Just having the mental expectation that things like that should work is important - otherwise we'll eventually end up having some other subtle problem with TF handling.
Linus
Sure. I've tried the game and it doesn't require is_at_popf(). I thought it did because I thought it was required for handling TF correctly, but maybe this is another scenario. I think it's a good idea to keep it in too. I actually don't care about debugging the copy protection, I just want correct behavoir. If doing all this has pointed out these other issues too, then it's all well worth it, because next time it will be /easier/ and we won't have subtle problems.
Jesse
On Fri, Dec 31, 2004 at 09:19:48AM -0800, Linus Torvalds wrote:
On Fri, 31 Dec 2004, Daniel Jacobowitz wrote:
Lots, I like it. The syscall trap will always be delivered before the single-step trap, right, because signal delivery won't run until we return to userspace?
Yes. Although I've not actually tested it.
Before, it used to show up as one event, and basically the "0x80" marker got lost, so effectively the "system call exit" part would have got lost. Now, it _may_ DTRT, with the caveat that the system call ptrace_notify() thing still has the same problem with restarting-with-a-signal.
That's basically a "don't do that then", and is the status quo, of course, so this is at least not a regression. It's still pretty ugly, but apparently nobody really cares ;)
Yes. At some point, I'd like to make that an error - if you want to restart with a signal, don't do it from the notification points where it doesn't make sense (exit, fork, vfork-done, syscall). Send a signal by hand, and then resume, and if you want to fudge the siginfo you can do that when stopped in the signal delivery path.
Context: this is not about ptrace stuff, but rather Thomas Sailors original problem which seems to be flex-mmap related.
On Fri, 2004-12-31 at 16:51 +0100, Thomas Sailer wrote:
On Freitag 31 Dezember 2004 14.31, Mike Hearn wrote:
What about this patch?
This works now. Happy new year...
I'm afraid Alexandre has decided not to apply this patch (the ABI personality syscall). His reasoning is as follows:
- There are a million kernel patches out there which use the ABI personality system to break stuff, which means that realistically we have no idea what it does
- Therefore, using it would be a big support problem as it may even switch off stuff (new features etc) we want, so this system doesn't really scale
- It'd be a lot better to provide a "disable this mmap feature/behaviour" flag for each thing that changes it, or *even* better to make it opt-in as having to change the sources to make old apps work again kind of defeats the point of backwards compatibility (ie, old apps should continue to work as-is)
- Another possibility would be to create a new mmap API that lets us ask for exactly what we want, instead of second-guessing the kernel. I don't know exactly what sort of an API Alexandre has in mind here, perhaps he could describe it.
- So rather than apply the patch, we have to figure out why flex-mmap is breaking this program and add yet more magic VM code to stop it from happening :(
Could you upload a +relay,+tid,+seh,+msgbox trace somewhere please? Of course if you could investigate it yourself that'd be the best thing.
I wasn't sure whether to CC some kernel people or not, but it seems from previous threads that the ABI personality syscall system is meant for people like us who are trying to keep "legacy" binaries working. Unfortunately this sort of unbreak-me-please flag isn't really what we need/want from the kernel.
Also a precise description of what flex-mmap does would be good. Google wasn't too informative, best I could get was that it means mmap allocates from the "top" of the address space down. But where is the top exactly?
thanks -mike
Mike Hearn [email protected] wrote:
Also a precise description of what flex-mmap does would be good. Google wasn't too informative, best I could get was that it means mmap allocates from the "top" of the address space down. But where is the top exactly?
Ingo has described it thus:
before:
0x08000000 ... binary code 0x08xxxxxx ... brk area 0x40000000 ... start of mmap, new mmaps go after old ones 0xbfxxxxxx ... stack
after:
0x08000000 ... binary code 0x08xxxxxx ... brk area 0xbfxxxxxx ... _end_ of all mmaps, new mmaps go below old ones 0xbfyyyyyy ... stack
the 'after' layout guarantees that brk area (malloc()) can grow unlimited and mmap() can grow unlimited - they will meet somewhere inbetween when almost all of the VM is used up. [the 'top' of the mmaps in the 'after' layout is constrained by the stack ulimit - the stack must still fit and we never allocate into the stack's yet unallocated and growable hole.]
with the 'before' layout we've got 900 MB for brk() and 1.9GB for mmaps() - a rigid limit.
On Tue, 2005-01-04 at 21:15 +0000, Mike Hearn wrote:
Context: this is not about ptrace stuff, but rather Thomas Sailors
s/Sailor/Sailer/
I'm afraid Alexandre has decided not to apply this patch (the ABI personality syscall). His reasoning is as follows:
Quite understandably.
Could you upload a +relay,+tid,+seh,+msgbox trace somewhere please? Of course if you could investigate it yourself that'd be the best thing.
http://www.baycom.org/~tom/wine/wine.xst.broken.relay.tid.seh.msgbox.gz http://www.baycom.org/~tom/wine/wine.xst.working.relay.tid.seh.msgbox.gz
I used 2.6.10-ac1 and wine-20041201-1fc3winehq. The second log (which is huge!, ~250MBytes compressed, compression ratio roughly 1:100) is with setarch i386 -L.
Thanks, Tom
* Thomas Sailer [email protected] wrote:
I'm afraid Alexandre has decided not to apply this patch (the ABI personality syscall). His reasoning is as follows:
Quite understandably.
another workaround to switch off flex-mmap is to set the stack ulimit to 'unlimited':
saturn:~> cat /proc/self/maps | tail -7 b7db3000-b7db4000 r--p 00cfd000 03:41 3735973 /usr/lib/locale/locale-archive b7db4000-b7de1000 r--p 00ccc000 03:41 3735973 /usr/lib/locale/locale-archive b7de1000-b7de7000 r--p 00cc3000 03:41 3735973 /usr/lib/locale/locale-archive b7de7000-b7fe7000 r--p 00000000 03:41 3735973 /usr/lib/locale/locale-archive b7fe7000-b7fe8000 rw-p b7fe7000 00:00 0 bff2c000-c0000000 rw-p bff2c000 00:00 0 ffffe000-fffff000 ---p 00000000 00:00 0
saturn:~> ulimit -s unlimited
saturn:~> cat /proc/self/maps | tail -7 42188000-4218a000 rw-p 00014000 03:41 3433982 /lib/ld-2.3.3.so 4218c000-422aa000 r-xp 00000000 03:41 3434006 /lib/tls/libc-2.3.3.so 422aa000-422ac000 r--p 0011d000 03:41 3434006 /lib/tls/libc-2.3.3.so 422ac000-422ae000 rw-p 0011f000 03:41 3434006 /lib/tls/libc-2.3.3.so 422ae000-422b0000 rw-p 422ae000 00:00 0 bfea0000-c0000000 rw-p bfea0000 00:00 0 ffffe000-fffff000 ---p 00000000 00:00 0
e.g. SuSE defaults to an unlimited stack so flex-mmap is effectively disabled there.
To set the VM to legacy, for all apps, set /proc/sys/vm/legacy_va_layout to 1.
Ingo
Mike Hearn [email protected] writes:
- Another possibility would be to create a new mmap API that lets us ask for exactly what we want, instead of second-guessing the kernel. I don't know exactly what sort of an API Alexandre has in mind here, perhaps he could describe it.
Probably the easiest would be to have a way for an app to specify the mmap range it wants. So instead of having the kernel try to guess from brk and stack ulimit, both of which are meaningless for Win32 apps, we could set the range from "end of win32 exe" to 0x7ff0000. This would also avoid the need to preallocate everything above 0x80000000 that we currently do and that plays havoc with address space limits.
[ Reading just long long thread (actually from gmane.comp.emulators.wine.devel) ]
[email protected] Linus Torvalds [email protected]:
/*
* Was the TF flag set by a debugger? If so, clear it now,
* so that register information is correct.
*/
if (tsk->ptrace & PT_DTRACE) {
regs->eflags &= ~TF_MASK;
tsk->ptrace &= ~PT_DTRACE;
=========================
if (!tsk->ptrace & PT_DTRACE)
=======================
goto clear_TF;
}}
Perhaps, I'm stupid.
But is there something strange on that test of tsk->ptrace variable?
Before that PT_DTRACE was cleared from that same tsk->ptrace variable.
/ Kari Hurtta