Joerg Wunsch j@uriah.heep.sax.de writes:
The Posix way is to use sigaction(), and set the SA_NOCLDWAIT flag bit. This variant is supported under FreeBSD (don't know for the other BSDs). It has the added bonus that unavailability of the automatic zombie reaping feature can be detected at compile-time (the macro is not defined then).
The fact that the macro exists doesn't mean it works. Most Linux kernels don't support SA_NOCLDWAIT, even though the symbol may be defined in the headers. I don't have the Posix standard, but at least according to the single Unix spec using SIG_IGN is just as standard as SA_NOCLDWAIT, and a quick grep of the FreeBSD sources seems to indicate that it should work. Did I miss something?
As Alexandre Julliard wrote:
The fact that the macro exists doesn't mean it works. Most Linux kernels don't support SA_NOCLDWAIT, even though the symbol may be defined in the headers.
That sounds to be a very stupid mistake to me. Why would one define something one doesn't support? But OK, that's nothing we can change.
I don't have the Posix standard, but at least according to the single Unix spec using SIG_IGN is just as standard as SA_NOCLDWAIT, and a quick grep of the FreeBSD sources seems to indicate that it should work. Did I miss something?
signal() is not part of Posix at all, IIRC. Sure, SIG_IGN has always been defined, but its effect on SIGCHLD didn't used to be defined. When i implemented the FreeBSD SA_NOCLDWAIT feature, i purposely decided to not also implement the SIG_IGN irregularity (SIG_IGN means that the /signal/ is being ignored, but never implied any other side-effect in the past), to remain backwards compatible with the BSD behaviour.
It's still that way, see the following short test program:
j@uriah 720% cat foo.c #include <signal.h> #include <unistd.h> #include <stdlib.h>
void createchild(void) { if (fork() == 0) { exit(0); } }
int main(void) { createchild(); createchild(); createchild(); sleep(1); system("ps"); sleep(2); return 0; } j@uriah 721% make foo cc -O -pipe foo.c -o foo j@uriah 722% ./foo PID TT STAT TIME COMMAND ... 0 p3 ZW+ 0:00.00 (foo) 0 p3 ZW+ 0:00.00 (foo) 0 p3 ZW+ 0:00.00 (foo) 992 p3 Ss 0:02.13 -tcsh (tcsh) 12287 p3 S+ 0:00.00 ./foo 12291 p3 S+ 0:00.00 sh -c ps 12292 p3 R+ 0:00.00 ps ...
Of course, now that i see that SUSP indeed standardized this oddness, i might change my opinion. Anyway, as demonstrated above, signal(SIGCHLD, SIG_IGN) doesn't work on FreeBSD the way you would expect it to work. So Gerald, please at least integrate the patch into the FreeBSD port.
Joerg Wunsch j@uriah.heep.sax.de writes:
#include <signal.h> #include <unistd.h> #include <stdlib.h>
void createchild(void) { if (fork() == 0) { exit(0); } }
int main(void) { createchild(); createchild(); createchild(); sleep(1); system("ps"); sleep(2); return 0; } ...
Of course, now that i see that SUSP indeed standardized this oddness, i might change my opinion. Anyway, as demonstrated above, signal(SIGCHLD, SIG_IGN) doesn't work on FreeBSD the way you would expect it to work.
Well, if you really tested with the program above then it doesn't demonstrate anything, since you don't set the SIGCHLD handler <g>
As Alexandre Julliard wrote:
Of course, now that i see that SUSP indeed standardized this oddness, i might change my opinion. Anyway, as demonstrated above, signal(SIGCHLD, SIG_IGN) doesn't work on FreeBSD the way you would expect it to work.
Well, if you really tested with the program above then it doesn't demonstrate anything, since you don't set the SIGCHLD handler <g>
OK, you're of course right. Well, half of it. ;-) SIGCHLD is being ignored by default anyway (and that's the main grieve i've got with that abuse in SVR3: they made explicitly ignoring an implicitly ignored signal have a different semantic).
Meanwhile i also took the time to review the kernel code, and read the commit logs. It seems someone indeed has implemented that SIG_IGN (mis)feature. However, the first implementation of it has been withdrawn later since it caused kernel panics, and has been redone afterwards. For that reason, FreeBSD 4.x (which is currently still the stable branch) didn't inherit that impelementation, and behaves as i initially explained, while FreeBSD 5.x behaves as you (and SUSP) expect it. (The requirement of SUSP to behave that way has been the main reason why someone implemented it at all.) Whereas SA_NOCLDWAIT has been there all the way since FreeBSD 3.0.
So i still think we should at least have the patch in the FreeBSD port... (Btw., the reason why i stumpled across it when i saw that code was that it reminded me of zombies i've once seen for a Wine application, but i didn't track it down by that time.)
Joerg Wunsch j@uriah.heep.sax.de writes:
So i still think we should at least have the patch in the FreeBSD port... (Btw., the reason why i stumpled across it when i saw that code was that it reminded me of zombies i've once seen for a Wine application, but i didn't track it down by that time.)
Of course it can be put in the FreeBSD port, but it would be nicer to have a fix in the main tree, there are probably other BSDs that have the same problem. It just needs to be done in a way that doesn't break Linux, but that should be feasible.
As Alexandre Julliard wrote:
So i still think we should at least have the patch in the FreeBSD port... (Btw., the reason why i stumpled across it when i saw that code was that it reminded me of zombies i've once seen for a Wine application, but i didn't track it down by that time.)
Of course it can be put in the FreeBSD port, but it would be nicer to have a fix in the main tree, there are probably other BSDs that have the same problem.
No idea about the other BSDs, i had to review their sources for this.
It just needs to be done in a way that doesn't break Linux, but that should be feasible.
It really surprises me why someone would implement signal(SIGCHLD, SIG_IGN) that way yet not implement SA_NOCLDWAIT in sigaction() (which is basically the same thing, only that the signal() thing is much older while sigaction() is the cleaner reimplementation).
Note that SIGCHLD is ignored by default anyway, so historically, setting it to SIG_IGN was just a no-op. It was SVR3 that introduced that `don't generate zombies' feature, and the way they did it is that you had to (explicitly) ignore an already (implicitly) ignored signal... Doesn't this sound strange? ;-) SVR3 didn't have sigaction(), so that's been their way out.
Of course, in SVR3 speak, one had to spell that SIGCLD, too. :)
I don't mind whether it's done in the generic sources or just in the FreeBSD port though.
On Tue, 15 Apr 2003, Joerg Wunsch wrote:
I don't mind whether it's done in the generic sources or just in the FreeBSD port though.
I'd _very_ much prefer the former, for a couple of reasons:
o Not all FreeBSD users use our ports collection, so both the Wine and FreeBSD projects risk getting bug reports if our sources diverge.
o In fact even I, the maintainer of the FreeBSD port, mostly use pristine upstream sources to continually check current CVS versions of Wine (and keep them buildable on FreeBSD).
o I worked very hard to decrease the number of local patches and keep the divergence minimal, and I have learned Alexandre as very interested in keep Wine portable.
Unfortunately, a general patch for this is beyond my (current) capabilities, so I'd really appreciate could the two of you agree on a general fix for this.
(Even some #ifdef __FreeBSD__ in the Wine sources certainly would be better than having FreeBSD local patches, IMHO.)
Thanks, Gerald
PS: Thanks also for the very nice analysis, Jörg!
Gerald Pfeifer pfeifer@dbai.tuwien.ac.at writes:
Unfortunately, a general patch for this is beyond my (current) capabilities, so I'd really appreciate could the two of you agree on a general fix for this.
Does this do the right thing on FreeBSD?
Index: scheduler/client.c =================================================================== RCS file: /opt/cvs-commit/wine/scheduler/client.c,v retrieving revision 1.97 diff -u -r1.97 client.c --- scheduler/client.c 1 Apr 2003 04:39:35 -0000 1.97 +++ scheduler/client.c 17 Apr 2003 02:45:46 -0000 @@ -672,11 +672,19 @@ TEB *teb = NtCurrentTeb(); int version, ret; int reply_pipe[2]; + struct sigaction sig_act; + + sig_act.sa_handler = SIG_IGN; + sig_act.sa_flags = 0; + sigemptyset( &sig_act.sa_mask );
/* ignore SIGPIPE so that we get a EPIPE error instead */ - signal( SIGPIPE, SIG_IGN ); + sigaction( SIGPIPE, &sig_act, NULL ); /* automatic child reaping to avoid zombies */ - signal( SIGCHLD, SIG_IGN ); +#ifdef SA_NOCLDWAIT + sig_act.sa_flags |= SA_NOCLDWAIT; +#endif + sigaction( SIGCHLD, &sig_act, NULL );
/* create the server->client communication pipes */ if (pipe( reply_pipe ) == -1) server_protocol_perror( "pipe" );
On Thu, 16 Apr 2003, Alexandre Julliard wrote:
Does this do the right thing on FreeBSD?
Jörg will have to respond concerning semantics, but I just rebuilt the current CVS sources plus this patch on FreeBSD 4.8 and it compiled w/o problems.
Thanks! Gerald
[Sorry for the late reply.]
As Alexandre Julliard wrote:
Unfortunately, a general patch for this is beyond my (current) capabilities, so I'd really appreciate could the two of you agree on a general fix for this.
Does this do the right thing on FreeBSD?
- struct sigaction sig_act;
- sig_act.sa_handler = SIG_IGN;
- sig_act.sa_flags = 0;
- sigemptyset( &sig_act.sa_mask );
...
+#ifdef SA_NOCLDWAIT
- sig_act.sa_flags |= SA_NOCLDWAIT;
+#endif
- sigaction( SIGCHLD, &sig_act, NULL );
I didn't test that yet, but i think that should work. If SA_NOCLDWAIT is set (in a call for SIGCHLD), sa_handler ought to be ignored anyway.
On Fri, 25 Apr 2003, Joerg Wunsch wrote:
I didn't test that yet, but i think that should work. If SA_NOCLDWAIT is set (in a call for SIGCHLD), sa_handler ought to be ignored anyway.
I'm note sure whether that's related (in it's, unfortunately, beyond my knowledge of threads and/or Wine to debug this), but...
With current CVS sources of Wine, every single Windows program I try fails with the following error:
assertion "libc_sigaction" failed: file "../../scheduler/pthread.c", line 661
which stems from the following snippet of code:
int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact) { if (!libc_sigaction) { libc_sigaction = dlsym( RTLD_NEXT, "sigaction" ); assert( libc_sigaction ); } return libc_sigaction(signum, act, oldact); }
The backtrace looks as follows:
(gdb) bt #0 0x5c06cbac in kill () from /usr/lib/libc.so.4 #1 0x5c0ae152 in abort () from /usr/lib/libc.so.4 #2 0x5c08a14f in __assert () from /usr/lib/libc.so.4 #3 0x5c16c404 in sigaction (signum=12, act=0xbfbfdce0, oldact=0xbfbfdcc8) at ../../scheduler/pthread.c:661 #4 0x5c074f09 in getcwd () from /usr/lib/libc.so.4 #5 0x5c167400 in CLIENT_InitServer () at ../../scheduler/client.c:618 #6 0x5c1685ab in process_init (argv=0xbfbff2e4) at ../../scheduler/process.c:414 #7 0x5c1689e1 in __wine_process_init (argc=2, argv=0xbfbff2e4) at ../../scheduler/process.c:585 #8 0x5c02130b in wine_init (argc=2, argv=0xbfbff2e4, error=0xbfbfee90 "", error_size=1024) at loader.c:429 #9 0x3c000569 in main (argc=2, argv=0xbfbff2e4) at main.c:32 #10 0x3c0004ae in _start ()
Gerald
As Gerald Pfeifer wrote:
assertion "libc_sigaction" failed: file "../../scheduler/pthread.c", line 661
#3 0x5c16c404 in sigaction (signum=12, act=0xbfbfdce0, oldact=0xbfbfdcc8) at ../../scheduler/pthread.c:661
Signal #12 is SIGSYS (Illegal system call). That should not be related to the SIGCHLD manipulations.
#4 0x5c074f09 in getcwd () from /usr/lib/libc.so.4
Is getcwd() supposed to cause a SIGSYS at all?
OTOH, that failing dlsym() syscall could be related to all this... I don't know why that complicated dlsym() is used, but maybe since sigaction() is now invoked directly before, the run-time loader has already loaded it from libc.so before. Maybe Alexandre can shed some light on how that's supposed to work.
On Wed, 30 Apr 2003, Joerg Wunsch wrote:
#4 0x5c074f09 in getcwd () from /usr/lib/libc.so.4
Is getcwd() supposed to cause a SIGSYS at all?
OTOH, that failing dlsym() syscall could be related to all this... I don't know why that complicated dlsym() is used, but maybe since sigaction() is now invoked directly before, the run-time loader has already loaded it from libc.so before. Maybe Alexandre can shed some light on how that's supposed to work.
It seems someone (Alexandre?) has fixed this; at least, now everything seems to work fine again on FreeBSD 4.8
Thanks!
Gerald