Sebastian Lackner sebastian@fds-team.de writes:
Changes in v2:
- Do not delete the still valid comment "Return TRUE if a user APC has been run.". I still had this deleted from an earlier version of the patchset, where the return type was changed.
If a USR1 suspend signal arrives between dequeing a system APC and sending back the result to the wineserver, a deadlock occurs. To avoid this issue this patch blocks all signals in server_select(), except while waiting or processing user APCs.
I'm guessing this is for bug 14697? It seems to me you could do that on the server side, by not suspending the thread while system APCs are outstanding. Also a test case demonstrating the bug would be a good first step.
Yes, sorry. The following lines were only in v1:
--- snip --- For https://bugs.winehq.org/show_bug.cgi?id=14697
The only disadvantage I'm aware of is that it increases the stack usage on the signal stack by sizeof(sigset_t) bytes. --- snip ---
Imho blocking USR1 on the server side is not a good solution, because: * If a user somehow sends a USR1 signal manually, it would still trigger the bug * We need code to keep track of outstanding APC results in the wineserver, and then resend USR1 after all system APCs are finished. This could easily get out of sync. Also, in server_select() we do not know if we're just in the signal handler, which would be necessary to know if an additional USR1 signal is required.
I'm not aware of a good way to write a test case for this, because it highly depends on the timing. All system APCs are usually finished within a few clock cycles. The only way to trigger this bug is to interrupt just while processing the APC, but before sending the result. There is no way to write a reliable test for that (and running the same test more than thousand times in a loop doesn't make it better).
Regards, Sebastian
2015-11-04 13:50 GMT+01:00 Alexandre Julliard julliard@winehq.org:
Sebastian Lackner sebastian@fds-team.de writes:
Changes in v2:
- Do not delete the still valid comment "Return TRUE if a user APC has
been run.".
I still had this deleted from an earlier version of the patchset,
where the
return type was changed.
If a USR1 suspend signal arrives between dequeing a system APC and
sending
back the result to the wineserver, a deadlock occurs. To avoid this issue this patch blocks all signals in server_select(), except while waiting or processing user APCs.
I'm guessing this is for bug 14697? It seems to me you could do that on the server side, by not suspending the thread while system APCs are outstanding. Also a test case demonstrating the bug would be a good first step.
-- Alexandre Julliard julliard@winehq.org
(Sending a second time, because I accidentally used a mail account not subscribed to wine-devel)
Yes, sorry. The following lines were only in v1:
--- snip --- For https://bugs.winehq.org/show_bug.cgi?id=14697
The only disadvantage I'm aware of is that it increases the stack usage on the signal stack by sizeof(sigset_t) bytes. --- snip ---
Imho blocking USR1 on the server side is not a good solution, because: * If a user somehow sends a USR1 signal manually, it would still trigger the bug * We need code to keep track of outstanding APC results in the wineserver, and then resend USR1 after all system APCs are finished. This could easily get out of sync. Also, in server_select() we do not know if we're just in the signal handler, which would be necessary to know if an additional USR1 signal is required.
I'm not aware of a good way to write a test case for this, because it highly depends on the timing. All system APCs are usually finished within a few clock cycles. The only way to trigger this bug is to interrupt just while processing the APC, but before sending the result. There is no way to write a reliable test for that (and running the same test more than thousand times in a loop doesn't make it better).
Regards, Sebastian
2015-11-04 13:50 GMT+01:00 Alexandre Julliard julliard@winehq.org:
Sebastian Lackner sebastian@fds-team.de writes:
Changes in v2:
- Do not delete the still valid comment "Return TRUE if a user APC has
been run.".
I still had this deleted from an earlier version of the patchset,
where the
return type was changed.
If a USR1 suspend signal arrives between dequeing a system APC and
sending
back the result to the wineserver, a deadlock occurs. To avoid this issue this patch blocks all signals in server_select(), except while waiting or processing user APCs.
I'm guessing this is for bug 14697? It seems to me you could do that on the server side, by not suspending the thread while system APCs are outstanding. Also a test case demonstrating the bug would be a good first step.
-- Alexandre Julliard julliard@winehq.org
Sebastian Lackner purefan@fds-team.de writes:
Yes, sorry. The following lines were only in v1:
--- snip --- For https://bugs.winehq.org/show_bug.cgi?id=14697
The only disadvantage I'm aware of is that it increases the stack usage on the signal stack by sizeof(sigset_t) bytes. --- snip ---
Imho blocking USR1 on the server side is not a good solution, because:
- If a user somehow sends a USR1 signal manually, it would still
trigger the bug
Sure, they could also send SIGSEGV and crash the app. That's not a interesting case.
- We need code to keep track of outstanding APC results in the
wineserver, and then resend USR1 after all system APCs are finished. This could easily get out of sync. Also, in server_select() we do not know if we're just in the signal handler, which would be necessary to know if an additional USR1 signal is required.
It doesn't seem that hard to me, and it would certainly be better than masking/unmasking signals several times in a heavily used request, for the sake of an extremely unlikely event.
I personally think its nice that USR1 signals do not crash the application. Other programs also use it for reloading the configuration, for example.
If performance is a concern, we could also inline wine_server_call here, then the number of pthread_setmask calls would stay approximately the same as before. Would that also be an option?
2015-11-04 14:20 GMT+01:00 Alexandre Julliard julliard@winehq.org:
Sebastian Lackner purefan@fds-team.de writes:
Yes, sorry. The following lines were only in v1:
--- snip --- For https://bugs.winehq.org/show_bug.cgi?id=14697
The only disadvantage I'm aware of is that it increases the stack usage on the signal stack by sizeof(sigset_t) bytes. --- snip ---
Imho blocking USR1 on the server side is not a good solution, because:
- If a user somehow sends a USR1 signal manually, it would still
trigger the bug
Sure, they could also send SIGSEGV and crash the app. That's not a interesting case.
- We need code to keep track of outstanding APC results in the
wineserver, and then resend USR1 after all system APCs are finished. This could easily get out of sync. Also, in server_select() we do not know if we're just in the signal handler, which would be necessary to know if an additional USR1 signal is required.
It doesn't seem that hard to me, and it would certainly be better than masking/unmasking signals several times in a heavily used request, for the sake of an extremely unlikely event.
-- Alexandre Julliard julliard@winehq.org
Sebastian Lackner sebastian@fds-team.de writes:
I personally think its nice that USR1 signals do not crash the application. Other programs also use it for reloading the configuration, for example.
You say that it's so hard to trigger the race that you can't do it in a test program even trying a thousand times, but you are concerned that users will trigger it by accident, doing something that nobody ever does? Come on...
If performance is a concern, we could also inline wine_server_call here, then the number of pthread_setmask calls would stay approximately the same as before. Would that also be an option?
It's quite ugly. It would need a convincing case that it can't be done any other way, and that it's affecting enough users to be worth it (with all due respect to Anastasius, it's not worth uglifying the code just for his specific use case).
Well, you are the development branch maintainer, but for me a code which fails only in 0.00001% of all cases is still incorrect. ;)
However, I also think that your idea to use a flag/counter in the wineserver cannot really work. Lets assume the following code sequence:
* client starts the wineserver call, and calls pthread_setmask() * server sends a SIGUSR1, because no outstading system APCs yet * client writes the request * server assumes that it is safe to dequeue the system APC now, only further USR1 signals are blocked * client waits for the reply * client restores signal mask, and receives the SIGUSR1 during the system APC
To fix that remaining race, you would have to check somehow, if the send SIGUSR1 has already arrived in the meantime. I think it only gets more ugly when going in this direction. :/
2015-11-04 14:49 GMT+01:00 Alexandre Julliard julliard@winehq.org:
Sebastian Lackner sebastian@fds-team.de writes:
I personally think its nice that USR1 signals do not crash the application. Other programs also use it for reloading the configuration, for example.
You say that it's so hard to trigger the race that you can't do it in a test program even trying a thousand times, but you are concerned that users will trigger it by accident, doing something that nobody ever does? Come on...
If performance is a concern, we could also inline wine_server_call here, then the number of pthread_setmask calls would stay approximately the same as before. Would that also be an option?
It's quite ugly. It would need a convincing case that it can't be done any other way, and that it's affecting enough users to be worth it (with all due respect to Anastasius, it's not worth uglifying the code just for his specific use case).
-- Alexandre Julliard julliard@winehq.org
Sebastian Lackner sebastian@fds-team.de writes:
Well, you are the development branch maintainer, but for me a code which fails only in 0.00001% of all cases is still incorrect. ;)
Obviously it's still incorrect and should be fixed, but it's not important enough in real usage that you can use it as an excuse for making the code worse. So you'll need to find a fix that also improves the code, and then I'll be happy to put it in ;-)