At 10:15 03/04/03 -0800, Alexandre Julliard wrote:
Adam Gundy arg@cyberscience.com writes:
anything that uses SuspendThread is inherently racy and unpleasant. Even the Microsoft documentation says "don't use this!". Of course, they ignore themselves and do it (twice) during MFC thread creation, but...
We aren't currently adding _any_ race condition that isn't already there - the wineserver sends a signal to the client process in the current code - who knows when that will be delivered? there could be a big gap between the server sending the signal, and the signal being processed.
There may be a delay, but the thread is still effectively suspended since it's not running any user-mode code. The signal will get delivered on the next switch to that thread, so we still get the correct behavior. If the signal is sent in the client the suspended thread can still run arbitrary amounts of code while the server believes it is already suspended.
you can't reason like this - it all falls apart with SMP. Both threads can happily be running at the same time - we are using clone() after all.
I can currently produce this race (without my patch applied) by adding a "Sleep( 1000 )" to the USR1 signal handler in ntdll/signal_i386.c. The wineserver is convinced that the thread is suspended, but it isn't (sleeping is a bad example - assume that a context switch occurred, or heavy processing happened before the signal was delivered).
this 'race' doesn't have ANY effect, because suspending a thread is not a precise operation - it doesn't matter whether 1 or 1000 instructions are executed by the thread before it stops. Windows threads don't give guarantees about thread suspension (unlike pthreads).
to summarize:
a) there is already a race condition in the current code - it _probably_ only triggers on SMP or heavily loaded UP. b) my patch won't make things any worse c) the race doesn't matter because Windows threads don't guarantee what happens when a thread is suspended (read Microsoft's docs - they practically say "don't use this call" for this very reason).
Seeya, Adam -- Real Programmers don't comment their code. If it was hard to write, it should be hard to read, and even harder to modify. These are all my own opinions.
Adam Gundy arg@cyberscience.com writes:
I can currently produce this race (without my patch applied) by adding a "Sleep( 1000 )" to the USR1 signal handler in ntdll/signal_i386.c. The wineserver is convinced that the thread is suspended, but it isn't (sleeping is a bad example - assume that a context switch occurred, or heavy processing happened before the signal was delivered).
Not sure what you are trying to prove here; if the thread is inside the SIGUSR1 handler then it is already suspended, it doesn't matter if you sleep or not. What matters is how much code it runs before getting the signal.
this 'race' doesn't have ANY effect, because suspending a thread is not a precise operation - it doesn't matter whether 1 or 1000 instructions are executed by the thread before it stops.
It certainly matters if the thread never stops, which can easily happen with your patch, since anything can happen to the thread that was supposed to send the SIGUSR1.
At 08:09 04/04/03 -0800, Alexandre Julliard wrote:
Adam Gundy arg@cyberscience.com writes:
I can currently produce this race (without my patch applied) by adding a "Sleep( 1000 )" to the USR1 signal handler in ntdll/signal_i386.c. The wineserver is convinced that the thread is suspended, but it isn't (sleeping is a bad example - assume that a context switch occurred, or heavy processing happened before the signal was delivered).
Not sure what you are trying to prove here; if the thread is inside
just that the thread to be suspended may run for some time before it stops. signal delivery is NOT guaranteed to be immediate on SMP.
it's not guaranteed to be immediate on UP either - harder to trigger, but suppose other signals are pending for the thread to be suspended; they could be handled first...
since we block SIGUSR1 delivery during client/server interaction , the thread to be suspended could wait a (potentially) long time before it is suspended.
the SIGUSR1 handler then it is already suspended, it doesn't matter if you sleep or not. What matters is how much code it runs before getting the signal.
but it doesn't matter how much code it runs. suspending a thread under Windows is NOT guaranteed to stop it immediately - just some time fairly soon. To quote Microsoft's documentation:
This function [SuspendThread] is primarily designed for use by debuggers. It is not intended to be used for thread synchronization.
this 'race' doesn't have ANY effect, because suspending a thread is not a precise operation - it doesn't matter whether 1 or 1000 instructions are executed by the thread before it stops.
It certainly matters if the thread never stops, which can easily happen with your patch, since anything can happen to the thread that was supposed to send the SIGUSR1.
anything? there is a very small chance that _another_ thread will suspend or kill the responsible thread before it can send the signal. code that does that won't work reliably anyway, since the second suspend could just have easily have happened before the first.
this is getting confusing. maybe it's time to start drawing ascii art ;-)
can you honestly come up with a case where this change is going to affect anyone?? the only code I can think of that might behave differently is if someone is trying to do thread synchronization using SuspendThread - which Microsoft explicitly say won't work...
if you are really worried about the above case, we could block SIGUSR1/SIGKILL for the duration of the suspendthread client/server interaction - we already block these signals during 'wine_server_call()' - it's just a case of extending this blocking around the other code in SuspendThread().
Seeya, Adam -- Real Programmers don't comment their code. If it was hard to write, it should be hard to read, and even harder to modify. These are all my own opinions.
Adam Gundy arg@cyberscience.com writes:
it's not guaranteed to be immediate on UP either - harder to trigger, but suppose other signals are pending for the thread to be suspended; they could be handled first...
since we block SIGUSR1 delivery during client/server interaction , the thread to be suspended could wait a (potentially) long time before it is suspended.
It isn't executing any application code, that is what is important. From the point of view of the app a signal handler or a server call is just like a system call, it isn't running actual code.
anything? there is a very small chance that _another_ thread will suspend or kill the responsible thread before it can send the signal. code that does that won't work reliably anyway, since the second suspend could just have easily have happened before the first.
But that's not a problem, because the thread won't have been suspended at all. The problem is not that the app is broken, it may well be but there's nothing we can do about it. The problem is that the server state must be consistent with what is actually happening in the client.
can you honestly come up with a case where this change is going to affect anyone?? the only code I can think of that might behave differently is if someone is trying to do thread synchronization using SuspendThread - which Microsoft explicitly say won't work...
What Microsoft say doesn't matter; what matters is what applications do. Many apps use SuspendThread, and it has to work reliably 100% of the time, not just 99%. It's the same argument we had with mutexes, and shared memory, etc. Sure you can simplify a lot of things if you don't try to handle "unlikely" cases; but practice has shown that no matter how unlikely a case may seem, there is a Windows app out there that triggers it. And we want to run *all* Windows apps, not just the ones that are lucky enough to not trigger our bugs.
if you are really worried about the above case, we could block SIGUSR1/SIGKILL for the duration of the suspendthread client/server interaction - we already block these signals during 'wine_server_call()' - it's just a case of extending this blocking around the other code in SuspendThread().
You can't block SIGKILL, or SIGSTOP, or seg faults, or whatever else could happen in there. The design principle of the wineserver is that it should be reliable no matter how broken the client is.
At 09:50 04/04/03 -0800, Alexandre Julliard wrote:
Adam Gundy arg@cyberscience.com writes:
it's not guaranteed to be immediate on UP either - harder to trigger, but suppose other signals are pending for the thread to be suspended; they could be handled first...
since we block SIGUSR1 delivery during client/server interaction , the thread to be suspended could wait a (potentially) long time before it is suspended.
It isn't executing any application code, that is what is important. From the point of view of the app a signal handler or a server call is just like a system call, it isn't running actual code.
you're ignoring SMP again...
also, some signal handlers are executing application code I think (SEGV for example - yes I know this is rare).
anything? there is a very small chance that _another_ thread will suspend or kill the responsible thread before it can send the signal. code that does that won't work reliably anyway, since the second suspend could just have easily have happened before the first.
But that's not a problem, because the thread won't have been suspended at all. The problem is not that the app is broken, it may well be but there's nothing we can do about it. The problem is that the server state must be consistent with what is actually happening in the client.
can we ever ask the server whether a thread is suspended?
can you honestly come up with a case where this change is going to affect anyone?? the only code I can think of that might behave differently is if someone is trying to do thread synchronization using SuspendThread - which Microsoft explicitly say won't work...
What Microsoft say doesn't matter; what matters is what applications do. Many apps use SuspendThread, and it has to work reliably 100% of the time, not just 99%. It's the same argument we had with mutexes, and shared memory, etc. Sure you can simplify a lot of things if you don't try to handle "unlikely" cases; but practice has shown that no matter how unlikely a case may seem, there is a Windows app out there that triggers it. And we want to run *all* Windows apps, not just the ones that are lucky enough to not trigger our bugs.
if you are really worried about the above case, we could block SIGUSR1/SIGKILL for the duration of the suspendthread client/server interaction - we already block these signals during 'wine_server_call()' - it's just a case of extending this blocking around the other code in SuspendThread().
You can't block SIGKILL, or SIGSTOP, or seg faults, or whatever else could happen in there. The design principle of the wineserver is that it should be reliable no matter how broken the client is.
sorry, I meant SIGTERM (which is what we send for KillThread) not SIGKILL. we don't send SIGSTOP anymore so that's not a problem.
there is currently a single chance for a SEGV to occur in the 5/6 lines of extra code we would protect - some re-ordering of the code would eliminate that chance.
regardless, I have just posted a reworked patch to wine-patches with the title "PATCH: client side SuspendThread() handling when using valgrind" which adds a flag to the suspend_thread and init_thread _requests_ indicating that the client would like to do the suspend locally. the client code has been modified to only ask for this if it is actually being run under valgrind. hopefully this can be committed to CVS..
Seeya, Adam -- Real Programmers don't comment their code. If it was hard to write, it should be hard to read, and even harder to modify. These are all my own opinions.