Alexandre Julliard wrote:
Imagine a process that does a SuspendThread() on one of its threads, but makes sure by some kind of internal protocol that the thread is never holding the mutex when it is suspended. This would work just fine under Windows or under the wineserver, but would fail with the shared memory approach.
Actually, if you use a per-client shared memory block, then the wineserver can execute ownership changes itself, without any cooperation from the client, whenever the client does not actually have the mutex locked. Your problem goes away.
There is simply no point in implementing a mutex that works only 99% of the time, no matter how fast it is. The synchronization objects have to be 100% reliable, performance only comes next; and none of the shared memory approaches I've seen so far are 100% reliable.
Of course it has to be 100% reliable, but since you're dealing with failing processes it's not entirely clear what "100% reliability" means. Anyway, I think 100% reliability is achievable for any plausible definition.
The following protocol looks good for the simpler one-thread-per-client-process case. It generalizes easily to multiple threads per process by treating each thread as a separate process; that does mean threads can stomp on their sibling threads' mutexes, but that risk is not new. (You could probably build an optimized version of the protocol for the multithreaded case, though.)
// indices identify the client. Client i can only access variables
// indexed by i, the wineserver can access all
bool client_locked[N];
bool client_is_owner[N];
bool notify_wineserver_on_release[N];
// client
Acquire(i)
{
retry:
    atomically {
        if (client_is_owner[i] && !client_locked[i]) {
            client_locked[i] = true;
            return;
        }
    }
    if (!client_is_owner[i]) {
        wineserver->GiveThisProcessOwnership(i);  // blocking
        goto retry;
    } else if (client_locked[i]) {
        // fail; deadlocked
    }
}
Release(i)
{
    bool notify;
    atomically {
        notify = notify_wineserver_on_release[i];
        client_locked[i] = false;
        notify_wineserver_on_release[i] = false;
    }
    if (notify) {
        wineserver->NotifyUnlocked();  // nonblocking
    }
}
// wineserver
int true_owner;
bool requested_ownership[N];

GiveThisProcessOwnership(i)
{
    if (true_owner != i) {
        bool granted;
        atomically {
            // take lock from owner if we can
            if (!client_locked[true_owner]) {
                client_is_owner[true_owner] = false;
                granted = true;
            } else {
                // ask the current owner's Release() to ping us
                notify_wineserver_on_release[true_owner] = true;
                granted = false;
            }
        }
        if (granted) {
            // give lock to the requester
            true_owner = i;
            atomically {
                if (client_locked[true_owner]) {
                    // process is corrupt, it can't have the lock already
                } else {
                    client_is_owner[true_owner] = true;
                }
            }
            ReplyToProcess(i);  // nonblocking
        } else {
            // let the requester wait
            requested_ownership[i] = true;
        }
    }
}
NotifyUnlocked()
{
    int i, j;
    for (i = 0; i < N; i++) {
        if (requested_ownership[i]) break;
    }
    if (i == N) return;  // no-one cares

    bool granted;
    atomically {
        // try to get lock from owner, as above
        if (!client_locked[true_owner]) {
            client_is_owner[true_owner] = false;
            granted = true;
        } else {
            notify_wineserver_on_release[true_owner] = true;
            granted = false;
        }
    }
    if (granted) {
        // give lock
        bool notify = false;
        true_owner = i;
        requested_ownership[i] = false;
        for (j = 0; j < N; j++) {
            if (requested_ownership[j]) notify = true;
        }
        atomically {
            if (client_locked[true_owner]) {
                // process is corrupt, it can't have the lock already
            } else {
                client_is_owner[true_owner] = true;
                notify_wineserver_on_release[true_owner] = notify;
            }
        }
        ReplyToProcess(i);  // nonblocking
    }
}
Pack the client_locked, client_is_owner and notify_wineserver_on_release variables into a single word per client (the atomic sections read and write them together). Implement "atomically" with an atomic compare-and-swap loop on that word.
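For concreteness, here is a minimal sketch of how the per-client word and the client-side atomic sections might look with C11 atomics. The bit layout and the names (client_word, try_fast_acquire, fast_release) are my own illustrative assumptions, not part of the protocol above; the wineserver's atomic sections would be analogous compare-and-swap loops on the true_owner's word.

/* Hedged sketch: one possible packing of the three per-client flags into
 * a single word, with the client-side "atomically" blocks implemented as
 * compare-and-swap loops. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCKED 0x1u  /* client_locked[i] */
#define OWNER  0x2u  /* client_is_owner[i] */
#define NOTIFY 0x4u  /* notify_wineserver_on_release[i] */

/* One word per client, living in that client's shared memory block. */
typedef _Atomic uint32_t client_word;

/* The first atomic block of Acquire(i): lock the mutex iff we own it and
 * it is not already locked.  Returns false if the caller must ask the
 * wineserver for ownership (or has deadlocked on itself). */
static bool try_fast_acquire(client_word *w)
{
    uint32_t old = atomic_load(w);
    while ((old & (OWNER | LOCKED)) == OWNER) {
        if (atomic_compare_exchange_weak(w, &old, old | LOCKED))
            return true;   /* we were owner and unlocked; now locked */
    }
    return false;
}

/* The atomic block of Release(i): clear LOCKED and NOTIFY in one step and
 * report whether NOTIFY was set, so the caller knows whether to send the
 * nonblocking NotifyUnlocked() to the wineserver. */
static bool fast_release(client_word *w)
{
    uint32_t old = atomic_load(w);
    while (!atomic_compare_exchange_weak(w, &old, old & ~(LOCKED | NOTIFY)))
        ;  /* 'old' is refreshed on failure; retry */
    return (old & NOTIFY) != 0;
}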
There's no wineserver-to-client signalling (except for unblocking the client's blocking call). The wineserver never blocks. The wineserver never reads any values from client_is_owner or notify_wineserver_on_release; the only thing it ever reads is the client_locked value of the true_owner process. The only vulnerability I can see is if a client process unlocks the mutex and then an unlucky stray write, due to some bug, subsequently writes 'true' into its client_locked field. Is that a significant increase in risk over the client simply hanging while it has the mutex locked?
NB there may well be bugs in the above code. This kind of code is hard to get right. (Someone who knows how to drive a model checking tool could verify it without much difficulty, though.)
If no-one needed fast mutexes it definitely wouldn't be worth doing. I'm advocating this just because a) someone does need them and b) I can't let the claim "it can't be done 100% safely this way" go unchallenged :-).
Rob