This patch set fixes most of the performance problems caused by ReleaseSemaphore() and WaitForSingle/MultipleObject(s) making server calls. One victim of this is Star Wars Battlefront (https://bugs.winehq.org/show_bug.cgi?id=29582), where the majority of the CPU time is spent on context switching while the program spams ReleaseSemaphore, WaitForSingle/MultipleObject(s) and GetForegroundWindow. For the last of these I made a work-around hack that some players have been using.
This patch set doubles performance for bug #29582. (When combined with the GetForegroundWindow hack, the problem is completely resolved.) The patch set works by having the server create a POSIX semaphore object and share the key to that object with the client process. This lets the client process implement ReleaseSemaphore and optimistic-case wait calls (where no blocking is required) without a server call. Blocking waits, and any wait-multiple that cannot be resolved in the client process (e.g., bWaitAll=TRUE and the objects include non-semaphores), are still handled by the server. (Implementing blocking wait calls on the client could yield some further performance improvement, because a context switch to another thread in the same program won't require swapping out the memory map & such, but I would expect this to be less significant.)
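To illustrate the optimistic-case idea, here is a minimal sketch, not the code from the patches. The names try_wait_fast, release_fast and the wait_result codes are hypothetical. It assumes the client already holds a sem_t* for the shared semaphore, and it ignores the Windows maximum count and the previous-count return value of ReleaseSemaphore.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum wait_result { WAIT_ACQUIRED, WAIT_NEEDS_SERVER, WAIT_ERROR };

/* Optimistic wait: try to take the semaphore without blocking.
 * If it is not available, report that the caller has to fall
 * back to the (blocking) server call. */
static enum wait_result try_wait_fast(sem_t *sem)
{
    assert(sem != NULL);
    for (;;) {
        if (sem_trywait(sem) == 0)
            return WAIT_ACQUIRED;       /* resolved in the client */
        if (errno == EINTR)
            continue;                   /* interrupted by a signal, retry */
        if (errno == EAGAIN)
            return WAIT_NEEDS_SERVER;   /* count is zero, would block */
        return WAIT_ERROR;
    }
}

/* Release: post the semaphore 'count' times, no server round trip.
 * Returns 0 on success, -1 if sem_post() fails. */
static int release_fast(sem_t *sem, unsigned int count)
{
    assert(sem != NULL);
    while (count--) {
        if (sem_post(sem) != 0)
            return -1;
    }
    return 0;
}
```

A caller would try try_wait_fast() first and only issue the server request when it gets WAIT_NEEDS_SERVER, so the uncontended path never leaves the client process.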
However, upon further experimentation, I discovered that POSIX semaphores in glibc are actually implemented using a shared memory page. This may not be acceptable, since a misbehaving process can corrupt that page and potentially cause sem_* function calls in the server to fail, as well as cause other client programs to fail and/or deadlock. I am working on a System V adaptation, but I thought it would be a good idea to get feedback & comments now.
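For reference, the same non-blocking pattern with System V semaphores might look roughly like the sketch below. This is only an illustration of the standard semget/semctl/semop calls, not the adaptation I am working on; the sysv_sem_* names are hypothetical. The point is that the counter lives in the kernel rather than in a page mapped into every client.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller has to define this union for semctl(). */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

/* Create a one-element semaphore set with the given initial count.
 * Returns the semaphore id, or -1 on failure. */
static int sysv_sem_create(int initial)
{
    union semun arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id == -1)
        return -1;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);        /* clean up the half-made set */
        return -1;
    }
    return id;
}

/* Non-blocking acquire: 1 = acquired, 0 = would block, -1 = error. */
static int sysv_sem_trywait(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };

    if (semop(id, &op, 1) == 0)
        return 1;
    return errno == EAGAIN ? 0 : -1;
}

/* Add 'count' to the semaphore in one atomic operation.
 * Returns 0 on success, -1 on failure. */
static int sysv_sem_release(int id, short count)
{
    struct sembuf op = { .sem_num = 0, .sem_op = count, .sem_flg = 0 };

    assert(count > 0);                  /* sem_op == 0 would mean wait-for-zero */
    return semop(id, &op, 1);
}
```

Unlike sem_post(), semop() can add an arbitrary count in a single call, which maps a bit more naturally onto the lReleaseCount argument of ReleaseSemaphore.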
Another problem is that this causes the threadpool test to fail at line 1299, where the previous "release all semaphores and wait for callback" test is done in reverse order. I presume this is due to the Linux scheduler being inconsistent with how Windows *happens* to schedule its threads. I already have an idea for a fix, but I will still have to dig deeper into it.
The code is still of experimental quality (assert(0)s and such), and I've already re-worked the configure.ac stuff. I'm mostly interested in feedback on the general scheme.
Thanks! Daniel