 
            On Apr 5, 2014, at 5:25 PM, Daniel Horn wrote:
The call to server_enter_uninterrupted_section in server_get_unix_fd was causing 2 million syscalls over the span of several seconds when launching many popular application suites and it was causing significant performance overhead.
Switching those two lines resolves the problem entirely because the cache is sized such that it nearly never misses.
You can't call get_cached_fd() without entering the critical section because the critical section is protecting the fd cache. Your patch opens the code up to race conditions and data inconsistency.
-Ken
 
            What about using a lighter weight locking primitive such as one that does not require a syscall?- Daniel
On Sat, Apr 5, 2014 at 5:24 PM, Ken Thomases ken@codeweavers.com wrote:
On Apr 5, 2014, at 5:25 PM, Daniel Horn wrote:
The call to server_enter_uninterrupted_section in server_get_unix_fd was causing 2 million syscalls over the span of several seconds when launching many popular application suites and it was causing significant performance overhead.
Switching those two lines resolves the problem entirely because the cache is sized such that it nearly never misses.
You can't call get_cached_fd() without entering the critical section because the critical section is protecting the fd cache. Your patch opens the code up to race conditions and data inconsistency. -Ken
 
            On Apr 5, 2014, at 7:36 PM, Daniel Horn wrote:
What about using a lighter weight locking primitive such as one that does not require a syscall?
When it's not under contention, the critical section implementation ought to avoid syscalls already. When it is under contention, it's pretty unavoidable that syscalls will be necessary. (A critical section can use a spinlock to put that off for a bit if the thread that owns the lock will be done with it very quickly. That might help in the common case where the fd is found in the cache _if_ the problem is contention over the critical section. That's a big "if".)
However, I think the real issue is the calls to pthread_sigmask() in server_{enter,leave}_uninterrupted_section(). I think those are necessary to prevent the wineserver from suspending the thread while it holds the critical section and thus deadlocking with other threads. I don't know if anything can be done about that.
It might be possible to have a safe, lockless version of get_cached_fd() that sometimes produces a false-negative. Then, if it fails, you'd enter the uninterrupted section and try the reliable version before making the server call. Devising such a scheme and convincing us that it's safe and reliable is the hard part. It might be sufficient put a memory barrier before the lookup of the fd in get_cached_fd() and then something like this at the bottom of that block:
if (interlocked_cmpxchg( &fd_cache[entry][idx].fd, fd + 1, fd + 1) != fd + 1) fd = -1;
The idea is to verify that the fd has not been replaced or removed after we fetched it and its properties. (There's a chance that server_remove_fd_from_cache() is called after we make that check and then the caller of server_remove_fd_from_cache() closes the fd. And the fd could be reused for something else. However, as near as I can tell, that's already possible with server_get_unix_fd() as it stands.)
It's very possible that this scheme has a hole that I've missed. Getting this sort of thing right is notoriously difficult.
-Ken
 
            On Apr 5, 2014, at 8:37 PM, Ken Thomases wrote:
It might be possible to have a safe, lockless version of get_cached_fd() that sometimes produces a false-negative. Then, if it fails, you'd enter the uninterrupted section and try the reliable version before making the server call. Devising such a scheme and convincing us that it's safe and reliable is the hard part. It might be sufficient put a memory barrier before the lookup of the fd in get_cached_fd() and then something like this at the bottom of that block:
if (interlocked_cmpxchg( &fd_cache[entry][idx].fd, fd + 1, fd + 1) != fd + 1) fd = -1;
The idea is to verify that the fd has not been replaced or removed after we fetched it and its properties.
It occurred to me that, for this to work, add_fd_to_cache() has to be changed, too. I think the fd has to be temporarily cleared, then the properties have to be stored, and then the fd can be stored. A write memory barrier is required, but is supplied by interlocked_xchg() which is already used.
So, adding looks like:
clear fd store properties store fd
Getting looks like:
get fd get properties get fd again and compare
If a valid fd was read and it didn't change between the two points, then you can be sure the properties that you read go with the fd.
-Ken
 
            Thanks for thinking this through. An alternative is using a trylock and assuming a cache miss (or not adding to the cache) if trylock fails.
That would surely be simpler code. I will benchmark both variants vs the original and let you know my findings
- Daniel
On Sun, Apr 6, 2014 at 8:47 AM, Ken Thomases ken@codeweavers.com wrote:
On Apr 5, 2014, at 8:37 PM, Ken Thomases wrote:
It might be possible to have a safe, lockless version of get_cached_fd() that sometimes produces a false-negative. Then, if it fails, you'd enter the uninterrupted section and try the reliable version before making the server call. Devising such a scheme and convincing us that it's safe and reliable is the hard part. It might be sufficient put a memory barrier before the lookup of the fd in get_cached_fd() and then something like this at the bottom of that block:
if (interlocked_cmpxchg( &fd_cache[entry][idx].fd, fd + 1, fd + 1) != fd + 1) fd = -1;
The idea is to verify that the fd has not been replaced or removed after we fetched it and its properties.
It occurred to me that, for this to work, add_fd_to_cache() has to be changed, too. I think the fd has to be temporarily cleared, then the properties have to be stored, and then the fd can be stored. A write memory barrier is required, but is supplied by interlocked_xchg() which is already used. So, adding looks like: clear fd store properties store fd Getting looks like: get fd get properties get fd again and compare If a valid fd was read and it didn't change between the two points, then you can be sure the properties that you read go with the fd. -Ken
 
            On Apr 6, 2014, at 11:04 AM, Daniel Horn wrote:
Thanks for thinking this through. An alternative is using a trylock and assuming a cache miss (or not adding to the cache) if trylock fails.
I don't think that's safe with respect to the issue of being suspended and causing deadlock. The try may succeed, in which case you hold the critical section, in which case you should have set the sigmask beforehand, in which case you still incur the syscall expense.
(Although I'm still not confident I've completely understood the purpose of blocking signals for the duration of holding that critical section.)
-Ken

