https://bugs.winehq.org/show_bug.cgi?id=50292
Bug ID: 50292
Summary: Process-local synchronization objects use private interfaces into the Unix library
Product: Wine
Version: 6.0-rc1
Hardware: x86-64
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@winehq.org
Reporter: z.figura12@gmail.com
Distribution: ---
This is not exactly a bug, but it is something we'd like to change about the ntdll code, and we need somewhere to track the staged patch set (https://github.com/wine-staging/wine-staging/tree/master/patches/ntdll-NtAlertThreadByThreadId). The basic "problem" is that the process-local synchronization APIs (condition variables, SRW locks, critical sections, Win32 futexes) use private unixlib vectors (fast_RtlpWaitForCriticalSection() et al.) instead of syscalls.
Some testing suggests that Win32 futexes and condition variables are implemented on top of an internal, undocumented interface comprising the functions NtAlertThreadByThreadId() and NtWaitForAlertByThreadId(), which are themselves presumably syscalls. In particular, a thread waiting inside RtlSleepConditionVariable*() or RtlWaitOnAddress() can be awoken with NtAlertThreadByThreadId(). Because this interface is meant specifically for process-local synchronization (in particular, it is not possible to alert another process's thread), it is a good fit for Wine, both for those objects and for SRW locks and critical sections. Accordingly, I have created a patch set which implements these Nt* functions and reimplements the Rtl* APIs on top of them.
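For illustration, here is a minimal sketch (not taken from the patch set) of the two calls used as a per-thread park/unpark primitive. The prototypes are not in the public SDK headers, so they are declared by hand here and should be treated as assumptions, as should linking against ntdll and the address argument being purely informational:

    #include <windows.h>
    #include <winternl.h>

    /* Assumed prototypes; these are not declared in the public SDK headers. */
    NTSTATUS NTAPI NtWaitForAlertByThreadId( const void *address, const LARGE_INTEGER *timeout );
    NTSTATUS NTAPI NtAlertThreadByThreadId( HANDLE thread_id );

    static DWORD waiter_tid;

    static DWORD WINAPI waiter( void *arg )
    {
        (void)arg;
        /* Park this thread until some other thread alerts it by thread id.
         * The address argument appears to be informational only. */
        NtWaitForAlertByThreadId( &waiter_tid, NULL );
        return 0;
    }

    int main( void )
    {
        HANDLE thread = CreateThread( NULL, 0, waiter, NULL, 0, &waiter_tid );
        Sleep( 100 );                                            /* give the waiter time to park */
        NtAlertThreadByThreadId( ULongToHandle( waiter_tid ) );  /* unpark it */
        WaitForSingleObject( thread, INFINITE );
        CloseHandle( thread );
        return 0;
    }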
There is a problem with these patches, however. Some performance testing done by Etienne Juvigny reveals that Star Citizen, a game which makes heavy use of Win32 futexes, suffers a drop in performance from these patches. On one machine, configured to be maximally CPU-limited, performance drops from 92 to 80 FPS. There are two potential reasons for this:
(1) Locking of the TEB lists is slow, both on the PE side and the Unix side. This is especially likely given that performance suffered massively before the locks were converted to read-write locks. However, removing these locks entirely is difficult if not impossible.
(2) RtlWakeAddressAll() performs multiple syscalls (one per thread waiting on the given address). In Star Citizen, this can be up to six threads in normal usage patterns. This seems to be true on Windows as well: NtAlertThreadByThreadId() only seems capable of waking one thread at a time, and I can find no similar API that wakes more than one thread.
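To spell out (2): the wake-all path boils down to one NtAlertThreadByThreadId() call, and therefore one syscall, per waiter on the address. Roughly, and reusing the assumed declarations from the sketch above (this is an illustrative shape only; the real waiter-list management in the patches is different):

    struct addr_waiter
    {
        struct addr_waiter *next;
        const void *addr;    /* address the thread is waiting on */
        DWORD tid;           /* thread to alert */
    };

    static void wake_address_all( struct addr_waiter *waiters, const void *addr )
    {
        struct addr_waiter *w;
        for (w = waiters; w; w = w->next)
        {
            if (w->addr != addr) continue;
            /* one NtAlertThreadByThreadId() call, i.e. one syscall, per waiter */
            NtAlertThreadByThreadId( ULongToHandle( w->tid ) );
        }
    }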
Zebediah Figura z.figura12@gmail.com changed:
What            |Removed |Added
Staged patchset |        |https://github.com/wine-staging/wine-staging/tree/master/patches/ntdll-NtAlertThreadByThreadId
Status          |NEW     |STAGED
--- Comment #1 from Rémi Bernon rbernon@codeweavers.com --- Hi, I had a quick look at the patch series and here are a few nitpicks that I've found:
In 0004-ntdll-Implement-NtAlertThreadByThreadId-and-NtWaitFo.patch:
    if (teb->ClientId.UniqueThread == tid)
    {
        pthread_rwlock_unlock( &teb_list_lock );
        NtSetEvent( thread_data->tid_alert_event, NULL );
        return STATUS_SUCCESS;
    }
I think there's a race condition here, where the thread could potentially be interrupted after the TEB lock is released, but before the event is set.
The thread that thread_data refers to may then terminate, so the NtSetEvent call may set a non-existent event, or worse, if the TEB is reused and the new thread is itself waiting, wake the wrong thread.
It's probably unlikely to happen but from a correctness point of view I think it's wrong.
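One minimal way to close that window (a sketch, not necessarily what the staged patches ended up doing) is to signal the event while teb_list_lock is still held, so the target thread cannot be removed from the list and have its TEB reused in the meantime:

    if (teb->ClientId.UniqueThread == tid)
    {
        /* signal while still holding teb_list_lock: the target thread cannot
         * be removed from the list (and its TEB reused) until we release it */
        NtSetEvent( thread_data->tid_alert_event, NULL );
        pthread_rwlock_unlock( &teb_list_lock );
        return STATUS_SUCCESS;
    }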
--- Comment #2 from Rémi Bernon rbernon@codeweavers.com ---
In 0006-ntdll-Implement-thread-id-alerts-on-top-of-futexes-i.patch
    while (!InterlockedExchange( futex, 0 ))
    {
        if (timeout)
        {
            LONGLONG timeleft = update_timeout( end );
            struct timespec timespec;
            timespec.tv_sec = timeleft / (ULONGLONG)TICKSPERSEC;
            timespec.tv_nsec = (timeleft % TICKSPERSEC) * 100;
            ret = futex_wait( futex, 0, &timespec );
        }
        else
            ret = futex_wait( futex, 0, NULL );
        if (ret == -1 && errno == ETIMEDOUT) return STATUS_TIMEOUT;
    }
If FUTEX_WAIT_BITSET is available, I think you can use it to wait with an absolute timeout, which could save calls to NtQuerySystemTime.
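For reference, a rough sketch of such a wait, assuming the futex syscall is issued directly; FUTEX_WAIT_BITSET takes an absolute timeout, measured against CLOCK_MONOTONIC unless FUTEX_CLOCK_REALTIME is also passed:

    #include <errno.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    /* Wait until *futex != expected or until the absolute CLOCK_MONOTONIC
     * deadline passes; returns 0 on wake, or -1 with errno set (ETIMEDOUT,
     * EAGAIN if *futex already changed, EINTR on signal). */
    static int futex_wait_abs( int *futex, int expected, const struct timespec *deadline )
    {
        return syscall( SYS_futex, futex, FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG,
                        expected, deadline, NULL, FUTEX_BITSET_MATCH_ANY );
    }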
--- Comment #3 from Rémi Bernon rbernon@codeweavers.com --- (In reply to Rémi Bernon from comment #2)
> If FUTEX_WAIT_BITSET is available, I think you can use it to wait with an absolute timeout, which could save calls to NtQuerySystemTime.
Or maybe it's on purpose because of CLOCK_MONOTONIC NTP adjustments? In which case please just ignore my comment.
--- Comment #4 from Zebediah Figura z.figura12@gmail.com --- (In reply to Rémi Bernon from comment #1)
> Hi, I had a quick look at the patch series and here are a few nitpicks that I've found:
> 
> In 0004-ntdll-Implement-NtAlertThreadByThreadId-and-NtWaitFo.patch:
> 
>     if (teb->ClientId.UniqueThread == tid)
>     {
>         pthread_rwlock_unlock( &teb_list_lock );
>         NtSetEvent( thread_data->tid_alert_event, NULL );
>         return STATUS_SUCCESS;
>     }
> 
> I think there's a race condition here, where the thread could potentially be interrupted after the TEB lock is released, but before the event is set.
> The thread that thread_data refers to may then terminate, so the NtSetEvent call may set a non-existent event, or worse, if the TEB is reused and the new thread is itself waiting, wake the wrong thread.
> It's probably unlikely to happen but from a correctness point of view I think it's wrong.
I think you're right, this is a race and there's not much of a penalty to fixing it.
(In reply to Rémi Bernon from comment #3)
> (In reply to Rémi Bernon from comment #2)
> > If FUTEX_WAIT_BITSET is available, I think you can use it to wait with an absolute timeout, which could save calls to NtQuerySystemTime.
> Or maybe it's on purpose because of CLOCK_MONOTONIC NTP adjustments? In which case please just ignore my comment.
Probably, yes, we'd need a RAW flag. Note though that the timeout is always relative when called through kernel32, so it doesn't help us much anyway.
--- Comment #5 from Rémi Bernon rbernon@codeweavers.com --- (In reply to Zebediah Figura from comment #4)
> (In reply to Rémi Bernon from comment #3)
> > (In reply to Rémi Bernon from comment #2)
> > > If FUTEX_WAIT_BITSET is available, I think you can use it to wait with an absolute timeout, which could save calls to NtQuerySystemTime.
> > Or maybe it's on purpose because of CLOCK_MONOTONIC NTP adjustments? In which case please just ignore my comment.
> Probably, yes, we'd need a RAW flag.
Actually, calling NtQuerySystemTime also suffers from the same issue, as from what I can see, it uses CLOCK_MONOTONIC.
> Note though that the timeout is always relative when called through kernel32, so it doesn't help us much anyway.
Yes sure, and we always need to convert it to absolute first, but then we could call futex_wait with the absolute timeout directly and drop the update_timeout thing, making the code simpler.
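For illustration, the relative-to-absolute conversion could look roughly like this (a sketch; the helper name is made up, TICKSPERSEC is the usual 100ns-ticks-per-second constant, and a negative NT timeout is taken to mean "relative"):

    #include <time.h>

    /* NT timeouts are in 100ns ticks; negative values mean "relative". */
    #define TICKSPERSEC 10000000

    /* Turn a relative NT timeout into an absolute CLOCK_MONOTONIC deadline
     * that can be passed straight to a FUTEX_WAIT_BITSET wait. */
    static void relative_timeout_to_deadline( long long rel_ticks, struct timespec *deadline )
    {
        unsigned long long ticks = -rel_ticks;   /* rel_ticks is negative */

        clock_gettime( CLOCK_MONOTONIC, deadline );
        deadline->tv_sec  += ticks / TICKSPERSEC;
        deadline->tv_nsec += (ticks % TICKSPERSEC) * 100;
        if (deadline->tv_nsec >= 1000000000)
        {
            deadline->tv_nsec -= 1000000000;
            deadline->tv_sec++;
        }
    }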
--- Comment #6 from Paul Gofman pgofman@codeweavers.com --- (In reply to Zebediah Figura from comment #4)
> Probably, yes, we'd need a RAW flag. Note though that the timeout is always relative when called through kernel32, so it doesn't help us much anyway.
Please note that using CLOCK_MONOTONIC_RAW instead of CLOCK_MONOTONIC may end up as a syscall instead of a vDSO call, so it will have a performance impact.
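That is easy to check with a rough micro-benchmark (illustrative only; on kernels whose vDSO does handle CLOCK_MONOTONIC_RAW the difference largely disappears):

    #include <stdio.h>
    #include <time.h>

    /* Time a burst of clock_gettime() calls for a given clock.  On kernels
     * whose vDSO does not handle CLOCK_MONOTONIC_RAW, the RAW variant is
     * noticeably slower because each call is a real syscall. */
    static double time_clock( clockid_t id, int iters )
    {
        struct timespec t, start, end;
        clock_gettime( CLOCK_MONOTONIC, &start );
        for (int i = 0; i < iters; i++) clock_gettime( id, &t );
        clock_gettime( CLOCK_MONOTONIC, &end );
        return (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    }

    int main( void )
    {
        const int iters = 1000000;
        printf( "CLOCK_MONOTONIC:     %.0f ns total\n", time_clock( CLOCK_MONOTONIC, iters ) );
        printf( "CLOCK_MONOTONIC_RAW: %.0f ns total\n", time_clock( CLOCK_MONOTONIC_RAW, iters ) );
        return 0;
    }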
--- Comment #7 from Rémi Bernon rbernon@codeweavers.com --- Regarding the performance issue, I think the linked-list nature of the TEB list could also be part of issue, especially when a large number of threads are created. On synthetic benchmarks, it's very clear that it induces a lot of CPU cache misses.
IIUC it's not really possible to avoid a global lock here. As NtAlertThreadByThreadId function is supposed to return a specific status if the targeted TID does not exist, I guess there's no other way than tracking all threads.
However, it should be possible to make it faster by having a static TID + sync object array instead, to help the CPU speculative execution do its magic. Of course it will then have to support a dynamic number of threads, but perhaps a reasonably large fixed maximum number may be acceptable and could still perform better than the linked list?
--- Comment #8 from Zebediah Figura z.figura12@gmail.com --- (In reply to Rémi Bernon from comment #7)
> Regarding the performance issue, I think the linked-list nature of the TEB list could also be part of the issue, especially when a large number of threads are created. On synthetic benchmarks, it's very clear that it induces a lot of CPU cache misses.
> IIUC it's not really possible to avoid a global lock here. As the NtAlertThreadByThreadId function is supposed to return a specific status if the targeted TID does not exist, I guess there's no other way than tracking all threads.
> However, it should be possible to make it faster by having a static TID + sync object array instead, to help the CPU's speculative execution do its magic. Of course it will then have to support a dynamic number of threads, but perhaps a reasonably large fixed maximum number may be acceptable and could still perform better than the linked list?
It should be possible to do it lock-free with the same mechanism as we use on the PE side, i.e. a growable linked list of arrays. I hadn't thought of that when I first wrote the patch. It was my vague plan to try that optimization and see if it helps.
--- Comment #9 from Jacek Caban jacek@codeweavers.com --- I took a look at the series and have some suggestions. It would be good to make NtAlertThreadByThreadId lock-free, but I'd suggest trying to do it in a spirit similar to the fd cache. Looking at the pid/tid allocation strategy in the server, we'd rarely need more than one cache array allocation for the entire process.
It should be possible to do the tid->TEB mapping this way. Another possible way would be to simply map to a thread-specific int value. That value could then be used directly as futex or HANDLE storage (with maybe a special value to distinguish between a non-waiting and a non-existent tid, if needed).
For the PE side of things, I wonder if you really need such complicated wait entry allocation. Could we just store the wait entry on the waiting thread's stack?
--- Comment #10 from Zebediah Figura z.figura12@gmail.com --- (In reply to Jacek Caban from comment #9)
> I took a look at the series and have some suggestions. It would be good to make NtAlertThreadByThreadId lock-free, but I'd suggest trying to do it in a spirit similar to the fd cache. Looking at the pid/tid allocation strategy in the server, we'd rarely need more than one cache array allocation for the entire process.
The only problem with that is that if we run out of space in the fd cache, the only effect is that we have to go to the server each time. If we run out of space in the TID/addr or TID/TEB mappings, the functions can't work. Maybe that's not worth worrying about, though...
> It should be possible to do the tid->TEB mapping this way. Another possible way would be to simply map to a thread-specific int value. That value could then be used directly as futex or HANDLE storage (with maybe a special value to distinguish between a non-waiting and a non-existent tid, if needed).
You mean, map to the value directly rather than mapping to the TEB and looking up the value from there? Sure, and if we make it lock-free that definitely seems like a good idea.
> For the PE side of things, I wonder if you really need such complicated wait entry allocation. Could we just store the wait entry on the waiting thread's stack?
We need some way to append it to, and arbitrarily remove it from, a list of waiting threads. It would have been better than the original approach (i.e. iterating over TEBs), but it would still require locking (AFAIK only singly linked lists can be lock-free, and you can't pop from the middle of one).
--- Comment #11 from Jacek Caban jacek@codeweavers.com --- (In reply to Zebediah Figura from comment #10)
> The only problem with that is that if we run out of space in the fd cache, the only effect is that we have to go to the server each time. If we run out of space in the TID/addr or TID/TEB mappings, the functions can't work. Maybe that's not worth worrying about, though...
We can ensure that the cache slot is allocated somewhere in NtCreateThreadEx, where we can properly propagate the error.
> You mean, map to the value directly rather than mapping to the TEB and looking up the value from there?
Yes.
> > For the PE side of things, I wonder if you really need such complicated wait entry allocation. Could we just store the wait entry on the waiting thread's stack?
> We need some way to append it to, and arbitrarily remove it from, a list of waiting threads. It would have been better than the original approach (i.e. iterating over TEBs), but it would still require locking (AFAIK only singly linked lists can be lock-free, and you can't pop from the middle of one).
Reading it again in more detail, it's probably better to stick with your current solution for the PE part.
FWIW, we wouldn't need to pop from the middle if we had a separate list for each address. The problem would be the lock-free mapping from the input address to such a singly linked list. This could be done, but the 64-bit key size makes it much less appealing (than, say, the 30-bit key for the fd cache).
--- Comment #12 from Zebediah Figura z.figura12@gmail.com --- (In reply to Jacek Caban from comment #11)
> We can ensure that the cache slot is allocated somewhere in NtCreateThreadEx, where we can properly propagate the error.
I guess putting a hard limit on the number of threads works too...
> FWIW, we wouldn't need to pop from the middle if we had a separate list for each address. The problem would be the lock-free mapping from the input address to such a singly linked list. This could be done, but the 64-bit key size makes it much less appealing (than, say, the 30-bit key for the fd cache).
Applying a hash map might help in general. I'm not sure how even separate lists for each address help us, though: the waits have arbitrary timeouts, so it'd still be possible to pop from the middle.
--- Comment #13 from Jacek Caban jacek@codeweavers.com --- (In reply to Zebediah Figura from comment #12)
> (In reply to Jacek Caban from comment #11)
> > We can ensure that the cache slot is allocated somewhere in NtCreateThreadEx, where we can properly propagate the error.
> I guess putting a hard limit on the number of threads works too...
I'm not sure how that would help; I guess we're thinking about different implementation details. There is a global limit on the number of threads implied by the fact that thread ids are 30 bits (a DWORD value with the two least significant bits ignored). We could have a tid_cache similar to the fd_cache (a rough sketch follows the list):
- in NtCreateThreadEx: ensure that tid_cache[tid >> tid_cache_shift] is allocated and mark tid_cache[tid >> tid_cache_shift][tid & tid_cache_mask] as valid
- in NtWaitForAlertByThreadId and NtAlertThreadByThreadId: use only the tid_cache for thread-specific data
- when terminating a thread: mark the appropriate tid_cache entry as invalid
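A rough sketch of that tid_cache idea, assuming a slot only needs to hold a 32-bit value (e.g. a futex) and that the usual interlocked helpers and NTSTATUS codes are available; the block-size constants are arbitrary and the names are not from actual Wine code:

    #include <stdlib.h>

    #define TID_CACHE_BLOCK_SHIFT 14                       /* 16384 slots per block */
    #define TID_CACHE_BLOCK_SIZE  (1u << TID_CACHE_BLOCK_SHIFT)
    #define TID_CACHE_BLOCK_MASK  (TID_CACHE_BLOCK_SIZE - 1)
    #define TID_CACHE_BLOCKS      (1u << (30 - TID_CACHE_BLOCK_SHIFT))

    /* one slot per possible tid; blocks are never freed, so readers need no lock */
    static int *tid_cache[TID_CACHE_BLOCKS];

    /* tids are 30 bits once the two always-zero low bits are dropped;
     * only valid after alloc_tid_slot() was called for this tid */
    static int *get_tid_slot( unsigned int tid )
    {
        unsigned int idx = tid >> 2;
        return &tid_cache[idx >> TID_CACHE_BLOCK_SHIFT][idx & TID_CACHE_BLOCK_MASK];
    }

    /* called from thread creation, where an allocation failure can be reported */
    static NTSTATUS alloc_tid_slot( unsigned int tid )
    {
        unsigned int block = (tid >> 2) >> TID_CACHE_BLOCK_SHIFT;

        if (!tid_cache[block])
        {
            int *new_block = calloc( TID_CACHE_BLOCK_SIZE, sizeof(int) );
            if (!new_block) return STATUS_NO_MEMORY;
            /* install the block unless another thread beat us to it */
            if (InterlockedCompareExchangePointer( (void **)&tid_cache[block], new_block, NULL ))
                free( new_block );
        }
        return STATUS_SUCCESS;
    }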
> > FWIW, we wouldn't need to pop from the middle if we had a separate list for each address. The problem would be the lock-free mapping from the input address to such a singly linked list. This could be done, but the 64-bit key size makes it much less appealing (than, say, the 30-bit key for the fd cache).
> Applying a hash map might help in general. I'm not sure how even separate lists for each address help us, though: the waits have arbitrary timeouts, so it'd still be possible to pop from the middle.
Oh right, I forgot about timeouts.
--- Comment #14 from Zebediah Figura z.figura12@gmail.com --- (In reply to Jacek Caban from comment #13)
> I'm not sure how that would help; I guess we're thinking about different implementation details. There is a global limit on the number of threads implied by the fact that thread ids are 30 bits (a DWORD value with the two least significant bits ignored). We could have a tid_cache similar to the fd_cache:
> - in NtCreateThreadEx: ensure that tid_cache[tid >> tid_cache_shift] is allocated and mark tid_cache[tid >> tid_cache_shift][tid & tid_cache_mask] as valid
> - in NtWaitForAlertByThreadId and NtAlertThreadByThreadId: use only the tid_cache for thread-specific data
> - when terminating a thread: mark the appropriate tid_cache entry as invalid
Sure. On the other hand, even for a two-tiered array like that, 2**30 entries (times 8 or 16 bytes) is arguably a lot of storage to reserve. I don't really know what's reasonable, though.
--- Comment #15 from Zebediah Figura z.figura12@gmail.com --- Wait, maybe I'm stupid: were you proposing a tid -> alert primitive cache *in addition to* the TEB list that already exists? I was assuming it to be a replacement. (Honestly, if it's reasonable to replace it entirely, and just fail NtCreateThreadEx() if we run out of space, I'd rather do that, but...)
--- Comment #16 from Jacek Caban jacek@codeweavers.com --- I meant it as a replacement, not an addition. I may have been overly optimistic about just a 2-dimensional array; a 3-dimensional one could be less wasteful (although there may be a better structure). The point is to make a single tid slot cheap enough (4 bytes seems more than enough, but it could be even larger if needed) that we can over-allocate and never free. Over-allocation makes allocation of a single slot cheap (because it will usually already be allocated), and never freeing avoids any need for locking.
It may seem like a wasteful strategy, but note that the server gives us good locality of tid values and reuses previously freed ids. In practice, I'd expect a single block of a few thousand entries to be enough for the majority of applications. If we optimize for that (while still being able to handle even the theoretical case of 2**30 created threads, given enough allocations), I'd hope we can get decent results. I can't be sure without trying; maybe I'm missing something.
--- Comment #17 from Zebediah Figura z.figura12@gmail.com --- (In reply to Jacek Caban from comment #16)
> I meant it as a replacement, not an addition. I may have been overly optimistic about just a 2-dimensional array; a 3-dimensional one could be less wasteful (although there may be a better structure). The point is to make a single tid slot cheap enough (4 bytes seems more than enough, but it could be even larger if needed) that we can over-allocate and never free. Over-allocation makes allocation of a single slot cheap (because it will usually already be allocated), and never freeing avoids any need for locking.
> It may seem like a wasteful strategy, but note that the server gives us good locality of tid values and reuses previously freed ids. In practice, I'd expect a single block of a few thousand entries to be enough for the majority of applications. If we optimize for that (while still being able to handle even the theoretical case of 2**30 created threads, given enough allocations), I'd hope we can get decent results. I can't be sure without trying; maybe I'm missing something.
Okay, that's good, that sounds like the ideal strategy to me too. I just wasn't sure if there was a reasonable tradeoff between maximum number of threads possible + amount of memory unnecessarily reserved + lookup time. (I am less than completely enthused about the idea of a 3-dimensional list—I think we should probably be optimizing for lookup time above all else.)
I think we do need 8 bytes, at least for 64-bit, or perhaps just for 64-bit Mac. A futex is 4, but on Mac we use semaphore_t, which is a pointer (though I haven't looked to see if there's a better solution), and if neither is available we use an event (we could possibly use a keyed event, though we'd have to introduce some locking—but we're already on the slow path).
Thinking out loud here: a two-dimensional array of 65536 bytes * 4096 entries would give us 2**26 threads on 32-bit, 2**25 on 64-bit, while using 16+(4 or 8) pages in the common case. 32-bit has a much lower limit imposed by the stack size, so that's not a concern, but is that enough threads for 64-bit? Raising that to 2**30 for 64-bit would be a reservation of 65536 entries = 128 pages, which is kind of a lot, though in practice only the first is ever faulted in and we have VA space to spare.
Does that sound like a reasonable compromise? I think we could even shrink the 32-bit reservations to use less address space.
Zebediah Figura z.figura12@gmail.com changed:
What          |Removed |Added
Status        |STAGED  |RESOLVED
Fixed by SHA1 |        |f3e9498740d675b74d0f84b8120ef7e1e2122ec1
Resolution    |---     |FIXED
--- Comment #18 from Zebediah Figura z.figura12@gmail.com --- Fixed upstream as of https://source.winehq.org/git/wine.git/commitdiff/f3e9498740d675b74d0f84b8120ef7e1e2122ec1.
Alexandre Julliard julliard@winehq.org changed:
What   |Removed  |Added
Status |RESOLVED |CLOSED
--- Comment #19 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 6.22.