[Bug 50292] Process-local synchronization objects use private interfaces into the Unix library

8 Feb 2021


      https://bugs.winehq.org/show_bug.cgi?id=50292
--- Comment #17 from Zebediah Figura z.figura12@gmail.com ---
(In reply to Jacek Caban from comment #16)
...
I meant it as a replacement, not an addition. I may have been an overly
optimistic about just 2 dim array, 3 dim could be less wasteful (although
there may be a better structure). The point is to make a single tid slot
cheap enough (4 bytes seems more than enough, but it could be even larger if
needed) so that we may over-allocate and never free. Over-allocation makes
allocation of a single slot cheap (because it will usually be already
allocated) and never freeing avoids any need for locking.
It may seem like a wasteful strategy, but note that server gives us a good
locality of tid values and reuses previously freed ids. In practice, I'd
expect that a single block of a few thousands entries should be enough for
majority of applications. If we optimize for that (while being able to
handle even a theoretical case of 2**30 created threads with enough
allocations), I'd hope that we can get decent results. I can't be sure
without trying, maybe I'm missing something.
Okay, that's good, that sounds like the ideal strategy to me too. I just wasn't
sure if there was a reasonable tradeoff between maximum number of threads
possible + amount of memory unnecessarily reserved + lookup time. (I am less
than completely enthused about the idea of a 3-dimensional list—I think we
should probably be optimizing for lookup time above all else.)
I think we do need 8 bytes, at least for 64-bit, or perhaps just for 64-bit
Mac. A futex is 4, but on Mac we use semaphore_t, which is a pointer (though I
haven't looked to see if there's a better solution), and if neither is
available we use an event (we could possibly use a keyed event, though we'd
have to introduce some locking—but we're already on the slow path).
Thinking out loud here: a two-dimensional array of 65536 bytes * 4096 entries
would give us 2**26 threads on 32-bit, 2**25 on 64-bit, while using 16+(4 or 8)
pages in the common case. 32-bit has a much lower limit imposed by the stack
size, so that's not a concern, but is that enough threads for 64-bit? Raising
that to 2**30 for 64-bit would be a reservation of 65536 entries = 128 pages,
which is kind of a lot, though in practice only the first is ever faulted in
and we have VA space to spare.
Does that sound like a reasonable compromise? I think we could even shrink the
32-bit reservations so to use less address space.
-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

[Bug 50292] Process-local synchronization objects use private interfaces into the Unix library