New subject: Fast thread-local storage for OpenGL drivers

22 Feb 2003

Daniel Jacobowitz wrote:
...
Note the "always" in Roland's paragraph.
Note the fact that he said it would require one of the dynamic access models
(GD or LD), which require at least one function call to access thread local
variables.  As I've said, this is an unacceptable hit on performance.
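
To make the cost difference concrete, here is a minimal sketch, assuming GCC
or a compatible compiler on an ELF target.  The variable and function names
are hypothetical.  Built as position-independent code (-fPIC), the
global-dynamic read typically goes through a call to __tls_get_addr(), while
the initial-exec read typically becomes a GOT load plus a
thread-pointer-relative access, with no call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread pointers; the names are illustrative only. */
__thread void *slow_ptr __attribute__((tls_model("global-dynamic")));
__thread void *fast_ptr __attribute__((tls_model("initial-exec")));

/* With -fPIC this read typically compiles to a call to __tls_get_addr(). */
void *read_slow(void)
{
    return slow_ptr;
}

/* This read typically needs no call: the offset from the thread pointer
 * is fixed once the dynamic loader has laid out the static TLS block. */
void *read_fast(void)
{
    return fast_ptr;
}

/* Both models behave identically at the C level; only the generated
 * access sequence differs. */
void tls_model_demo(void)
{
    int a = 0, b = 0;

    slow_ptr = &a;
    fast_ptr = &b;
    assert(read_slow() == &a);
    assert(read_fast() == &b);

    /* Reset so no pointer to a dead local survives this function. */
    slow_ptr = NULL;
    fast_ptr = NULL;
}
```

Compiling this with -fPIC -S and comparing read_slow() with read_fast() in
the assembly output should show the difference on a given toolchain.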
...
When you say two or three, are these two or three pointers or two or
three large tables?
Two or three pointers.  I'm pretty sure we use less than 8 pointers all up,
although many of those aren't performance critical.  Three of ours most
definitely are, and it would be nice if moving to a couple more didn't break
things.  We only ever use thread-local pointers, never whole structs or
anything like that.
...
In any case, it sounds like you could:

  1. Select the thread-local variables that you need fast access to.
  2. Arrange for those variables to be tagged with
     __attribute__((tls_model("initial-exec"))), or something similar.
  3. Make sure the TLS_STATIC_SURPLUS is big enough to hold them.
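
The tagging step could look roughly like the sketch below, assuming GCC's
__thread extension and tls_model attribute.  The context type, the variable
name and the accessor functions are hypothetical.  The declaration and the
definition sit in one file here only to keep the example self-contained; in
practice the declaration would go in a header shared by both libraries and
the definition in exactly one of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context type; a real driver would have its own. */
struct gl_context {
    int id;
};

/* Declaration as it might appear in a header shared by libGL.so and the
 * driver backend (hypothetical layout). */
extern __thread struct gl_context *current_ctx
    __attribute__((tls_model("initial-exec")));

/* The single definition, which would live in one of the libraries. */
__thread struct gl_context *current_ctx
    __attribute__((tls_model("initial-exec")));

/* Bind a context to the calling thread. */
void set_current_context(struct gl_context *ctx)
{
    current_ctx = ctx;
}

/* Fetch the calling thread's context; NULL if none has been bound. */
struct gl_context *get_current_context(void)
{
    return current_ctx;
}

/* Small self-check of the accessors on the calling thread. */
void current_ctx_demo(void)
{
    struct gl_context ctx = { 1 };

    assert(get_current_context() == NULL);
    set_current_context(&ctx);
    assert(get_current_context() == &ctx);
    assert(get_current_context()->id == 1);
    set_current_context(NULL);
}
```

Only the performance-critical pointers need the attribute; the rest can stay
on the default model and so do not eat into the static TLS surplus.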

Will this be okay, considering that two shared libraries will need access to
the variables (libGL.so itself and the driver backend)?  Can you use IE or
LE with variables that live in another shared library?
...
I don't see a problem, but you'd have to do some serious reading of the
TLS ABI documents.... they're quite thorough.
Sure, the code itself isn't hard to understand.  The problem is, at runtime,
how do I know what code to generate to access a given __thread variable?  Do
I have to disassemble a function that accesses the variable to know the
right model to use?  Fixed offsets make this trivial, but maybe this isn't a
real problem after all.
--
Gareth Hughes (gareth@nvidia.com)
OpenGL Developer, NVIDIA Corporation
