Daniel Jacobowitz wrote:
Note the "always" in Roland's paragraph.
Note the fact that he said it would require one of the dynamic access models (GD or LD), which require at least one function call to access thread local variables. As I've said, this is an unacceptable hit on performance.
When you say two or three, are these two or three pointers or two or three large tables?
Two or three pointers. I'm pretty sure we use fewer than eight pointers in total, although many of those aren't performance critical. Three of ours most definitely are, and it would be nice if moving to a couple more didn't break things. We only ever use thread-local pointers, never whole structs or anything like that.
In any case, it sounds like you could:
- Select the thread-local variables that you need fast access to.
- Arrange for those variables to be tagged with __attribute__((tls_model("initial-exec"))), or something similar.
- Make sure TLS_STATIC_SURPLUS is big enough to hold them.
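As a rough sketch of the tagging step above, assuming GCC's __thread keyword and tls_model attribute: the type and variable names here (dispatch_table, current_dispatch, current_context) are made up for illustration and are not taken from libGL or any driver.

```c
#include <stddef.h>

/* Hypothetical dispatch table type; illustrative only. */
struct dispatch_table {
    void (*entry)(void);
};

/*
 * Thread-local pointers forced to the initial-exec model. The compiler
 * then emits a thread-pointer-relative access whose offset is resolved
 * at load time, rather than a call into the dynamic TLS machinery as
 * the GD and LD models require.
 */
__thread struct dispatch_table *current_dispatch
    __attribute__((tls_model("initial-exec"))) = NULL;

__thread void *current_context
    __attribute__((tls_model("initial-exec"))) = NULL;

/* Bind a dispatch table and context to the calling thread. */
void set_current(struct dispatch_table *table, void *context)
{
    current_dispatch = table;
    current_context = context;
}

/* Fast-path accessors; each one is a single TLS load. */
struct dispatch_table *get_dispatch(void)
{
    return current_dispatch;
}

void *get_context(void)
{
    return current_context;
}
```

Only pointers are placed in TLS here, which keeps the static TLS footprint to a few words per thread. Whether that fits in the surplus space when the library is loaded with dlopen() still depends on the runtime.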
Will this be okay, considering that two shared libraries will need access to the variables (libGL.so itself and the driver backend)? Can you use IE or LE with variables that live in another shared library?
I don't see a problem, but you'd have to do some serious reading of the TLS ABI documents; they're quite thorough.
Sure, the code itself isn't hard to understand. The problem is, at runtime, how do I know what code to generate to access a given __thread variable? Do I have to disassemble a function that accesses the variable to know the right model to use? Fixed offsets make this trivial, but maybe this isn't a real problem after all.
-- Gareth Hughes (gareth@nvidia.com) OpenGL Developer, NVIDIA Corporation