Daniel Jacobowitz wrote:
Note the "always" in Roland's paragraph.
Note the fact that he said it would require one of the dynamic access models (GD or LD), which require at least one function call to access thread local variables. As I've said, this is an unacceptable hit on performance.
When you say two or three, are these two or three pointers or two or three large tables?
Two or three pointers. I'm pretty sure we use less than 8 pointers all up, although many of those aren't performance critical. Three of ours most definitely are, and it would be nice if moving to a couple more didn't break things. We only ever use thread-local pointers, never whole structs or anything like that.
In any case, it sounds like you could:
- Select the thread-local variables that you need fast access to.
- Arrange for those variables to be tagged with an __attribute__((tls_model("initial-exec"))), or something similar.
- Make sure the TLS_STATIC_SURPLUS is big enough to hold them.
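For illustration, the tagging step above might look roughly like this with GCC. This is only a sketch: the variable and function names are made up for the example and are not taken from libGL or any driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hot-path pointers; the names are illustrative only.
 * The attribute asks GCC to use the initial-exec access model, so an
 * access is a load relative to the thread pointer, not a function call. */
__thread void *gl_current_context __attribute__((tls_model("initial-exec")));
__thread void *gl_current_dispatch __attribute__((tls_model("initial-exec")));

/* Install (or clear) the calling thread's context and dispatch table.
 * Assumed convention for this sketch: both are set or both are NULL. */
void gl_set_current(void *ctx, void *dispatch)
{
    assert((ctx == NULL) == (dispatch == NULL));
    gl_current_context = ctx;
    gl_current_dispatch = dispatch;
}

/* Fast-path readers: each call sees only the calling thread's value. */
void *gl_get_context(void)  { return gl_current_context; }
void *gl_get_dispatch(void) { return gl_current_dispatch; }
```

Variables without the attribute keep whatever model the compiler picks by default (for -fPIC code that is normally one of the dynamic models), so only the tagged ones count against the surplus.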
Will this be okay, considering that two shared libraries will need access to the variables (libGL.so itself and the driver backend)? Can you use IE or LE with variables that live in another shared library?
I don't see a problem, but you'd have to do some serious reading of the TLS ABI documents.... they're quite thorough.
Sure, the code itself isn't hard to understand. The problem is, at runtime, how do I know what code to generate to access a given __thread variable? Do I have to disassemble a function that accesses the variable to know the right model to use? Fixed offsets make this trivial, but maybe this isn't a real problem after all.
-- Gareth Hughes (gareth@nvidia.com) OpenGL Developer, NVIDIA Corporation
On Sat, Feb 22, 2003 at 10:32:05AM -0800, Gareth Hughes wrote:
Two or three pointers. I'm pretty sure we use less than 8 pointers all up, although many of those aren't performance critical. Three of ours most definitely are, and it would be nice if moving to a couple more didn't break things. We only ever use thread-local pointers, never whole structs or anything like that.
What actually matters is the size of the PT_TLS segment of the shared library which defines those 2-3 __thread variables (I assume it is libGL.so, right?). When at least one of the STT_TLS symbols from some dlopened library is accessed using IE or LE model relocs, then that library's whole PT_TLS segment must be put into the surplus area. If possible, keep this at 2-4 pointers, so that other libraries can use the surplus for their performance critical things too if needed. It would be good if the rest of the __thread variables, the ones which aren't performance critical, were provided by some other library (and always accessed through the GD or LD model).
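One way to keep an eye on that 2-4 pointer budget is a compile-time check in the library that defines the hot variables. The following is only a sketch under assumptions: the names and the budget constant are invented for the example, and it counts only the variables it lists. The real PT_TLS size also includes any other TLS data and padding in the same library, which can be checked with `readelf -l` on the built object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical performance-critical pointers (names are illustrative). */
__thread void *hot_a __attribute__((tls_model("initial-exec")));
__thread void *hot_b __attribute__((tls_model("initial-exec")));
__thread void *hot_c __attribute__((tls_model("initial-exec")));

/* Assumed self-imposed budget, following the "2-4 pointers" suggestion. */
enum { HOT_TLS_BUDGET_POINTERS = 4 };

/* Bytes taken by the variables listed above (not the whole segment). */
#define HOT_TLS_BYTES (sizeof hot_a + sizeof hot_b + sizeof hot_c)

/* Fail the build if someone adds hot variables past the budget. */
static_assert(HOT_TLS_BYTES <= HOT_TLS_BUDGET_POINTERS * sizeof(void *),
              "hot TLS variables exceed the self-imposed budget");

/* Runtime view of the same number, e.g. for logging. */
size_t hot_tls_bytes(void) { return HOT_TLS_BYTES; }
```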
Will this be okay, considering that two shared libraries will need access to the variables (libGL.so itself and the driver backend)? Can you use IE or LE with variables that live in another shared library?
See above: it doesn't matter which libraries use IE/LE relocs; what matters is which shared library provides the TLS symbols those relocs resolve to.
Sure, the code itself isn't hard to understand. The problem is, at runtime, how do I know what code to generate to access a given __thread variable? Do I have to disassemble a function that accesses the variable to know the right model to use? Fixed offsets make this trivial, but maybe this isn't a real problem after all.
Forgot to say, the offsets are obviously constant (until you dlclose the library which declares them). If they weren't, one couldn't keep pointers to __thread variables around in IE/LE models.
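A small sketch of what that stability means in practice within one thread: a pointer taken to a __thread variable keeps referring to the same storage, so it can be cached and reused. The variable name below is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread counter (name is illustrative only). */
__thread int frame_counter __attribute__((tls_model("initial-exec")));

/* Take the address once, then mix direct and indirect accesses.
 * Both paths reach the same per-thread storage. */
int bump_through_cached_pointer(void)
{
    int *cached = &frame_counter;   /* valid for this thread only */

    frame_counter = 40;             /* direct access */
    *cached += 2;                   /* access through the cached pointer */

    /* The address has not moved in the meantime. */
    assert(cached == &frame_counter);
    return frame_counter;
}
```

Such a pointer is only meaningful in the thread that took it, and only while the defining library stays loaded.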
Jakub