Roland McGrath wrote:
In glibc, we actually allocate some excess space in the thread-local storage area layout determined at startup time. This lets a dynamically loaded module use static TLS if its PT_TLS segment fits in the available surplus. (In sysdeps/generic/dl-tls.c, see TLS_STATIC_SURPLUS.) If there is insufficient space preallocated, then loading the module will fail. In fact, we put this feature there with GL in mind and can adjust the preallocated surplus for what is most useful in practice.
The last time we discussed this issue, I had the distinct impression that an OpenGL library would essentially be forced into using one of the dynamic access models (GD or LD) for __thread variables, hence requiring at least one function call to access a thread-local variable. I also had the distinct impression that the glibc maintainers were unwilling to modify their implementation so that we could use the LE access model, which would allow a two-instruction thread-safe dispatcher, among other things.
It looks like I was wrong, and you've gone and addressed all the concerns I originally had with __thread variables. For that, I'm grateful.
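For readers following along, here is a minimal C sketch of what asking for a static TLS access model looks like at the source level. It assumes GCC (or a compatible compiler) and its tls_model variable attribute; the variable and function names are hypothetical and do not come from any GL implementation. The sketch uses "initial-exec" rather than "local-exec", since IE is the model normally requested for a variable defined in a shared object.

```c
/* Sketch only: all names below are hypothetical, not part of any real GL ABI. */
#include <stddef.h>

/* Ask the compiler for the Initial Exec access model explicitly.
   In a shared object, this makes the variable depend on static TLS
   space being available when the object is loaded. */
static __thread void *gl_current_context_sketch
    __attribute__((tls_model("initial-exec")));

/* Store a per-thread context pointer. */
void set_context_sketch(void *ctx)
{
    gl_current_context_sketch = ctx;
}

/* Read the per-thread context pointer; no function call into the
   dynamic linker is needed for this access model. */
void *get_context_sketch(void)
{
    return gl_current_context_sketch;
}
```

If such code ends up in a dlopen'ed library, whether it loads depends on the preallocated surplus described above; when the surplus is too small, loading fails.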
In fact, we put this feature there with GL in mind...
Did you inform the OpenGL vendors who were interested in this issue of this fact? Have you documented it anywhere, particularly in Ulrich Drepper's "ELF Handling For Thread-Local Storage" document? The current version of this document clearly states that the Local Exec TLS model "can only be used for code in the executable itself and to access variables in the executable itself". Perhaps you can see why I was still under the impression that it would not work for a dynamically loadable shared library.
On Sun, Feb 23, 2003 at 06:44:10PM -0800, Gareth Hughes wrote:
In fact, we put this feature there with GL in mind...
Did you inform the OpenGL vendors who were interested in this issue of this fact? Have you documented it anywhere, particularly in Ulrich Drepper's "ELF Handling For Thread-Local Storage" document? The current version of this document clearly states that the Local Exec TLS model "can only be used for code in the executable itself and to access variables in the executable itself". Perhaps you can see why I was still under the impression that it would not work for a dynamically loadable shared library.
I believe all this was said during the huge OpenGL thread in May 2002. Certainly the idea of supporting dlopening of a limited number of libraries that use the IE/LE models came up at that time.
For the dispatch tables I even remember suggesting to:

a) do the normal "awx" section entries with the LE model, i.e.

    .section openGL_wtext, "awx"
    .globl Foo
    Foo:
        movl %gs:__gl_dispatch@ntpoff, %eax
        jmpl *offset_Foo(%eax)

b) in addition to that, build an .a library with the above 5 lines per .o file's source, plus

    .hidden Foo

   which would make apps/libraries using OpenGL even faster (as they wouldn't hop through the PLT, which is one memory load and an indirect jump through the loaded value) at the expense of making offset_Foo part of the OpenGL ABI (which, as far as I understood, it already is anyway because of the binary modules).

c) or you could inline the calls
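A rough C counterpart of the stub in a), as a sketch only: it assumes GCC-style __thread and the tls_model attribute, and every name in it (the struct, the table pointer, Foo_sketch) is hypothetical. The compiler is free to emit more than the two instructions of the hand-written version, so this only illustrates the shape of the dispatch, not its exact cost.

```c
/* Sketch only: hypothetical names, not the real GL dispatch layout. */

/* One slot per entry point; a real table would have many. */
struct dispatch_sketch {
    int (*foo)(int);
};

/* Per-thread pointer to the active dispatch table. */
static __thread const struct dispatch_sketch *current_dispatch_sketch
    __attribute__((tls_model("initial-exec")));

/* Install a dispatch table for the calling thread. */
void set_dispatch_sketch(const struct dispatch_sketch *table)
{
    current_dispatch_sketch = table;
}

/* Public entry point: load this thread's table pointer, then call
   through the slot that belongs to this function. */
int Foo_sketch(int x)
{
    return current_dispatch_sketch->foo(x);
}

/* A trivial implementation used to exercise the dispatcher. */
static int foo_add_one(int x)
{
    return x + 1;
}

static const struct dispatch_sketch add_one_table = { foo_add_one };
```

A thread would call set_dispatch_sketch(&add_one_table) once, after which Foo_sketch(41) goes through the table and returns 42.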
In the May thread, I'm pretty sure you mentioned that the __indirect* routines, which are the biggest part of libGL.so, are rarely used, which means they definitely should be compiled with -fpic; the rest, if it is really performance critical, can be put into awx sections using __attribute__((section("..."))).
Jakub
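As an illustration of the __attribute__((section("..."))) suggestion above, a minimal sketch assuming GCC on an ELF target. The section name and the function are hypothetical. Note that the attribute only selects the section name; as far as I know, GCC places functions in an executable, non-writable section by default, so obtaining the "awx" flags mentioned above would need additional steps not shown here.

```c
/* Sketch only: the section name and function are hypothetical. */

/* Place this function in a named section instead of the default .text. */
__attribute__((section("gl_hot_text_sketch")))
int hot_path_sketch(int x)
{
    return x * 2;
}
```

The function is called like any other; only its placement in the object file changes.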