Roland McGrath wrote:
> In glibc, we actually allocate some excess space in the thread-local storage area layout determined at startup time. This lets a dynamically loaded module use static TLS if its PT_TLS segment fits in the available surplus. (In sysdeps/generic/dl-tls.c, see TLS_STATIC_SURPLUS.) If there is insufficient space preallocated, then loading the module will fail. In fact, we put this feature there with GL in mind and can adjust the preallocated surplus for what is most useful in practice.
The last time we discussed this issue, I had the distinct impression that an OpenGL library would essentially be forced to use one of the dynamic access models (GD or LD) for __thread variables, and hence would need at least one function call to access a thread-local variable. I also had the distinct impression that the glibc maintainers were unwilling to modify their implementation so that we could use the LE access model, which would allow, among other things, a two-instruction thread-safe dispatcher.
It looks like I was wrong, and you've gone and addressed all the concerns I originally had with __thread variables. For that, I'm grateful.
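To make the dispatcher concern concrete, here is a minimal sketch in C of a per-thread dispatch pointer held in a __thread variable. Every name in it (dispatch_table, current_table, run_demo, and so on) is hypothetical and not taken from any real GL implementation. It uses GCC's tls_model attribute to request the initial-exec model, on the assumption that this is the static model a shared object would ask for to make use of the surplus; whether a dlopen'ed library built this way actually loads still depends on the surplus being large enough.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical dispatch table: one function pointer per entry point. */
struct dispatch_table {
    int (*get_id)(void);
};

static int id_a(void) { return 1; }
static int id_b(void) { return 2; }

static const struct dispatch_table table_a = { id_a };
static const struct dispatch_table table_b = { id_b };

/* Per-thread pointer to the current table.  The attribute asks GCC for a
 * static TLS access model (assumption: initial-exec is the one a shared
 * object would use), so reading it needs no call to __tls_get_addr. */
static __thread const struct dispatch_table *current_table
    __attribute__((tls_model("initial-exec")));

void set_current_table(const struct dispatch_table *t)
{
    current_table = t;
}

/* The dispatcher: load the per-thread pointer, then call through it. */
int dispatch_get_id(void)
{
    assert(current_table != NULL);
    return current_table->get_id();
}

static void *worker(void *arg)
{
    /* This thread selects a different table than the main thread. */
    set_current_table(&table_b);
    *(int *)arg = dispatch_get_id();
    return NULL;
}

/* Returns 10 * (id seen by the calling thread) + (id seen by the worker),
 * or -1 if the worker thread could not be created. */
int run_demo(void)
{
    int worker_id = 0;
    pthread_t th;

    set_current_table(&table_a);
    if (pthread_create(&th, NULL, worker, &worker_id) != 0)
        return -1;
    pthread_join(th, NULL);
    return dispatch_get_id() * 10 + worker_id;
}
```

Each thread sees its own current_table, so the worker's choice of table_b does not disturb the caller's table_a. With a dynamic model (GD or LD), the same source would compile, but each access could go through a function call instead of a direct thread-pointer-relative load.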
> In fact, we put this feature there with GL in mind...
Did you tell the OpenGL vendors who were interested in this issue about it? Have you documented it anywhere, particularly in Ulrich Drepper's "ELF Handling For Thread-Local Storage" document? The current version of that document clearly states that the Local Exec TLS model "can only be used for code in the executable itself and to access variables in the executable itself". Perhaps you can see why I was still under the impression that it would not work for a dynamically loadable shared library.