It is critically important for OpenGL drivers to have fast (single-instruction) access to thread-local variables. I'd be happy to provide more information to anyone who's interested, but a typical case where TLS access can severely hurt performance is at the very front end of an OpenGL library. Ideally, you'd like something like the following:

libGL.so:

    // This function loads a dispatch table pointer from
    // thread-local storage and jumps through to the
    // backend function (which typically resides in a
    // different shared library).
    glTexCoord2f:
        mov %fs:DISPATCH_TABLE_OFFSET, %eax
        jmp *__glapi_TexCoord2f(%eax)    // Points to __my_TexCoord2f

libGLcore.so:

    // This function copies some data into the OpenGL
    // context, sets some magic flags to record what data
    // was copied, and returns.
    __my_TexCoord2f:
        mov %fs:CONTEXT_OFFSET, %eax
        // Copy 2 floats into the context
        // Set a flag
        ret

All in all, you have 2 TLS accesses in fewer than 10 instructions or so. Even if you don't understand exactly what's going on here, you can see that it is important to have fast access to thread-local data.
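
For readers more comfortable with C than with x86 assembly, the two stubs above correspond roughly to the sketch below. It is only an illustration, not code from any real driver: the struct layouts, the DIRTY_TEXCOORD flag and the sketch_* function names are all made up for this example, and it uses the GCC/Clang '__thread' extension instead of a fixed %fs offset. Whether the compiler turns each thread-local load into a single instruction depends on the TLS model in use; in a shared library it may well become a function call, which is exactly the cost being discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch table: one function pointer per GL entry point. */
struct dispatch_table {
    void (*TexCoord2f)(float s, float t);
};

/* Hypothetical per-thread context that the backend writes into. */
struct gl_context {
    float texcoord[2];
    unsigned dirty_flags;
};

#define DIRTY_TEXCOORD 0x1u   /* made-up flag value for this sketch */

/* Both pointers are thread-local; each read is one of the two TLS accesses. */
static __thread struct dispatch_table *current_dispatch;
static __thread struct gl_context *current_context;

/* Backend (would live in libGLcore.so): copy data, set a flag, return. */
static void my_TexCoord2f(float s, float t)
{
    struct gl_context *ctx = current_context;   /* TLS access #2 */
    ctx->texcoord[0] = s;
    ctx->texcoord[1] = t;
    ctx->dirty_flags |= DIRTY_TEXCOORD;
}

static struct dispatch_table default_table = { my_TexCoord2f };

/* Bind a context (and the default table) to the calling thread. */
void sketch_make_current(struct gl_context *ctx)
{
    current_context = ctx;
    current_dispatch = &default_table;
}

/* Front end (would live in libGL.so): load the table and jump through it. */
void sketch_glTexCoord2f(float s, float t)
{
    current_dispatch->TexCoord2f(s, t);          /* TLS access #1 */
}
```

A caller would first bind a context with sketch_make_current(&ctx) and then call sketch_glTexCoord2f(0.25f, 0.5f); every such call pays for both thread-local loads, which is why their cost dominates a function this small.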
While glibc's new thread library implementation has many benefits, particularly to application programmers (with support for the new keyword '__thread', and so on), it basically forces a function call per thread-local variable access in situations like the one I described above. This is clearly unacceptable for a high-performance OpenGL driver. Furthermore, the glibc developers have been completely unwilling to work with OpenGL driver developers (Open Source or otherwise) to provide a mechanism to access thread-local data in a way that meets our performance requirements.
Therefore, I'd like to propose a solution where Wine and the OpenGL driver cooperate to provide such a TLS access mechanism (at least on x86 platforms). Wine currently uses %fs to access the Windows Thread Environment Block (TEB), while glibc uses %gs to access its per-thread data. With the following patch to Wine's TEB structure:

--- include/thread.h 2002-12-17 16:06:25.000000000 -0500
+++ include/thread.h.new 2003-02-21 14:27:50.000000000 -0500
@@ -116,10 +116,12 @@
     DWORD alarms;                  /* --3 22c Data for vm86 mode */
     DWORD vm86_pending;            /* --3 230 Data for vm86 mode */
     void *vm86_ptr;                /* --3 234 Data for vm86 mode */
+    /* here is plenty space for wine specific fields (don't forget to change pad6!!) */
+    DWORD pad6[608];               /* --n 238 */
+    DWORD ogl_data[16];            /* --n bb8 OpenGL driver private data */
 
     /* the following are nt specific fields */
-    DWORD pad6[624];               /* --n 238 */
     UNICODE_STRING StaticUnicodeString;   /* -2- bf8 used by advapi32 */
     USHORT StaticUnicodeBuffer[261];      /* -2- c00 used by advapi32 */
     void *stack_base;              /* -2- e0c Base of the stack */

we reserve %fs:0xbb8 to %fs:0xbf8 (16 DWORDs, 64 bytes) for use by the OpenGL driver. Any and all OpenGL implementations can use this area, and we agree that when Wine is present, it leaves this area untouched. The question of who allocates the TEB should be pretty straightforward: when an OpenGL driver is first loaded, if the TEB is missing, the driver allocates it as expected. I would imagine that when Wine is running, it would have the chance to allocate the TEB before the OpenGL driver is loaded, and thus the OpenGL driver wouldn't have to do anything. The size of the reserved area should be sufficient, although we can debate that if required.
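
The offsets quoted above can be double-checked with a small C sketch. This is not Wine's real TEB definition: 'struct teb_sketch' is a made-up stand-in that collapses everything before pad6 into an opaque 0x238-byte block and marks the start of StaticUnicodeString with a single byte, and ogl_slot_offset() is a hypothetical helper. It only shows that shrinking pad6 from 624 to 608 DWORDs places ogl_data at 0xbb8 and leaves the following field at 0xbf8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD;   /* 32-bit, as on x86 Windows */

/* Made-up stand-in for the patched region of the TEB (not Wine's struct). */
struct teb_sketch {
    uint8_t head[0x238];   /* everything before pad6, treated as opaque */
    DWORD   pad6[608];     /* 0x238: padding, shrunk from 624 entries */
    DWORD   ogl_data[16];  /* 0xbb8: OpenGL driver private data */
    uint8_t tail_start;    /* 0xbf8: where StaticUnicodeString begins */
};

/* Compile-time checks of the offsets quoted in the proposal. */
_Static_assert(offsetof(struct teb_sketch, pad6) == 0x238, "pad6 offset");
_Static_assert(offsetof(struct teb_sketch, ogl_data) == 0xbb8, "ogl_data offset");
_Static_assert(offsetof(struct teb_sketch, tail_start) == 0xbf8, "end of area");

/* Hypothetical helper: byte offset from %fs of the n-th reserved DWORD. */
size_t ogl_slot_offset(unsigned slot)
{
    assert(slot < 16);
    return offsetof(struct teb_sketch, ogl_data) + slot * sizeof(DWORD);
}
```

With this layout, slot 0 sits at %fs:0xbb8 and slot 15 at %fs:0xbf4, so a driver could keep, say, its dispatch table pointer and context pointer in two of the sixteen slots.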
Comments, questions are welcome. I've CC'ed Brian Paul and Keith Whitwell of Mesa/DRI fame, as I know they are interested in this issue. Please CC us on any replies, as we are not subscribed to the list.
-- Gareth Hughes (gareth@nvidia.com) OpenGL Developer, NVIDIA Corporation