wine-devel February 2003

wine-devel@winehq.org

141 participants
291 discussions

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Alexandre Julliard wrote:
> This is not something we can guarantee. The layout of the thread
> structure in Wine is defined by Microsoft, and it's very possible that
> they will use these fields someday for something that we need to
> emulate.

What about the area currently used for GDI?

One other method could be to have space at a negative offset from %fs reserved for OpenGL, similar to what the IA32 implementation of NPTL does in Variant II of its design.

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation

Fix to allow running msvc from commandline in linux
by Dan Kegel 22 Feb '03

Changelog:
    programs/wcmd/wcmdmain.c: make 'cmd /c cl /MUMBLE foo.c' pass /MUMBLE to cl

LGPL.

-------

Here's how I use msvc's compiler from the Linux shell.
In ~/bin create the following three files:

--- cl ---
wine wcmd /c F:/bin/cl.bat "$@"

--- cl.bat ---
call F:/bin/vcvars32.bat
cl %1 %2 %3 %4 %5

--- vcvars32.bat ---
(same as one installed by msvc elsewhere, but with quotes removed -- wcmd doesn't deal well with quotes. Hopefully wcmd will be fixed soon, or is already fixed, to handle quotes properly.)

Then you can just build demo programs under MSVC by saying
    cl foo.c
in bash. This is handy for checking portability of wine conformance tests.

Unfortunately, you can't pass options to cl, e.g.
    cl /DSTANDALONE foo.c
without the attached patch.

- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

Index: programs/wcmd/wcmdmain.c
===================================================================
RCS file: /home/wine/wine/programs/wcmd/wcmdmain.c,v
retrieving revision 1.22
diff -u -p -r1.22 wcmdmain.c
--- programs/wcmd/wcmdmain.c 11 Feb 2003 22:01:11 -0000 1.22
+++ programs/wcmd/wcmdmain.c 22 Feb 2003 19:37:48 -0000
@@ -56,14 +56,14 @@ HANDLE h;
   args[0] = param[0] = '\0';
   if (argc > 1) {
-    for (i=1; i<argc; i++) {
-      if (argv[i][0] == '/') {
+    /* interpreter options must come before the command to execute.
+     * Any options after the command are command options, not interpreter options.
+     */
+    for (i=1; i<argc && argv[i][0] == '/'; i++)
         strcat (args, argv[i]);
-      }
-      else {
+    for (; i<argc; i++) {
         strcat (param, argv[i]);
         strcat (param, " ");
-      }
     }
   }
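
The parsing rule the patch introduces can be tried outside wcmd with a small stand-alone sketch. This is only an illustration, not the wcmd source: the function name split_cmdline and the explicit buffer-size parameters are assumptions added here (the patch itself uses plain strcat on wcmd's own buffers).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of wcmd: splits argv the way the patch
 * describes. Leading arguments that start with '/' are interpreter
 * options; everything from the first non-option onward is the command
 * and its own options. */
void split_cmdline(int argc, char *argv[],
                   char *args, size_t args_size,
                   char *param, size_t param_size)
{
    int i;

    assert(args_size > 0 && param_size > 0);
    args[0] = param[0] = '\0';

    /* Interpreter options must come before the command to execute. */
    for (i = 1; i < argc && argv[i][0] == '/'; i++)
        strncat(args, argv[i], args_size - strlen(args) - 1);

    /* The rest, including any later '/' options, belongs to the command. */
    for (; i < argc; i++) {
        strncat(param, argv[i], param_size - strlen(param) - 1);
        strncat(param, " ", param_size - strlen(param) - 1);
    }
}
```

Called with the arguments of 'wcmd /c cl /DSTANDALONE foo.c', this leaves "/c" in args and "cl /DSTANDALONE foo.c " (with the trailing space the original loop also produces) in param, which is the behaviour the changelog entry asks for.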

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Dan Kegel wrote:
> OK, "all new platforms". Sounds like a good argument for tagging
> along with glibc's __thread variable support, if you ask me.

We still have to support all older platforms as well.

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Dan Kegel wrote:
> Sure, but it doesn't have to be as blazingly fast on old platforms.

Actually, yes it does.

> And, practically speaking, what you're looking for
> might in fact only be possible using the new glibc
> stuff; as Alexander said, Wine might be forced
> to break any constraint you try to get them to agree to.

Negative offsets from %fs would avoid any overlap. Perhaps that would be a better solution.

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Jakub Jelinek wrote:
> What actually matters is the size of PT_TLS segment of the shared library
> which defines those 2-3 __thread variables (I assume it is libGL.so,
> right?).

Generally, yes.

> It would be good if the rest of __thread variables which aren't
> performance critical is provided by some other library (and accessed
> always through GD or LD model).

I guess it would be possible to have non-critical variables defined in the driver backend, say.

> Forgot to say, the offsets are obviously constant (until you dlclose the
> library which declares them).
> If they weren't, one couldn't keep pointers to __thread variables around
> in IE/LE models.

The variables could move in the dynamic case, hence the function call to get their address. We're trying to avoid that, obviously :-)

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation
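
The point about constant offsets can be observed indirectly with a small sketch, assuming GCC's __thread keyword and POSIX threads. It does not read the actual relocation offsets; it only shows that each thread gets its own copy of the variables at a different address while the layout of the module's TLS block is the same in every thread. All names (tls_a, tls_b, tls_probe and the two probe functions) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Two thread-local variables defined in the same module. */
static __thread int tls_a;
static __thread int tls_b;

/* What one thread observes about its own copies. */
struct tls_probe {
    uintptr_t addr_a;   /* address of this thread's tls_a */
    intptr_t distance;  /* tls_b address minus tls_a address */
};

static void *probe(void *out)
{
    struct tls_probe *p = out;
    p->addr_a = (uintptr_t)&tls_a;
    p->distance = (intptr_t)((uintptr_t)&tls_b - (uintptr_t)&tls_a);
    return NULL;
}

/* Probe the calling thread. */
struct tls_probe probe_here(void)
{
    struct tls_probe p;
    probe(&p);
    return p;
}

/* Probe a freshly created thread while the caller is still alive. */
struct tls_probe probe_on_new_thread(void)
{
    struct tls_probe p = {0, 0};
    pthread_t t;
    int rc = pthread_create(&t, NULL, probe, &p);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return p;
}
```

Comparing probe_here() with probe_on_new_thread() should show different addr_a values but the same distance. Runtime-generated code would need the offset from the thread pointer rather than this relative distance, so treat this purely as an illustration of the layout being fixed per module.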

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Jakub Jelinek wrote:
> > I'm pretty sure all implementations of OpenGL are not compiled as PIC at
> > this point in time.
>
> AFAIK on x86 only, but it is wrong everywhere.

Can you elaborate?

> On x86, ld supports Local Exec model in shared libraries (while for most
> other targets it does not). R_386_TLS_LE relocation is simply during
> -shared linking changed into R_386_TLS_TPOFF dynamic relocation (the same
> as is used for IE model, but there this reloc is against .got section
> while for LE it is against text section).

To be clear, are you saying that even if you declare a variable as LE, once you link the relevant code into a shared library it is changed to IE?

> So, if you don't use -fpic anyway, you can just use LE model on IA-32,
> if you finally change it so that -fpic is used for the whole library,
> then those functions (or assembly stubs) can be put into
> some SHF_ALLOC|SHF_WRITE|SHF_EXECINSTR section.

If we don't use PIC, do we always get LE if we declare a variable as LE?

> Which means you should use the default -ftls-model and use
> __attribute__((tls_model("initial-exec"))) or
> __attribute__((tls_model("local-exec")))
> for the variables which are really performance critical.

Yes, that would be the plan.

> On IA-32, you can use
> __asm ("jmp 1f; .section writetext, \"awx\"; 1: movl
> $foo@ntpoff, %0; jmp 2f; .previous; 2:" : "=r" (foo_offset));
> to query some variable's offset which you can later on use with:
> __asm ("movl %gs:0(%1), %0" : "=r" (foo_value) : "r" (foo_offset));
> Please do something like this only for runtime generated code, not for
> anything else.

Of course, silly me.

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation
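
The per-variable attribute Jakub mentions might be used roughly as in the sketch below, assuming GCC and POSIX threads. The variable and function names (current_context, set_current_context, get_current_context, run_on_thread) are hypothetical and are not taken from any real libGL. The attribute is a request for the Initial Exec model on this one variable; the rest of the library keeps the default -ftls-model.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread "current context" pointer. Only this
 * performance-critical variable is tagged; others keep the default model. */
static __thread void *current_context
    __attribute__((tls_model("initial-exec")));

void set_current_context(void *ctx) { current_context = ctx; }
void *get_current_context(void) { return current_context; }

/* Thread body: stores its argument in its own copy and reads it back. */
static void *worker(void *arg)
{
    set_current_context(arg);
    return get_current_context();
}

/* Runs worker on a new thread and returns the value that thread saw. */
void *run_on_thread(void *ctx)
{
    pthread_t t;
    void *seen = NULL;
    int rc = pthread_create(&t, NULL, worker, ctx);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, &seen);
    return seen;
}
```

Setting the context on one thread and then calling run_on_thread() with a different pointer leaves the first thread's value untouched, since each thread has its own copy. Whether a shared object built this way can still be loaded with dlopen depends on the static TLS surplus discussed elsewhere in this thread.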

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Dan Kegel wrote:
> If it turns out glibc's new __thread variable support really can do what
> you need on all platforms, do you agree that it might be better to use
> that?

Insert the word "new" between "all" and "platforms", and maybe you'll have more of an appreciation for my point of view :-)

> Which platforms are those? I didn't see this in your previous email.

How about basically every IA32 Linux installation out there (or, at the very least, those that are currently supported by our drivers)? How many people are actually using systems that have full support for __thread? As I understand it, RedHat 8.1 will be the first large-scale release that does.

-- Gareth

RE: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Daniel Jacobowitz wrote:
> Note the "always" in Roland's paragraph.

Note the fact that he said it would require one of the dynamic access models (GD or LD), which require at least one function call to access thread local variables. As I've said, this is an unacceptable hit on performance.

> When you say two or three, are these two or three pointers or two or
> three large tables?

Two or three pointers. I'm pretty sure we use less than 8 pointers all up, although many of those aren't performance critical. Three of ours most definitely are, and it would be nice if moving to a couple more didn't break things. We only ever use thread-local pointers, never whole structs or anything like that.

> In any case, it sounds like you could:
> - select the thread-local variables that you need fast access to
> - Arrange for those variables to be tagged with an
> __attribute__((tls_model("initial-exec"))), or something similar.
> - Make sure the TLS_STATIC_SURPLUS is big enough to hold them.

Will this be okay, considering that two shared libraries will need access to the variables (libGL.so itself and the driver backend)? Can you use IE or LE with variables that live in another shared library?

> I don't see a problem, but you'd have to do some serious reading of the
> TLS ABI documents.... they're quite thorough.

Sure, the code itself isn't hard to understand. The problem is, at runtime, how do I know what code to generate to access a given __thread variable? Do I have to disassemble a function that accesses the variable to know the right model to use? Fixed offsets make this trivial, but maybe this isn't a real problem after all.

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation

wine-patches broken?
by Stefan Leichter 22 Feb '03

Hello,

since Thursday I have tried twice to send a mail to wine-patches, but the mails do not show up there. I have also checked the archive on winehq; the mails are not in it. To check the mail account, I sent the last mail to another email address of mine, where it was received. Does anyone else have this problem?

There are two special things about my email account that may trigger the problem:
* the email address used for posting has mail delivery disabled for the mailing lists wine-patches and wine-devel.
* the domain the mails are sent from (t-online.de) does not match the domain of the email address. This sometimes causes problems with spam filters, abuse ...

Bye
Stefan

Re: Fast thread-local storage for OpenGL drivers
by Gareth Hughes 22 Feb '03

Roland McGrath wrote:
> These people clearly haven't read all of the TLS paper, or looked at the
> GCC implementation of __thread long enough to notice -ftls-model and
> __attribute__ ((tls_model)).

This is what I was talking about. I've read the entire document several times, and still can't see a way that a dynamically loadable shared library can be guaranteed to use the single-instruction Local Exec access model. If I'm wrong, please explain why.

> I think the TLS document intends to explain what the models mean in
> practical terms on each architecture, but I can believe it's not all
> that clear. The GCC manual doesn't explain the access models and code
> sequences, just tells you how to tell the compiler what you want in the
> terms that the TLS document defines.
>
> If you want maximal flexibility, i.e. to always work with dlopen, then
> indeed you must use the "dynamic" TLS access models (GD or LD). You can
> use the Initial Exec model if you want faster accesses at the cost of some
> flexibility.

libGL.so simply has to work with dlopen -- if for no other reason than essentially all major 3D games (Quake3, Doom3, UT2003 etc) dlopen libGL.so rather than linking with it. This is not going to change.

> When compiling PIC, IE-model accesses have one additional indirection,
> i.e. loading the offset from the GOT just as the address of a global
> variable is loaded in PIC. See the instruction sequences in the TLS spec.

I'm pretty sure all implementations of OpenGL are not compiled as PIC at this point in time. That's a whole other discussion, however.

> If you use static linking, these instruction sequences reduce to constants
> at link time (i.e. direct "%gs:NNN" accesses on x86).

Can you describe how I could use static linking here? As I said, libGL.so must be a dynamically loadable shared library. What we want is the single-instruction Local Exec access model. At this point in time, my understanding of the situation is that these are mutually exclusive requirements.

> If you link a shared object containing IE-model access relocs, the object
> will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen
> might refuse to load it.

As I said, not being able to dlopen libGL.so is unacceptable.

> In glibc, we actually allocate some excess space in the thread-local
> storage area layout determined at startup time. This lets a dynamically
> loaded module use static TLS if its PT_TLS segment fits in the available
> surplus. (In sysdeps/generic/dl-tls.c, see TLS_STATIC_SURPLUS.) If there
> is insufficient space preallocated, then loading the module will fail. In
> fact, we put this feature there with GL in mind and can adjust the
> preallocated surplus for what is most useful in practice.

I think the set of performance critical thread-local variables is something like two or three (depending on the implementation). The libGL.so API dispatcher needs fast access to one or two of these (dispatch table pointers), while the driver backend needs fast access to all of them (context pointer and dispatch table pointers). The other thread-local variables are generally not accessed in performance-critical situations.

Another issue I forgot to mention, or forgot to make clear, is that we need to be able to access these thread-local variables in runtime generated code. A driver's top-level API functions are often generated at runtime, and need to be able to do things like switch dispatch tables (obviously, they'd have direct access to the context they were associated with, and so wouldn't need to go through the pointer in TLS). Are we guaranteed that the __thread variables aren't going to move around? How would we work out what code to generate to access a given __thread variable?

(I've included both phil-list and wine-devel, if you'd like this discussion kept to one or other of these lists, please say so).

--
Gareth Hughes (gareth(a)nvidia.com)
OpenGL Developer, NVIDIA Corporation
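
To make the dispatcher requirement concrete, here is a heavily reduced sketch of an API entry point that goes through a thread-local dispatch table pointer, assuming GCC's __thread keyword. Everything in it is hypothetical (the one-entry struct dispatch_table, the two tables, make_current and api_get_error); a real libGL dispatch table has hundreds of entries, and real entry points are often hand-written or runtime-generated stubs rather than C functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced dispatch table with a single entry. */
struct dispatch_table {
    int (*get_error)(void);
};

static int no_context_get_error(void) { return -1; }
static int driver_get_error(void) { return 0; }

/* Table used while no context is current, and a stand-in driver table. */
const struct dispatch_table no_context_table = { no_context_get_error };
const struct dispatch_table driver_table = { driver_get_error };

/* The performance-critical thread-local pointer discussed above. */
static __thread const struct dispatch_table *current_dispatch =
    &no_context_table;

/* Switch this thread's dispatch table; NULL selects the fallback table. */
void make_current(const struct dispatch_table *table)
{
    current_dispatch = table ? table : &no_context_table;
}

/* API entry point: one TLS load followed by an indirect call. */
int api_get_error(void)
{
    assert(current_dispatch != NULL); /* compiled out with NDEBUG */
    return current_dispatch->get_error();
}
```

The cost of api_get_error() is dominated by how the compiler reaches current_dispatch, which is why the choice between the dynamic (GD/LD) and static (IE/LE) access models matters so much for this one pointer.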
