On 27.07.2016 17:42, Ken Thomases wrote:
64-bit Windows apps have hard-coded accesses to %gs:0x58 baked into them. They need to find the ThreadLocalStoragePointer there.
Technically, the gsbase register and the memory it points to belong to the pthread implementation on macOS. It's used for the pthread TLS implementation. Slot 11 (offset 0x58) is currently used for the implementation of the ttyname() system library function. We do not anticipate that Wine or any of the system libraries or frameworks it uses will call ttyname(). Furthermore, Apple has made it so that future releases of macOS will no longer use that slot. So, we hijack it for our purposes.
Signed-off-by: Ken Thomases ken@codeweavers.com
dlls/ntdll/loader.c | 11 +++++++- dlls/ntdll/signal_x86_64.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 1 deletion(-)
I'm not sure if I correctly understand the purpose of this patch. If you have to set %gs:0x58 to some specific value, why not just use an assembly instruction for that?
On Jul 27, 2016, at 11:06 AM, Sebastian Lackner sebastian@fds-team.de wrote:
On 27.07.2016 17:42, Ken Thomases wrote:
64-bit Windows apps have hard-coded accesses to %gs:0x58 baked into them. They need to find the ThreadLocalStoragePointer there.
Technically, the gsbase register and the memory it points to belong to the pthread implementation on macOS. It's used for the pthread TLS implementation. Slot 11 (offset 0x58) is currently used for the implementation of the ttyname() system library function. We do not anticipate that Wine or any of the system libraries or frameworks it uses will call ttyname(). Furthermore, Apple has made it so that future releases of macOS will no longer use that slot. So, we hijack it for our purposes.
Signed-off-by: Ken Thomases ken@codeweavers.com
dlls/ntdll/loader.c | 11 +++++++- dlls/ntdll/signal_x86_64.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 1 deletion(-)
I'm not sure if I correctly understand the purpose of this patch. If you have to set %gs:0x58 to some specific value, why not just use an assembly instruction for that?
alloc_tls_slot() is setting ThreadLocalStoragePointer for all threads, not just the current one.
-Ken
On 27.07.2016 18:27, Ken Thomases wrote:
On Jul 27, 2016, at 11:06 AM, Sebastian Lackner sebastian@fds-team.de wrote:
On 27.07.2016 17:42, Ken Thomases wrote:
64-bit Windows apps have hard-coded accesses to %gs:0x58 baked into them. They need to find the ThreadLocalStoragePointer there.
Technically, the gsbase register and the memory it points to belong to the pthread implementation on macOS. It's used for the pthread TLS implementation. Slot 11 (offset 0x58) is currently used for the implementation of the ttyname() system library function. We do not anticipate that Wine or any of the system libraries or frameworks it uses will call ttyname(). Furthermore, Apple has made it so that future releases of macOS will no longer use that slot. So, we hijack it for our purposes.
Signed-off-by: Ken Thomases ken@codeweavers.com
dlls/ntdll/loader.c | 11 +++++++- dlls/ntdll/signal_x86_64.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 1 deletion(-)
I'm not sure if I correctly understand the purpose of this patch. If you have to set %gs:0x58 to some specific value, why not just use an assembly instruction for that?
alloc_tls_slot() is setting ThreadLocalStoragePointer for all threads, not just the current one.
-Ken
Hm, I see. Do you know if the offset really changed in the past? I am asking because I assume the bruteforcing is an attempt to make the code more reliable, however on the other hand you still make the assumption that pthread keys directly correspond to indices into the %gs segment, which could also change. A check that it matches expected behavior is probably sufficient.
Besides that, if the gsbase cannot be located, it probably would be preferred to skip this code on following attempts.
On Jul 27, 2016, at 12:14 PM, Sebastian Lackner sebastian@fds-team.de wrote:
On 27.07.2016 18:27, Ken Thomases wrote:
On Jul 27, 2016, at 11:06 AM, Sebastian Lackner sebastian@fds-team.de wrote:
On 27.07.2016 17:42, Ken Thomases wrote:
64-bit Windows apps have hard-coded accesses to %gs:0x58 baked into them. They need to find the ThreadLocalStoragePointer there.
Technically, the gsbase register and the memory it points to belong to the pthread implementation on macOS. It's used for the pthread TLS implementation. Slot 11 (offset 0x58) is currently used for the implementation of the ttyname() system library function. We do not anticipate that Wine or any of the system libraries or frameworks it uses will call ttyname(). Furthermore, Apple has made it so that future releases of macOS will no longer use that slot. So, we hijack it for our purposes.
Signed-off-by: Ken Thomases ken@codeweavers.com
dlls/ntdll/loader.c | 11 +++++++- dlls/ntdll/signal_x86_64.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 1 deletion(-)
I'm not sure if I correctly understand the purpose of this patch. If you have to set %gs:0x58 to some specific value, why not just use an assembly instruction for that?
alloc_tls_slot() is setting ThreadLocalStoragePointer for all threads, not just the current one.
-Ken
Hm, I see. Do you know if the offset really changed in the past?
Yes, it has changed in the past.
I am asking because I assume the bruteforcing is an attempt to make the code more reliable, however on the other hand you still make the assumption that pthread keys directly correspond to indices into the %gs segment, which could also change.
Well, we made various proposals to Apple to get a more comprehensive, less kludgy fix and they were rejected because of binary compatibility. Basically, they were unwilling to change how pthread_getspecific() works. Mono depends on it, for example. So, it seems pretty reliable.
A check that it matches expected behavior is probably sufficient.
Besides that, if the gsbase cannot be located, it probably would be preferred to skip this code on following attempts.
It already does that. gsbase_offset is static. It will only be negative on the first call.
-Ken
On 27.07.2016 19:42, Ken Thomases wrote:
Hm, I see. Do you know if the offset really changed in the past?
Yes, it has changed in the past.
I am asking because I assume the bruteforcing is an attempt to make the code more reliable, however on the other hand you still make the assumption that pthread keys directly correspond to indices into the %gs segment, which could also change.
Well, we made various proposals to Apple to get a more comprehensive, less kludgy fix and they were rejected because of binary compatibility. Basically, they were unwilling to change how pthread_getspecific() works. Mono depends on it, for example. So, it seems pretty reliable.
A check that it matches expected behavior is probably sufficient.
Besides that, if the gsbase cannot be located, it probably would be preferred to skip this code on following attempts.
It already does that. gsbase_offset is static. It will only be negative on the first call.
Sorry, I missed the "gsbase_offset = 0;" call. You are probably right that there is not much to improve unless Apple offers a better interface.
I have only two suggestions: - gsbase_offset == 0 (key == i) could also be valid, so it would be useful to distinguish this case. - the logic could be moved to a helper function, for example using the init once mechanism
-Ken
On Jul 27, 2016, at 12:53 PM, Sebastian Lackner sebastian@fds-team.de wrote:
I have only two suggestions:
- gsbase_offset == 0 (key == i) could also be valid, so it would be useful to distinguish this case.
It can't be valid because that would put the TLS at the beginning of struct _pthread, but the ABI actually reveals the first couple of fields. pthread_t is a typedef for __darwin_pthread_t which is a typedef for struct _opaque_pthread_t which is defined (not just declared) in the headers (/usr/include/sys/_pthread/_pthread_types.h in the 10.11 SDK) and is less opaque than its name would suggest. Comments in the definition of struct _pthread in libpthread make note of the ABI compatibility factor.
In practice, the TLS array is at the end of the struct and I think Apple told us it has to be there.
- the logic could be moved to a helper function, for example using the init once mechanism
I suppose, but it will be called for the first thread before there's a chance of creating secondary threads. So, there's no race. Also, I haven't looked, but I'm not sure what thread setup has to have been completed before RtlRunOnceExecuteOnce() can safely be used.
-Ken
On 27.07.2016 20:25, Ken Thomases wrote:
On Jul 27, 2016, at 12:53 PM, Sebastian Lackner sebastian@fds-team.de wrote:
I have only two suggestions:
- gsbase_offset == 0 (key == i) could also be valid, so it would be useful to distinguish this case.
It can't be valid because that would put the TLS at the beginning of struct _pthread, but the ABI actually reveals the first couple of fields. pthread_t is a typedef for __darwin_pthread_t which is a typedef for struct _opaque_pthread_t which is defined (not just declared) in the headers (/usr/include/sys/_pthread/_pthread_types.h in the 10.11 SDK) and is less opaque than its name would suggest. Comments in the definition of struct _pthread in libpthread make note of the ABI compatibility factor.
In practice, the TLS array is at the end of the struct and I think Apple told us it has to be there.
When you are sure that this is always true you can start searching at (key + 1).
- the logic could be moved to a helper function, for example using the init once mechanism
I suppose, but it will be called for the first thread before there's a chance of creating secondary threads. So, there's no race. Also, I haven't looked, but I'm not sure what thread setup has to have been completed before RtlRunOnceExecuteOnce() can safely be used.
It looks like RtlRunOnceExecuteOnce is not safe to use (keyed_event not initialized yet). You could use pthread (like used before in init_teb_key), but probably moving to a function is sufficient to improve code readability. By avoiding the heavily indented code it would already look a bit less hacky. ;)
-Ken
Sebastian Lackner sebastian@fds-team.de writes:
It looks like RtlRunOnceExecuteOnce is not safe to use (keyed_event not initialized yet). You could use pthread (like used before in init_teb_key), but probably moving to a function is sufficient to improve code readability. By avoiding the heavily indented code it would already look a bit less hacky. ;)
Simplifying the error handling would probably also help. For instance, pthread_setspecific() is not going to fail in any meaningful way.