x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStorage`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). There is a special-case needed in `usr1_handler`: when inside a syscall (on the kernel stack), GSBASE may need to be reset to either the TEB or the TSD. The only way to tell is to determine what GSBASE was set to on entry to the signal handler.
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi`, `%eax`, and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57….
The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) with several apps and games, including the `cefclient.exe` CEF sample.
I still need to test this patch on macOS 10.13, and I'd also like to do some performance testing.
---
I also tested an alternate implementation strategy for this which took advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`, and avoid the special-case in `usr1_handler()`.
This strategy was simpler, but I'm not using it for 2 reasons:
- the "full" thread state is only available starting with macOS 10.15, and we still support 10.13.
- more crucially, Rosetta 2 doesn't seem to correctly implement the GS.base field of the full thread state. It's set to 0 on entry, and isn't read on exit.
--
v2: ntdll: Remove x86_64 Mac-specific TEB access workarounds that are no longer needed.
ntdll: On macOS x86_64, swap GSBASE between the TEB and macOS TSD when entering/leaving PE code.
ntdll: Ensure init_handler runs in signal handlers before any compiler-generated memset calls.
ntdll: Remove ugly fallback method for getting a thread's GSBASE on macOS.
ntdll: Leave kernel stack before accessing %gs in x86_64 syscall dispatcher.
ntdll: Do %gs accesses before switching to kernel stack in x86_64 syscall dispatcher.
https://gitlab.winehq.org/wine/wine/-/merge_requests/6866
This commit replaces the SampleGrabberSink with a dedicated Video Sink,
referred to as the Simple Video Renderer (SVR).
This brings it more inline with Windows and provides the benefit of
having direct access to the IMFSample, removing the need to copy the
sample data.
--
v8: mfmediaengine: Fallback to sample copy if scaling is required.
mfmediaengine: Implement D3D-aware video frame sink.
mfmediaengine: Implement SVR.
https://gitlab.winehq.org/wine/wine/-/merge_requests/5924