My hardware is a Ryzen 7 4700U with integrated graphics:
Flatout (Direct3D 9): 20 fps (renders correctly)
Unigine Heaven (OpenGL): ~40-60 fps (renders correctly)
Unigine Heaven (Direct3D 9): 20-30 fps (renders correctly)
Unigine Heaven (Direct3D 11): 20-30 fps as well (renders correctly)
Elder Scrolls IV (Direct3D 9): 20 fps (renders correctly)
BeamNG Tech Demo 0.3 (Direct3D 9): 2 fps (renders correctly, but still runs poorly)
Massive step up from getting 2 fps across many wined3d games, but it's still pretty bad, ~~and sometimes runs worse than the original code~~. Now with a combination of using the new and old code dynamically you can get the best of both worlds!
Unfortunately, we lose the ability to get lucky with the mapping just happening to be in the 32 bit range.
--
v7: opengl32: speed up wow64 mapping.
https://gitlab.winehq.org/wine/wine/-/merge_requests/5145
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=54194
If the SendMessageTimeout call takes a long time, we can get other
messages which also set the observed wParam value. Apparently,
this is especially likely on Windows 7.
This also removes the (wParam == 0xbaadbeef) check which may have
been intended to serve the same goal but doesn't work because the
observed wParam value is still assigned.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/3862
x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStorage`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`). A special case is needed in `usr1_handler()`: when inside a syscall (on the kernel stack), GSBASE may need to be reset to either the TEB or the TSD. The only way to tell is to check what GSBASE was set to on entry to the signal handler.
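To make the signal-handler bookkeeping easier to follow, here is a small simulation of it. It is only a sketch: a plain struct field stands in for the real GSBASE register, the addresses are made up, and the `sim_*` names are hypothetical and do not correspond to functions in ntdll.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated per-thread state. The real GSBASE is a CPU register; a plain
 * field stands in for it here so the control flow can be followed. */
struct sim_thread
{
    uintptr_t teb;    /* address of the Windows TEB (made-up value) */
    uintptr_t tsd;    /* address of the macOS TSD (made-up value) */
    uintptr_t gsbase; /* stand-in for the GSBASE register */
};

/* Signal handler entry: remember what GSBASE was, then switch to the TSD
 * so that Unix-side code sees the macOS thread-specific data. */
static uintptr_t sim_init_handler(struct sim_thread *t)
{
    uintptr_t entry = t->gsbase;
    t->gsbase = t->tsd;
    return entry;
}

/* Ordinary exit: go back to the Windows TEB. */
static void sim_leave_handler(struct sim_thread *t)
{
    t->gsbase = t->teb;
}

/* usr1-style exit: the signal may have arrived inside a syscall, where
 * either base could have been active, so restore the value seen on entry. */
static void sim_leave_usr1(struct sim_thread *t, uintptr_t entry)
{
    assert(entry == t->teb || entry == t->tsd);
    t->gsbase = entry;
}
```

The point of the sketch is that the usr1 case cannot hard-code its restore target: it has to carry the entry value from the start of the handler to the end.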
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi`, `%eax`, and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57….
The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) with several apps and games, including the `cefclient.exe` CEF sample.
I still need to test this patch on macOS 10.13, and I'd also like to do some performance testing.
---
I also tested an alternate implementation strategy that takes advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`, and avoid the special case in `usr1_handler()`.
This strategy was simpler, but I'm not using it for 2 reasons:
- the "full" thread state is only available starting with macOS 10.15, and we still support 10.13.
- more crucially, Rosetta 2 doesn't seem to correctly implement the GS.base field of the full thread state. It's set to 0 on entry, and isn't read on exit.
--
v3: ntdll: Remove x86_64 Mac-specific TEB access workarounds that are no longer needed.
ntdll: On macOS x86_64, swap GSBASE between the TEB and macOS TSD when entering/leaving PE code.
ntdll: Ensure init_handler runs in signal handlers before any compiler-generated memset calls.
ntdll: Remove ugly fallback method for getting a thread's GSBASE on macOS.
ntdll: Leave kernel stack before accessing %gs in x86_64 syscall dispatcher.
ntdll: Do %gs accesses before switching to kernel stack in x86_64 syscall dispatcher.
https://gitlab.winehq.org/wine/wine/-/merge_requests/6866
As shown by the testbot, doubling the buffer size once is not always sufficient: the required size can keep growing between calls.
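The retry-in-a-loop pattern can be sketched as below. This is not the test code from the merge request: the status codes, `query_fn`, `query_in_loop()` and the mock table are all hypothetical stand-ins, used so the example builds without the Windows headers. The mock reports a larger required size on each of its first few calls, which is the situation where a single doubling falls short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, standing in for the Win32 error values. */
enum { STATUS_OK = 0, STATUS_BUFFER_TOO_SMALL = 1, STATUS_NO_MEMORY = 2 };

/* Hypothetical query callback. On success it fills buf and stores the
 * number of bytes written in *size. If the buffer is too small it stores
 * the currently required size in *size instead. */
typedef int (*query_fn)(void *buf, size_t *size, void *ctx);

/* Call the query until it stops reporting that the buffer is too small. */
static int query_in_loop(query_fn query, void *ctx, void **out, size_t *out_size)
{
    size_t size = 0;
    void *buf = NULL;
    int status;

    assert(query && out && out_size);
    *out = NULL;
    *out_size = 0;
    for (;;)
    {
        status = query(buf, &size, ctx);
        if (status != STATUS_BUFFER_TOO_SMALL) break;
        /* Grow to the size the call just asked for and try again. */
        void *tmp = realloc(buf, size);
        if (!tmp) { status = STATUS_NO_MEMORY; break; }
        buf = tmp;
    }
    if (status != STATUS_OK) { free(buf); return status; }
    *out = buf;
    *out_size = size;
    return STATUS_OK;
}

/* Mock data source whose required size grows on its first few calls,
 * imitating a table that gains entries between queries. */
struct growing_table { size_t required; size_t growth; int calls_that_grow; };

static int mock_query(void *buf, size_t *size, void *ctx)
{
    struct growing_table *t = ctx;

    if (t->calls_that_grow > 0) { t->required += t->growth; t->calls_that_grow--; }
    if (!buf || *size < t->required)
    {
        *size = t->required;
        return STATUS_BUFFER_TOO_SMALL;
    }
    memset(buf, 0xAB, t->required);
    *size = t->required;
    return STATUS_OK;
}
```

With a table that starts at 100 bytes and grows by 300 on each of its first three calls, the first call reports 400 bytes but the data ends up needing 1000, so doubling the first answer to 800 would still fail while the loop succeeds. A real test would probably also cap the number of attempts.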
--
v2: iphlpapi/tests: Call GetExtendedTcp/UdpTable() in a loop.
iphlpapi/tests: Call GetAdaptersAddresses() in a loop.
https://gitlab.winehq.org/wine/wine/-/merge_requests/3833