On Wed Mar 19 05:07:15 2025 +0000, Rémi Bernon wrote:
> Is it really useful to test such a corner case here?
I'm not sure what you mean by corner case; the test fails on i386 because 0xdeadbeef is too large an allocation. In any case, letting malloc handle it makes the check unnecessary.
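For illustration, a minimal sketch of that point (the surrounding error handling here is hypothetical, not the patch's actual code): rather than rejecting oversized requests up front, just let malloc() report the failure.

```c
void *buf = malloc( size );   /* a size like 0xdeadbeef simply fails on 32-bit */
if (!buf) return NULL;        /* hypothetical error path; no up-front size check needed */
```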
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/7597#note_98369
Microsoft Edge worked if the Windows version was set to win8.1 (or win7), but it
now crashes in current Wine due to this unimplemented function.
This patch makes it work again in win8.1 mode.
(It's still broken in win10 mode; see https://bugs.winehq.org/show_bug.cgi?id=56378 for more info.)
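For reference, a minimal sketch of what such a win32u semi-stub typically looks like; the prototype below mirrors the documented user32 SetAdditionalForegroundBoostProcesses() and is an assumption, not necessarily what the MR adds:

```c
/* Sketch only: prototype assumed from the documented user32 export. */
BOOL WINAPI NtUserSetAdditionalForegroundBoostProcesses( HWND hwnd, DWORD count, HANDLE *handles )
{
    FIXME( "hwnd %p, count %u, handles %p: stub.\n", hwnd, (unsigned int)count, handles );
    return TRUE;  /* pretend success so callers like Edge keep running */
}
```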
--
v11: win32u: Add stub for NtUserSetAdditionalForegroundBoostProcesses.
https://gitlab.winehq.org/wine/wine/-/merge_requests/7607
On Tue Mar 18 23:36:40 2025 +0000, Jinoh Kang wrote:
> Yeah, I think so. FWIW, even with two MOVs, it should be faster than one
> MOV from GS indirect.
Thanks; I suspect the performance impact is negligible, but the patch is certainly cleaner.
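For the curious, a rough illustration of the two access patterns being compared (assumed shapes, not the actual dispatcher code), using the `0x58`/`ThreadLocalStorage` slot as an example:

```c
/* One MOV, but through the %gs segment override. */
static inline void *tls_one_gs_mov(void)
{
    void *tls;
    __asm__ volatile ( "movq %%gs:0x58, %0" : "=r" (tls) );
    return tls;
}

/* Two plain MOVs: fetch the TEB pointer from an ordinary memory slot
 * (e.g. somewhere on the syscall frame), then fetch the field. */
static inline void *tls_two_plain_movs( void **saved_teb )
{
    void *tls;
    __asm__ volatile ( "movq (%1), %0\n\t"
                       "movq 0x58(%0), %0"
                       : "=&r" (tls) : "r" (saved_teb) );
    return tls;
}
```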
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/6866#note_98354
x86_64 Windows and macOS both use `%gs` to access thread-specific data (Windows TEB, macOS TSD). To date, Wine has worked around this conflict by filling the most important TEB fields (`0x30`/`Self`, `0x58`/`ThreadLocalStorage`) in the macOS TSD structure (Apple reserved the fields for our use). This was sufficient for most Windows apps.
CrossOver's Wine had an additional hack to handle `0x60`/`ProcessEnvironmentBlock`, and binary patches for certain CEF binaries which directly accessed `0x8`/`StackBase`. Additionally, Apple's libd3dshared could activate a special mode in Rosetta 2 where code executing in certain regions would use the Windows TEB when accessing `%gs`.
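Concretely, the old workaround amounted to mirroring those fields into the Apple-reserved TSD slots so that %gs-relative loads of them kept working. A hedged sketch, not the removed code itself:

```c
/* GSBASE points at the macOS TSD here; write the TEB pointer and the TLS
 * pointer into the reserved slots so %gs:0x30 and %gs:0x58 still resolve. */
__asm__ volatile ( "movq %0, %%gs:0x30" : : "r" (teb) );                              /* TEB->Self */
__asm__ volatile ( "movq %0, %%gs:0x58" : : "r" (teb->ThreadLocalStoragePointer) );
```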
Now that the PE separation is complete, GSBASE can be swapped when entering/exiting PE code. This is done in the syscall dispatcher, unix-call dispatcher, and for user-mode callbacks. GSBASE also needs to be set to the macOS TSD when entering signal handlers (in `init_handler()`), and then restored to the Windows TEB when exiting (in `leave_handler()`).
Some changes to the syscall dispatcher were needed to ensure that the TEB is not accessed through `%gs` while on the kernel stack (since a SIGUSR1 while on the kernel stack will result in GSBASE being set to the TSD).
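A minimal sketch of the signal-path handling just described; the helper names and the prototype of the private call are assumptions, not the MR's actual code:

```c
#include <ucontext.h>

/* Private macOS call (10.12+); this declaration is an assumption. */
extern void _thread_set_tsd_base( unsigned long base );

static void init_handler( const ucontext_t *sigcontext )
{
    /* The signal may have interrupted PE code, where GSBASE points at the
     * Windows TEB; switch to the macOS TSD before any Unix code runs. */
    _thread_set_tsd_base( (unsigned long)macos_tsd );    /* macos_tsd: hypothetical saved pointer */
}

static void leave_handler( const ucontext_t *sigcontext )
{
    /* About to return to the interrupted PE code: restore the Windows TEB. */
    _thread_set_tsd_base( (unsigned long)NtCurrentTeb() );
}
```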
---
I've tested this successfully on macOS 15 (Apple Silicon and Intel) and macOS 10.13 with several apps and games, including the `cefclient.exe` CEF sample.
Encouragingly, in some simple tests I didn't see a noticeable performance regression from this MR.
There are drawbacks though:
- libraries which jump directly from PE code into Unix code (expecting that %gs always points to the macOS TSD) will crash. Notable examples are D3DMetal and DXMT. These will need to be changed to use Unix calls.
- If Windows code uses the `syscall` instruction directly, the stack pointer likely needs to be valid (which Windows itself probably does not require). This is because the syscall dispatcher saves registers onto the user stack and has to call `_thread_set_tsd_base`. I can't say I've ever seen direct syscalls done with an invalid `%rsp`, but it seems like something anticheat code might do.
---
macOS does not have a public API for setting GSBASE, but the private `_thread_set_tsd_base()` works and was added in macOS 10.12.
`_thread_set_tsd_base()` is a small thunk that sets `%esi` and `%eax` and does the `syscall`: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57….
The syscall instruction itself clobbers `%rcx` and `%r11`.
I've tried to save as few registers as possible when calling `_thread_set_tsd_base()`, but there may be room for improvement there.
---
I also tested an alternate implementation strategy for this which took advantage of the expanded "full" thread state which is passed to signal handlers when a process has set a user LDT. The full thread state includes GSBASE, so GSBASE is set back to whatever is in the sigcontext on return (like every other field in the context). This would avoid needing to explicitly reset GSBASE in `leave_handler()`.
This strategy was simpler, but I'm not using it for two reasons:
- the "full" thread state is only available starting with macOS 10.15, and we still support 10.13.
- more crucially, Rosetta 2 doesn't seem to correctly implement the GS.base field of the full thread state. It's set to 0 on entry, and isn't read on exit.
--
v4: ntdll: Remove x86_64 Mac-specific TEB access workarounds that are no longer needed.
ntdll: On macOS x86_64, swap GSBASE between the TEB and macOS TSD when entering/leaving PE code.
ntdll: Leave kernel stack before accessing %gs in x86_64 syscall dispatcher.
ntdll: Don't access the TEB through %gs when using the kernel stack in x86_64 syscall dispatcher.
ntdll: Ensure init_handler runs in signal handlers before any compiler-generated memset calls.
ntdll: Remove ugly fallback method for getting a thread's GSBASE on macOS.
https://gitlab.winehq.org/wine/wine/-/merge_requests/6866
Some console objects currently do several unique things:
* Delegate waits onto the queue of another object. This is not really a problem
for in-process waits, since we can just return the sync object for the
delegate. However, it's also unnecessary, adds to the complexity of the server
logic, and is the only place where this is done.
* Make the wait state dependent on the process. This is difficult to emulate
with ntsync, and would require creating separate server objects for each
process by hacking into duplicate_handle.
* Fail a wait entirely in certain circumstances. This is pretty much impossible
to emulate with in-process waits.
Although ntsync has been in development for some time, I have regrettably failed
to notice these problems until now.
Fortunately, none of these behaviours happen on modern Windows. Although I/O on
unbound handles delegates to the console of the current process, the signaled
state does not. As the tests here show, the signaled state of a handle depends
on the active console of the process in which the handle was created. If that
console no longer exists, the signaled state is no longer updated [with one
rather inexplicable exception].
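A hedged sketch of the kind of check involved (not the actual tests): the wait result on an unbound handle follows the console of the process that created it, not the console of the process doing the waiting.

```c
HANDLE input = CreateFileA( "CONIN$", GENERIC_READ | GENERIC_WRITE,
                            FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                            OPEN_EXISTING, 0, NULL );
/* ...pass the handle to a child process, then in that process: */
DWORD res = WaitForSingleObject( input, 0 );
/* WAIT_OBJECT_0 vs. WAIT_TIMEOUT follows the console of the creating process. */
```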
Crucially, on current Windows waits never fail, and the state of an object is
the same across all processes which hold handles to it. Therefore this patch
brings our behaviour closer to that of current Windows.
In theory these are fds and should use default_fd_signaled(). However, the
points at which the handles are signaled are completely different, and I/O does
not trigger console handles to become signaled when it normally would. Therefore
for the time being I've kept the code using custom signaled ops.
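As a hedged sketch (field and function names are illustrative, not the MR's), a custom signaled op just reports explicitly tracked state instead of deriving it from the fd as default_fd_signaled() would:

```c
static int console_signaled( struct object *obj, struct wait_queue_entry *entry )
{
    struct console *console = (struct console *)obj;
    /* state is updated at console-specific points, not by generic fd I/O */
    return console->signaled;
}
```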
There is one other oddity related to consoles, which is the existence of
console_add_queue(), which seeks to lazily create an input thread when a console
is first waited on. This is one of two places, after this patch, where the
wait process is hijacked (the other being message queues). Fortunately this is
easy to handle for in-process synchronization objects, by queueing the ioctl
from the callback used to retrieve the in-process synchronization object itself.
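Roughly, and with entirely hypothetical names, that looks like queueing the startup ioctl the first time the in-process sync object is requested:

```c
static struct inproc_sync *console_get_inproc_sync( struct object *obj )
{
    struct console *console = (struct console *)obj;
    if (!console->input_thread_started)
    {
        queue_start_input_thread_ioctl( console );   /* hypothetical replacement for console_add_queue() */
        console->input_thread_started = 1;
    }
    return (struct inproc_sync *)grab_object( console->sync );
}
```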
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/7608
--
v11: server: Handle hardlinks and casefolding when renaming the same file.
server: Handle renames to destinations containing trailing slashes.
kernel32/tests: Test renaming a file into a hardlink of itself.
kernel32/tests: Use FindClose instead of CloseHandle when closing
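Regarding the last test change: handles returned by FindFirstFile* must be released with FindClose(), not CloseHandle(); a minimal illustration:

```c
WIN32_FIND_DATAA data;
HANDLE find = FindFirstFileA( "*", &data );
if (find != INVALID_HANDLE_VALUE) FindClose( find );   /* CloseHandle() is wrong for find handles */
```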
https://gitlab.winehq.org/wine/wine/-/merge_requests/6855