These changes are enough to allow building the entirety of the native (ELF) side with `-flto`.
-- v2: loader: Mark thread_ldt, thread_data, wld_start "used". ntdll: Mark call_init_thunk "used".
From: William Horvath william@horvath.blog
It's only referenced by inline asm, so this prevents the linker from discarding the symbol if (e.g.) LTO is used. --- dlls/ntdll/unix/signal_arm.c | 4 ++-- dlls/ntdll/unix/signal_arm64.c | 4 ++-- dlls/ntdll/unix/signal_i386.c | 4 ++-- dlls/ntdll/unix/signal_x86_64.c | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/dlls/ntdll/unix/signal_arm.c b/dlls/ntdll/unix/signal_arm.c index c9ae492597d..12389fa8f00 100644 --- a/dlls/ntdll/unix/signal_arm.c +++ b/dlls/ntdll/unix/signal_arm.c @@ -1121,8 +1121,8 @@ void signal_init_process(void) /*********************************************************************** * call_init_thunk */ -void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, - struct syscall_frame *frame, void *syscall_cfa ) +void __attribute__((used)) call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, + struct syscall_frame *frame, void *syscall_cfa ) { struct arm_thread_data *thread_data = (struct arm_thread_data *)&teb->GdiTebBatch; CONTEXT *ctx, context = { CONTEXT_ALL }; diff --git a/dlls/ntdll/unix/signal_arm64.c b/dlls/ntdll/unix/signal_arm64.c index 57d9c375076..6d3efaed31c 100644 --- a/dlls/ntdll/unix/signal_arm64.c +++ b/dlls/ntdll/unix/signal_arm64.c @@ -1447,8 +1447,8 @@ void syscall_dispatcher_return_slowpath(void) /*********************************************************************** * call_init_thunk */ -void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, - struct syscall_frame *frame, void *syscall_cfa ) +void __attribute__((used)) call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, + struct syscall_frame *frame, void *syscall_cfa ) { struct arm64_thread_data *thread_data = (struct arm64_thread_data *)&teb->GdiTebBatch; CONTEXT *ctx, context = { CONTEXT_ALL }; diff --git a/dlls/ntdll/unix/signal_i386.c b/dlls/ntdll/unix/signal_i386.c index 8ae7afc769e..058a6a50d0e 100644 --- a/dlls/ntdll/unix/signal_i386.c +++ b/dlls/ntdll/unix/signal_i386.c @@ -2496,8 +2496,8 @@ void signal_init_process(void) /*********************************************************************** * call_init_thunk */ -void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, - struct syscall_frame *frame, void *syscall_cfa ) +void __attribute__((used)) call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, + struct syscall_frame *frame, void *syscall_cfa ) { struct x86_thread_data *thread_data = (struct x86_thread_data *)&teb->GdiTebBatch; CONTEXT *ctx, context = { CONTEXT_ALL }; diff --git a/dlls/ntdll/unix/signal_x86_64.c b/dlls/ntdll/unix/signal_x86_64.c index caa85249896..0e293430879 100644 --- a/dlls/ntdll/unix/signal_x86_64.c +++ b/dlls/ntdll/unix/signal_x86_64.c @@ -2575,8 +2575,8 @@ void signal_init_process(void) /*********************************************************************** * call_init_thunk */ -void call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, - struct syscall_frame *frame, void *syscall_cfa ) +void __attribute__((used)) call_init_thunk( LPTHREAD_START_ROUTINE entry, void *arg, BOOL suspend, TEB *teb, + struct syscall_frame *frame, void *syscall_cfa ) { struct amd64_thread_data *thread_data = (struct amd64_thread_data *)&teb->GdiTebBatch; CONTEXT *ctx, context = { 0 };
From: William Horvath william@horvath.blog
They're only referenced by inline asm, so this prevents the linker from discarding the symbol if (e.g.) LTO is used. --- loader/preloader.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/loader/preloader.c b/loader/preloader.c index d0551bae63a..3df0884d611 100644 --- a/loader/preloader.c +++ b/loader/preloader.c @@ -181,6 +181,7 @@ void __bb_init_func(void) { return; }
static int thread_data[256];
+__attribute__((used)) struct { /* this is the kernel modify_ldt struct */ @@ -333,7 +334,7 @@ static inline int wld_prctl( int code, long arg )
#elif defined(__x86_64__)
-void *thread_data[256]; +void __attribute__((used)) *thread_data[256];
/* * The _start function is the entry and exit point of this program @@ -422,7 +423,7 @@ SYSCALL_NOERR( wld_getegid, 108 /* SYS_getegid */ );
#elif defined(__aarch64__)
-void *thread_data[256]; +void __attribute__((used)) *thread_data[256];
/* * The _start function is the entry and exit point of this program @@ -529,7 +530,7 @@ SYSCALL_NOERR( wld_getegid, 177 /* SYS_getegid */ );
#elif defined(__arm__)
-void *thread_data[256]; +void __attribute__((used)) *thread_data[256];
/* * The _start function is the entry and exit point of this program @@ -1395,7 +1396,7 @@ static void set_process_name( int argc, char *argv[] ) * Load the binary and then its ELF interpreter. * Note, we assume that the binary is a dynamically linked ELF shared object. */ -void* wld_start( void **stack ) +void* __attribute__((used)) wld_start( void **stack ) { long i, *pargc; char **argv, **p;
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=150812
Your paranoid android.
=== debian11b (64 bit WoW report) ===
mshtml: xmlhttprequest.c:460: Test failed: AllResponseHeaders(L"Date: Thu, 09 Jan 2025 08:30:10 GMT\r\nContent-Type: application/xml\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nLast-Modified: Tue, 14 Jun 2022 15:45:18 GMT\r\nETag: W/"33-5e16a4aa23f18"\r\nCF-Cache-Status: DYNAMIC\r\nReport-To: {"endpoints":[{"url":"https:\/\/a.n"...) don't have expected substr(L"Content-Length: 51") xmlhttprequest.c:460: Test failed: AllResponseHeaders(L"Date: Thu, 09 Jan 2025 08:30:11 GMT\r\nContent-Type: application/xml\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nLast-Modified: Tue, 14 Jun 2022 15:45:18 GMT\r\nETag: W/"33-5e16a4aa23f18"\r\nCF-Cache-Status: DYNAMIC\r\nReport-To: {"endpoints":[{"url":"https:\/\/a.n"...) don't have expected substr(L"Content-Length: 51")
winhttp: notification.c:734: Test failed: got unexpected thread 0x1080, err 0 winhttp.c:1205: Test failed: available_size = 548
I think, to be correct, we should decorate **all** functions that are referred to from inline asm with the `used` attribute. This means add it to `abort_thread` too. The reason being that there's no guarantee the compiler will keep it in its actual form around. It could completely inline it and then remove it, or copy it and propagate constant arguments or whatever other optimization it can think of, and the original won't be available. I mean it's free to do with it as it wants since it thinks it's not used anywhere else.
Also I think we should hide this behind a macro like `DECLSPEC_USED` for example, like we do for `DECLSPEC_HIDDEN`. @julliard thoughts?
On Thu Jan 23 21:01:24 2025 +0000, Gabriel Ivăncescu wrote:
I think, to be correct, we should decorate **all** functions that are referred to from asm functions with the `used` attribute. This means add it to `abort_thread` too. The reason being that there's no guarantee the compiler will keep it in its actual form around. It could completely inline it and then remove it, or copy it and propagate constant arguments or whatever other optimization it can think of, and the original won't be available. I mean it's free to do with it as it wants since it thinks it's not used anywhere else. Also I think we should hide this behind a macro like `DECLSPEC_USED` for example, like we do for `DECLSPEC_HIDDEN`. @julliard thoughts?
We only need to decorate all functions that are **only** called from asm, not **every** function that's ever called from asm at all. All we are doing is ensuring that these problematic functions/objects remain available and don't get optimized out at link-time. I haven't come across a case where this wasn't enough (with `ld.bfd`).
For reference, [`dlls/wow64/syscall.c`](https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/wow64/syscall.c?ref_t...) contains a similar pattern. `call_user_exception_dispatcher` has the attribute, because it's only reachable from an asm block, counting `Wow64PassExceptionToGuest` which itself is only reachable from asm.
In general, my preferred approach for working towards fully functional Wine+LTO would be to make targeted, small changes in small batches, with whatever techniques are required for each case. Given that there's no chance that LTO will be a default compiler/linker option for Wine, I don't think it's worth struggling to wrangle everything all at once. In this case, I think that these functions should have had these attributes from the start, like the WoW64 example.
On Thu Jan 23 21:01:24 2025 +0000, William Horvath wrote:
We only need to decorate all functions that are **only** called from asm, not **every** function that's ever called from asm at all. All we are doing is ensuring that these problematic functions/objects remain available and don't get optimized out at link-time. I haven't come across a case where this wasn't enough (with `ld.bfd`). For reference, [`dlls/wow64/syscall.c`](https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/wow64/syscall.c?ref_t...) contains a similar pattern. `call_user_exception_dispatcher` has the attribute, because it's only reachable from an asm block, counting `Wow64PassExceptionToGuest` which itself is only reachable from asm. In general, my preferred approach for working towards fully functional Wine+LTO would be to make targeted, small changes in small batches, with whatever techniques are required for each case. Given that there's no chance that LTO will be a default compiler/linker option for Wine, I don't think it's worth struggling to wrangle everything all at once. In this case, I think that these functions should have had these attributes from the start, like the WoW64 example.
As I said, just because it works by luck does not mean that it's the correct way to do it. The compiler is free to change the function signature or behavior if:
- it's not exported - it knows all callers
And then what will the asm block that calls it do? Either give compile-time error with "undefined symbol" because it can't find the original function, or run-time crash because the compiler changed the function's behavior which, while correct for all C callers, is not correct for the asm block. That's because the compiler does not look into asm statements. It doesn't know it's being called from an asm block.
BTW GCC does do function signature changes and cloning when optimizing. A simple case is constant propagation where a constant argument is replaced and removed (if the called function is always called with the same constant), or parameter is removed if unused and so on, but there's many more. A new GCC version or LLVM might make your patch not be enough anymore. What else do you think the "no cloning" attribute does?
So just decorate **all** functions called from asm blocks, to be safe and correct with LTO.
On Thu Jan 23 21:56:30 2025 +0000, Gabriel Ivăncescu wrote:
As I said, just because it works by luck does not mean that it's the correct way to do it. The compiler is free to change the function signature or behavior if:
- it's not exported
- it knows all callers
And then what will the asm block that calls it do? Either give compile-time error with "undefined symbol" because it can't find the original function, or run-time crash because the compiler changed the function's behavior which, while correct for all C callers, is not correct for the asm block. That's because the compiler does not look into asm statements. It doesn't know it's being called from an asm block. BTW GCC does do function signature changes and cloning when optimizing. A simple case is constant propagation where a constant argument is replaced and removed (if the called function is always called with the same constant), or parameter is removed if unused and so on, but there's many more. A new GCC version or LLVM might make your patch not be enough anymore. What else do you think the "no cloning" attribute does? So just decorate **all** functions called from asm blocks, to be safe and correct with LTO.
I understand what you're saying and think you're correct, but I think there has to be a lazier way to do this correctly, instead of requiring a tedious tree-wide update for the benefit of almost nobody. This already isn't enough to result in a working `wine-preloader` if lld is used instead of bfd (it needs `-Wl,--no-relax` as well), so I'm just going to drop this for now.
Thanks for your thoughts and insight into the topic.
This merge request was closed by William Horvath.
On Thu Jan 23 22:13:18 2025 +0000, William Horvath wrote:
I understand what you're saying and think you're correct, but I think there has to be a lazier way to do this correctly, instead of requiring a tedious tree-wide update for the benefit of almost nobody. This already isn't enough to result in a working `wine-preloader` if lld is used instead of bfd (it needs `-Wl,--no-relax` as well), so I'm just going to drop this for now. Thanks for your thoughts and insight into the topic.
I don't understand what's so difficult about also adding it to `abort_thread` that you had to close this?
I never said you should make it tree-wide in a single MR, or even in other MRs if you don't feel like it. Any improvement, even partial, is still fine.
But since you already touch this component, why not just add it to `abort_thread` as well?
Anyway, the other thing is it still should be encapsulated in a macro, IMO. Still waiting for Alexandre's suggestion(s) on that one.
The scope of this MR is fine as-is.
On Fri Jan 24 19:06:36 2025 +0000, Gabriel Ivăncescu wrote:
I don't understand what's so difficult about also adding it to `abort_thread` that you had to close this? I never said you should make it tree-wide in a single MR, or even in other MRs if you don't feel like it. Any improvement, even partial, is still fine. But since you already touch this component, why not just add it to `abort_thread` as well? Anyway, the other thing is it still should be encapsulated in a macro, IMO. Still waiting for Alexandre's suggestion(s) on that one. The scope of this MR is fine as-is.
It's not "so difficult", I closed it because the MR doesn't make sense to me anymore if it's no longer just focused on fixing what's broken. This approach generally feels wrong to me if, to continue with fixing the PE-side, we'd have to do something similar for dozens of other occurrences. Like I mentioned, I believe there is a better ("lazier") way to make sure the compiler behaves properly with inline asm function calls, instead of manual additions. I will explore other options.
On Fri Jan 24 19:06:36 2025 +0000, William Horvath wrote:
It's not "so difficult", I closed it because the MR doesn't make sense to me anymore if it's no longer just focused on fixing what's broken. This approach generally feels wrong to me if, to continue with fixing the PE-side, we'd have to do something similar for dozens of other occurrences. Like I mentioned, I believe there is a better ("lazier") way to make sure the compiler behaves properly with inline asm function calls, instead of manual additions. I will explore other options.
There's no better way. You simply have to let the compiler know. And like I said, you don't have to do the others if you don't want to, in fact I very much prefer to keep this MR short and only focused on minimal amount of components (or unix side as you did), so we can get the macro and "infrastructure" in first.
For your concern about doing it for everything: there's not that many ASM_GLOBAL_FUNCs that call other functions around in general, it's not like the PE conversion that touched every single module. But I could even take up that task if you want, as long as we get this one in first so I know how to proceed.