Run C++ global/static destructors during DLL_PROCESS_DETACH, while the other dllimport functions they may want to call are still viable. While Windows imposes many restrictions on what may be done in such destructors (given that they run inside the loader lock), there are plenty of kernel32 functions such as DeleteCriticalSection, DeleteAtom, TlsFree, etc. that are both useful and legal to call. Currently this does not work for builtin modules, because all the Win32 structures are discarded well before NtUnmapViewOfSection finally does the dlclose.
Even for a winelib .dll.so module, it would be preferable for destructors to execute during process_detach (before wine tears down the MODREF and detaches dependent dlls), rather than be left until after the last NtUnmapViewOfSection (when we finally reach dlclose).
Therefore, winegcc now always uses the DllMainCRTStartup entry point unless you specify your own --entry=func. Previously it did this only for PE modules using msvcrt. Making this default consistent matches cl.exe, which also always defaults to _DllMainCRTStartup unless overridden by /entry:foo https://docs.microsoft.com/en-us/cpp/build/reference/entry-entry-point-symbo...
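As a rough illustration of the defaulting rule described above, the selection could be modelled as follows. `struct entry_opts` and `default_entry_point` are hypothetical names invented for this sketch and are not winegcc's actual code; only the branch conditions and the entry point strings are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for winegcc's `struct options`. */
struct entry_opts {
    const char *entry_point;   /* value of --entry=func, or NULL if not given */
    bool native_subsystem;     /* subsystem is "native" (drivers) */
    bool shared;               /* building a dll */
    bool win16_app;
    bool use_msvcrt;
    bool unicode_app;
    bool is_pe;
    bool is_i386;
};

/* Returns the entry point name, or NULL to leave the choice to later stages. */
const char *default_entry_point(const struct entry_opts *o)
{
    if (o->entry_point)                       /* an explicit --entry always wins */
        return o->entry_point;
    if (o->native_subsystem)
        return (o->is_pe && o->is_i386) ? "DriverEntry@8" : "DriverEntry";
    if (o->use_msvcrt && !o->shared && !o->win16_app)
        return o->unicode_app ? "wmainCRTStartup" : "mainCRTStartup";
    if (o->shared && !o->win16_app)           /* new: every dll, PE or ELF */
        return o->is_i386 ? "DllMainCRTStartup@12" : "DllMainCRTStartup";
    if (o->is_pe && o->use_msvcrt && o->win16_app)
        return o->is_i386 ? "DllMainCRTStartup@12" : "DllMainCRTStartup";
    return NULL;
}
```

With this model, a shared module with no `--entry` gets `DllMainCRTStartup` (decorated as `DllMainCRTStartup@12` on i386) whether or not msvcrt is in use, which is the behavior change the description refers to.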
The ELF version of winecrt0.a now provides a DllMainCRTStartup which, per the Itanium C++ ABI (which is what gcc and clang use in practice), performs this destruction by calling __cxa_finalize(&__dso_handle). This libc function is required to be idempotent, so it's OK that dlclose still calls it again later (there will just be no further work to do).
Multiple calls to __cxa_finalize shall not result in calling termination function entries multiple times; the implementation may either remove entries or mark them finished.
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#dso-dtor-runtime-api
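The idempotency requirement quoted above can be checked with a small experiment. This is a minimal sketch that assumes glibc (other libcs may implement `__cxa_finalize` differently, or as a no-op); `fake_dso_handle`, `fake_dtor` and `finalize_twice` are hypothetical names, and a private token is used in place of `&__dso_handle` so that no real destructors are disturbed.

```c
#include <stddef.h>

/* Exported by glibc; declared by hand because no C header declares them. */
extern int  __cxa_atexit(void (*fn)(void *), void *arg, void *dso_handle);
extern void __cxa_finalize(void *dso_handle);

static int fake_dso_handle;   /* hypothetical token standing in for &__dso_handle */
static int dtor_runs;         /* how many times the fake destructor has run */

static void fake_dtor(void *arg)
{
    (void)arg;
    dtor_runs++;
}

/* Registers one destructor against the token, finalizes the token twice,
 * and returns how many times the destructor ran (-1 if registration failed). */
int finalize_twice(void)
{
    dtor_runs = 0;
    if (__cxa_atexit(fake_dtor, NULL, &fake_dso_handle) != 0)
        return -1;
    __cxa_finalize(&fake_dso_handle);   /* runs fake_dtor */
    __cxa_finalize(&fake_dso_handle);   /* should find nothing left to run */
    return dtor_runs;
}
```

If the libc honors the ABI text, the second `__cxa_finalize` call is harmless, which is the property the new DllMainCRTStartup relies on when dlclose later finalizes the same handle again.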
This has two main effects: it moves ELF destructors earlier (before imports are unmapped), and it moves them inside the NT loader lock. Being earlier was the intended goal, and moving them inside the lock seems fine. Any Win32 API calls in destructors are just subjected to the same lock hierarchy rules as usual on Windows (MSVC also runs destructors from _DllMainCRTStartup).
https://docs.microsoft.com/en-us/cpp/build/run-time-library-behavior?view=ms...
And any purely-ELF destructors that happen to also run earlier should never call functions exported from wine (and thus don't care about ntdll's locks).
-- v2: winecrt0: run C++ object destructors in DLL_PROCESS_DETACH.
From: Kevin Puetz PuetzKevinA@JohnDeere.com
---
 dlls/winecrt0/crt_dllmain.c | 19 +++++++++++++++----
 dlls/winecrt0/exe_entry.c   | 16 +++++++++++++++-
 dlls/winecrt0/exe_wentry.c  | 16 +++++++++++++++-
 tools/winegcc/winegcc.c     |  4 +++-
 4 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/dlls/winecrt0/crt_dllmain.c b/dlls/winecrt0/crt_dllmain.c
index 181760c884a..539f5f6115c 100644
--- a/dlls/winecrt0/crt_dllmain.c
+++ b/dlls/winecrt0/crt_dllmain.c
@@ -18,8 +18,6 @@
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
  */

-#ifdef __WINE_PE_BUILD
-
 #include <stdarg.h>
 #include <stdio.h>
 #include "windef.h"
@@ -27,7 +25,20 @@

 BOOL WINAPI DllMainCRTStartup( HINSTANCE inst, DWORD reason, void *reserved )
 {
-    return DllMain( inst, reason, reserved );
-}
+    BOOL result = DllMain( inst, reason, reserved );

+#ifndef __WINE_PE_BUILD
+    if(reason == DLL_PROCESS_DETACH) {
+#if __has_attribute(sysv_abi) /* winecrt0 uses -mabi=ms, but this is a sysv function */
+        extern void __cxa_finalize (void *) __attribute__((weak,sysv_abi));
+#else
+        extern void __cxa_finalize (void *) __attribute__((weak));
 #endif
+        extern void *__dso_handle __attribute((visibility("hidden"),weak));
+        if(__cxa_finalize && &__dso_handle) {
+            __cxa_finalize(&__dso_handle);
+        }
+    }
+#endif
+    return result;
+}
diff --git a/dlls/winecrt0/exe_entry.c b/dlls/winecrt0/exe_entry.c
index d4d1d7d6757..b412ab7e14f 100644
--- a/dlls/winecrt0/exe_entry.c
+++ b/dlls/winecrt0/exe_entry.c
@@ -97,5 +97,19 @@ DWORD WINAPI DECLSPEC_HIDDEN __wine_spec_exe_entry( PEB *peb )
     int argc;
     char **argv = build_argv( GetCommandLineA(), &argc );

-    ExitProcess( main( argc, argv ));
+    int ret = main( argc, argv );
+
+#ifndef __WINE_PE_BUILD
+#if __has_attribute(sysv_abi) /* winecrt0 uses -mabi=ms, but this is a sysv function */
+    extern void __cxa_finalize (void *) __attribute__((weak,sysv_abi));
+#else
+    extern void __cxa_finalize (void *) __attribute__((weak));
+#endif
+    extern void *__dso_handle __attribute((visibility("hidden"),weak));
+    if(__cxa_finalize && &__dso_handle) {
+        __cxa_finalize(&__dso_handle);
+    }
+#endif
+
+    ExitProcess( ret );
 }
diff --git a/dlls/winecrt0/exe_wentry.c b/dlls/winecrt0/exe_wentry.c
index a4c1a0897fb..17081dd90ca 100644
--- a/dlls/winecrt0/exe_wentry.c
+++ b/dlls/winecrt0/exe_wentry.c
@@ -97,5 +97,19 @@ DWORD WINAPI DECLSPEC_HIDDEN __wine_spec_exe_wentry( PEB *peb )
     int argc;
     WCHAR **argv = build_argv( GetCommandLineW(), &argc );

-    ExitProcess( wmain( argc, argv ));
+    int ret = wmain( argc, argv );
+
+#ifndef __WINE_PE_BUILD
+#if __has_attribute(sysv_abi) /* winecrt0 uses -mabi=ms, but this is a sysv function */
+    extern void __cxa_finalize (void *) __attribute__((weak,sysv_abi));
+#else
+    extern void __cxa_finalize (void *) __attribute__((weak));
+#endif
+    extern void *__dso_handle __attribute((visibility("hidden"),weak));
+    if(__cxa_finalize && &__dso_handle) {
+        __cxa_finalize(&__dso_handle);
+    }
+#endif
+
+    ExitProcess( ret );
 }
diff --git a/tools/winegcc/winegcc.c b/tools/winegcc/winegcc.c
index ab26adb07e8..639c2af5587 100644
--- a/tools/winegcc/winegcc.c
+++ b/tools/winegcc/winegcc.c
@@ -1266,6 +1266,8 @@ static void build(struct options* opts)
             entry_point = (is_pe && opts->target.cpu == CPU_i386) ? "DriverEntry@8" : "DriverEntry";
         else if (opts->use_msvcrt && !opts->shared && !opts->win16_app)
             entry_point = opts->unicode_app ? "wmainCRTStartup" : "mainCRTStartup";
+        else if (opts->shared && !opts->win16_app)
+            entry_point = opts->target.cpu == CPU_i386 ? "DllMainCRTStartup@12" : "DllMainCRTStartup";
     }
     else entry_point = opts->entry_point;

@@ -1303,7 +1305,7 @@ static void build(struct options* opts)
     for ( j = 0; j < lib_dirs.count; j++ )
         strarray_add(&link_args, strmake("-L%s", lib_dirs.str[j]));

-    if (is_pe && opts->use_msvcrt && !entry_point && (opts->shared || opts->win16_app))
+    if (is_pe && opts->use_msvcrt && !entry_point && opts->win16_app)
         entry_point = opts->target.cpu == CPU_i386 ? "DllMainCRTStartup@12" : "DllMainCRTStartup";

     if (is_pe && entry_point)

Hmm, just noticed a warning I overlooked before:
/usr/bin/ld: /usr/local/lib/wine/i386-unix/libwinecrt0.a(crt_dllmain.o): warning: relocation against `__cxa_finalize@@GLIBC_2.1.3' in read-only section `.text'
/usr/bin/ld: warning: creating DT_TEXTREL in a shared object
I am seeing this only on 32-bit x86 Ubuntu 22.04, not on amd64 Ubuntu and not on our other x86 platform (a Wind River Linux build). Not sure what's going on, or whether it indicates anything wrong with this patch (building wine itself does not show the warning...)
Ok, figured out a few more things. The problem seems to be that libwinecrt0.a was compiled with `-fno-PIC`, and then gets linked into .so files that are being linked with `-fPIC`. The use of `-fno-PIC` for winecrt0 seems very much intentional per 8f732c66ab37b54c30d63c74f7822ba1d4f04996 (and indeed only for i386). https://bugs.winehq.org/show_bug.cgi?id=37540#c19 mentions a "suppress read only relocs workaround", but I'm not sure what it was, unless it's a reference to using `-read_only_relocs suppress`. But probably not, since that seems to have been for darwin-powerpc, and this change was only for x86. In any case, wine itself is being consistent, and linking with `-fno-PIC`: https://gitlab.winehq.org/wine/wine/-/blob/master/configure.ac#L813-814.
I see the warning, when wine did not, because my CMake toolchain for winegcc is adding -fPIC to its rules for SHARED and MODULE (.dll.so) files. So that seems to be on me.
And apparently I only see the warning in ubuntu, and not wind river, because it's a relatively new warning added in binutils 2.35: https://github.com/bminor/binutils-gdb/commit/a6dbf402de65fe66f4ec99b56527df...
So I think this patch is probably fine, and the fix is that I should have updated my CMake toolchain to know that wine >= 4.8 doesn't want builtin dll.so/.exe.so files to use `-fPIC` anymore.
On Thu Sep 1 17:48:34 2022 +0000, Kevin Puetz wrote:
> Ok, figured out a few more things. The problem seems to be that libwinecrt0.a was compiled with `-fno-PIC`, and then gets linked into .so files that are being linked with `-fPIC`. The use of `-fno-PIC` for winecrt0 seems very much intentional per 8f732c66ab37b54c30d63c74f7822ba1d4f04996 (and indeed only for i386). https://bugs.winehq.org/show_bug.cgi?id=37540#c19 mentions a "suppress read only relocs workaround", but I'm not sure what it was, unless it's a reference to using `-read_only_relocs suppress`. But probably not, since that seems to have been for darwin-powerpc, and this change was only for x86. In any case, wine itself is being consistent, and linking with `-fno-PIC`: https://gitlab.winehq.org/wine/wine/-/blob/master/configure.ac#L813-814.
>
> I see the warning, when wine did not, because my CMake toolchain for winegcc is adding -fPIC to its rules for SHARED and MODULE (.dll.so) files. So that seems to be on me.
>
> And apparently I only see the warning in ubuntu, and not wind river, because it's a relatively new warning added in binutils 2.35: https://github.com/bminor/binutils-gdb/commit/a6dbf402de65fe66f4ec99b56527df...
>
> So I think this patch is probably fine, and the fix is that I should have updated my CMake toolchain to know that wine >= 4.8 doesn't want builtin dll.so/.exe.so files to use `-fPIC` anymore for x86.

Digging still deeper, winecrt0.a is actually a mixture of `-fPIC` and `-fno-PIC` code:
The PE build (using i686-w64-mingw32-gcc) does not specify either way. If I peek with `objdump -r /usr/local/lib/wine/i386-windows/libwinecrt0.a | sed '/RELOCATION RECORDS FOR \[.debug/,/^$/d'`, it seems to get dir32 (== R_DIR32, direct 32-bit reference to the symbol's virtual address) and DISP32 (== R_PCRLONG, 32-bit longword PC-relative relocation) relocs, so not entirely PIC. That agrees with https://bugs.winehq.org/show_bug.cgi?id=37540 that default Windows/PE practice is no-PIC.
Looking at my build log, crt_fltused.c, crt_dllmain.c, debug.c, dll_main.c, delay_load.c, exception.c, dll_canunload.c, exe16_entry.c, unix_lib.c, setjmp.c, stub.c, dll_register.c and register.c are all built with `gcc -fno-PIC`. exe_entry.c, exe_main.c, exe_wentry.c and exe_wmain.c are compiled with `gcc -fPIC`, and indeed contain R_386_PLT32 and R_386_GOT32X relocations. Presumably this is a consequence of `#pragma makedep unix`, via [`AC_SUBST(UNIXDLLFLAGS,"-fPIC")`](https://gitlab.winehq.org/wine/wine/-/blob/master/configure.ac#L644).
But a lot of the files built with -fno-PIC just incidentally happen to be PIC-compliant, in that they happen to come out with only R_386_PC32. Before the usage of __dso_handle and __cxa_finalize added by this MR, the only ones that were already non-PIC in a meaningful way were:
```
delay_load.o:     file format elf32-i386

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000023 R_386_32          __wine_spec_delay_imports

exception.o:     file format elf32-i386

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
0000002c R_386_32          .text

register.o:     file format elf32-i386

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000011 R_386_32          .bss
00000053 R_386_32          .rodata
00000062 R_386_32          .rodata
00000071 R_386_32          .rodata
00000084 R_386_32          .rodata
00000098 R_386_32          .rodata.str1.1
000000a5 R_386_32          .bss
0000023d R_386_32          __wine_spec_nt_header
00000242 R_386_32          .text
0000024c R_386_32          .rodata
000002ad R_386_32          __wine_spec_nt_header
000002b2 R_386_32          .text
000002bc R_386_32          .rodata
```
You wouldn't end up referencing exception.o without using the `_TRY` SEH macros (or having widl code do so), and __wine_register_resources you'd pretty much have to call on purpose. Same for using delayload. So *most* things were still OK (in practice) to be -fPIC.
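To make the distinction between the two relocation types concrete, here is a minimal sketch; `counter`, `helper`, `counter_address` and `call_helper` are hypothetical names. The relocation types in the comments are what I would expect from `gcc -m32 -fno-PIC -O0 -c` followed by `objdump -r`, not something verified against the winecrt0 objects.

```c
#include <stddef.h>

int counter;   /* a global whose address has to be materialized somewhere */

int helper(void)
{
    return 42;
}

/* Loading the absolute address of a global as an immediate: in non-PIC
 * i386 code this is expected to need an R_386_32 relocation in .text,
 * i.e. a text relocation once the object ends up in a shared object.
 * Testing a weak function symbol against NULL needs its address in the
 * same way. */
int *counter_address(void)
{
    return &counter;
}

/* A direct call encodes only the distance to the callee, so it is
 * expected to come out as R_386_PC32 and is harmless under -fPIC links. */
int call_helper(void)
{
    return helper();
}
```

This is presumably why the `__cxa_finalize &&` check shows up in the linker warning, while the files that only make plain calls stayed PIC-compatible by accident.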
Seems like I should probably move this into its own atexit.c, marked with `#pragma makedep unix`, and then call that from the 3 entry points that need it (the calls themselves would just be R_386_PC32). That restores the status quo vs `-fPIC`, eliminates having 3 duplicates of it in exe_entry/exe_wentry/crt_dllmain, and eliminates the need to mess with `__attribute__((sysv_abi))` to compensate for `-mabi=ms` on amd64. win/win/win.
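A minimal sketch of what such a shared helper might look like, assuming it is built as Unix-side code. The function name `winecrt0_run_dso_destructors` and the int return value are hypothetical and only there for illustration; the weak declarations mirror the ones in the patch above.

```c
#include <stddef.h>

/* Hypothetical atexit.c. In the Wine tree this file would carry
 * `#pragma makedep unix`, so it gets the Unix flags (-fPIC, sysv ABI)
 * and no sysv_abi attribute is needed on the declaration below. */

extern void __cxa_finalize(void *) __attribute__((weak));
extern void *__dso_handle __attribute__((visibility("hidden"), weak));

/* Runs the destructors registered for the current module, if the C++
 * runtime hooks are present. Returns 1 if __cxa_finalize was called and
 * 0 if either symbol is missing (nothing to do). */
int winecrt0_run_dso_destructors(void)
{
    if (!__cxa_finalize || !&__dso_handle)
        return 0;
    __cxa_finalize(&__dso_handle);
    return 1;
}
```

DllMainCRTStartup (on DLL_PROCESS_DETACH) and the two exe entry points would then each make a single plain call to the helper, keeping the weak-symbol address checks out of the objects built with `-fno-PIC`.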