Showing that only non-volatile registers are reliably saved. Volatile registers are only saved by NtGetContextThread whenever it interrupts a thread in user space, and are otherwise returned from some previous, possibly outdated, state.
@jacek Unless I'm missing something, I think this shows that we do not have to save the full context in syscalls in general, and instead only the non-volatile XMM registers?
NtGetContextThread syscall still probably needs to save the full context and it should probably be using a specific code path.
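For reference, the volatile/non-volatile split discussed here follows the Microsoft x64 calling convention, where XMM0-XMM5 are caller-saved and XMM6-XMM15 are callee-saved. A minimal sketch of that classification for the 64-bit case follows; the helper name is hypothetical and not part of Wine or the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Microsoft x64 calling convention: XMM0-XMM5 are volatile (caller-saved),
 * XMM6-XMM15 are non-volatile (callee-saved).
 * Hypothetical helper for illustration only, not a Wine function. */
static bool xmm_is_nonvolatile_ms_x64( unsigned int reg )
{
    assert( reg < 16 ); /* x86-64 SSE exposes XMM0-XMM15 */
    return reg >= 6;
}
```

This is why the test below writes to both xmm0 (volatile) and xmm6 (non-volatile) and checks them separately.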
From: Rémi Bernon <rbernon@codeweavers.com>
Showing that only non-volatile registers are reliably saved. Volatile registers are only saved by NtGetContextThread whenever it interrupts a thread in user space, and are otherwise returned from some previous, possibly outdated, state.
---
 dlls/ntdll/tests/thread.c | 253 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)

diff --git a/dlls/ntdll/tests/thread.c b/dlls/ntdll/tests/thread.c
index 3086247d5f4..9233b49b987 100644
--- a/dlls/ntdll/tests/thread.c
+++ b/dlls/ntdll/tests/thread.c
@@ -21,6 +21,8 @@
 
 #include "ntdll_test.h"
 
+#include "intrin.h"
+
 static NTSTATUS (WINAPI *pNtCreateThreadEx)( HANDLE *, ACCESS_MASK, OBJECT_ATTRIBUTES *,
                                              HANDLE, PRTL_THREAD_START_ROUTINE, void *,
                                              ULONG, ULONG_PTR, SIZE_T, SIZE_T, PS_ATTRIBUTE_LIST * );
@@ -177,10 +179,261 @@ static void test_unique_teb(void)
     ok( args1.teb != args2.teb, "Multiple threads have TEB %p.\n", args1.teb );
 }
 
+#if defined(__i386__) || defined(__x86_64__)
+static LONG test_context_signal;
+
+static DWORD WINAPI test_context_thread( void *arg )
+{
+    LARGE_INTEGER timeout = {.QuadPart = 100 * -10000};
+    UINT64 DECLSPEC_ALIGN(16) initial[128];
+    UINT64 DECLSPEC_ALIGN(16) fpu_buf[128];
+    CONTEXT *context;
+    M128A *xmm, *ymm;
+    void *buffer;
+    DWORD size;
+    BOOL ret;
+
+    ret = InitializeContext( NULL, CONTEXT_ALL | CONTEXT_FLOATING_POINT | CONTEXT_XSTATE, NULL, &size );
+    ok( !ret && GetLastError() == ERROR_INSUFFICIENT_BUFFER, "InitializeContext failed, error %lu\n", GetLastError() );
+    buffer = malloc( size );
+    ok( !!buffer, "malloc failed\n" );
+    ret = InitializeContext( buffer, CONTEXT_ALL | CONTEXT_FLOATING_POINT | CONTEXT_XSTATE, &context, &size );
+    ok( ret, "InitializeContext failed, error %lu\n", GetLastError() );
+    ret = SetXStateFeaturesMask( context, XSTATE_MASK_GSSE );
+    ok( ret, "SetXStateFeaturesMask failed, error %lu\n", GetLastError() );
+    xmm = (M128A *)LocateXStateFeature( context, XSTATE_LEGACY_SSE, NULL );
+    ok( !!xmm, "LocateXStateFeature XSTATE_LEGACY_SSE failed, error %lu\n", GetLastError() );
+    ymm = (M128A *)LocateXStateFeature( context, XSTATE_AVX, NULL );
+    ok( !!ymm, "LocateXStateFeature XSTATE_AVX failed, error %lu\n", GetLastError() );
+
+    _fxsave( initial );
+
+    /* NtGetContextThread misses volatile FPU registers */
+
+    _fxsave( fpu_buf );
+    fpu_buf[20] = 1;
+    fpu_buf[21] = 2;
+    fpu_buf[32] = 3;
+    fpu_buf[33] = 4;
+    _fxrstor( fpu_buf );
+    fpu_buf[20] = 0;
+    fpu_buf[21] = 0;
+    fpu_buf[32] = 0;
+    fpu_buf[33] = 0;
+    _fxsave( fpu_buf );
+
+    context->ContextFlags = CONTEXT_FLOATING_POINT | CONTEXT_XSTATE;
+    NtGetContextThread( GetCurrentThread(), context );
+
+    _fxrstor( initial );
+
+    ok( fpu_buf[20] == 1, "got xmm0:lo %#I64x\n", fpu_buf[20] );
+    ok( fpu_buf[21] == 2, "got xmm0:hi %#I64x\n", fpu_buf[21] );
+    ok( fpu_buf[32] == 3, "got xmm6:lo %#I64x\n", fpu_buf[32] );
+    ok( fpu_buf[33] == 4, "got xmm6:hi %#I64x\n", fpu_buf[33] );
+
+    todo_wine_if( sizeof(void *) == 8 )
+    ok( xmm[0].Low == 0, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    todo_wine_if( sizeof(void *) == 8 )
+    ok( xmm[0].High == 0, "got context xmm0:hi %#I64x\n", xmm[0].High );
+#ifdef _WIN64
+    ok( xmm[6].Low == 3, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 4, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#else
+    ok( xmm[6].Low == 0, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 0, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#endif
+    ok( ymm[0].Low == 0, "got context ymm0:lo %#I64x\n", ymm[0].Low );
+    ok( ymm[0].High == 0, "got context ymm0:hi %#I64x\n", ymm[0].High );
+    ok( ymm[6].Low == 0, "got context ymm6:lo %#I64x\n", ymm[6].Low );
+    ok( ymm[6].High == 0, "got context ymm6:hi %#I64x\n", ymm[6].High );
+
+    /* NtGetContextThread returns volatile FPU registers from the previous NtSetContextThread call */
+
+    xmm[0].Low = 5;
+    xmm[0].High = 6;
+    xmm[6].Low = 7;
+    xmm[6].High = 8;
+    ymm[0].Low = 1000;
+    ymm[0].High = 1001;
+    ymm[6].Low = 1002;
+    ymm[6].High = 1003;
+    context->ContextFlags = CONTEXT_FLOATING_POINT | CONTEXT_XSTATE;
+    NtSetContextThread( GetCurrentThread(), context );
+
+    _fxsave( fpu_buf );
+    fpu_buf[20] = 9;
+    fpu_buf[21] = 10;
+    _fxrstor( fpu_buf );
+    fpu_buf[20] = 0;
+    fpu_buf[21] = 0;
+    _fxsave( fpu_buf );
+
+    context->ContextFlags = CONTEXT_FLOATING_POINT | CONTEXT_XSTATE;
+    NtGetContextThread( GetCurrentThread(), context );
+
+    _fxrstor( initial );
+
+    ok( fpu_buf[20] == 9, "got xmm0:lo %#I64x\n", fpu_buf[20] );
+    ok( fpu_buf[21] == 10, "got xmm0:hi %#I64x\n", fpu_buf[21] );
+#ifdef _WIN64
+    ok( fpu_buf[32] == 7, "got xmm6:lo %#I64x\n", fpu_buf[32] );
+    ok( fpu_buf[33] == 8, "got xmm6:hi %#I64x\n", fpu_buf[33] );
+#else
+    todo_wine_if( sizeof(void *) == 4 )
+    ok( fpu_buf[32] == 0, "got xmm6:lo %#I64x\n", fpu_buf[32] );
+    todo_wine_if( sizeof(void *) == 4 )
+    ok( fpu_buf[33] == 0, "got xmm6:hi %#I64x\n", fpu_buf[33] );
+#endif
+
+    todo_wine_if( sizeof(void *) == 8 )
+    ok( xmm[0].Low == 5, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    todo_wine_if( sizeof(void *) == 8 )
+    ok( xmm[0].High == 6, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 7, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 8, "got context xmm6:hi %#I64x\n", xmm[6].High );
+    ok( ymm[0].Low == 1000, "got context ymm0:lo %#I64x\n", ymm[0].Low );
+    ok( ymm[0].High == 1001, "got context ymm0:hi %#I64x\n", ymm[0].High );
+    ok( ymm[6].Low == 1002, "got context ymm6:lo %#I64x\n", ymm[6].Low );
+    ok( ymm[6].High == 1003, "got context ymm6:hi %#I64x\n", ymm[6].High );
+
+    /* check reading context from the main thread while in user space */
+
+    _fxsave( fpu_buf );
+    fpu_buf[20] = 11;
+    fpu_buf[21] = 12;
+    fpu_buf[32] = 13;
+    fpu_buf[33] = 14;
+    _fxrstor( fpu_buf );
+
+    InterlockedIncrement( &test_context_signal );
+    while (InterlockedOr( &test_context_signal, 0 )) YieldProcessor();
+
+    /* check reading context from the main thread while in syscall */
+
+    _fxsave( fpu_buf );
+    fpu_buf[20] = 15;
+    fpu_buf[21] = 16;
+    fpu_buf[32] = 17;
+    fpu_buf[33] = 18;
+    _fxrstor( fpu_buf );
+
+    InterlockedIncrement( &test_context_signal );
+    NtDelayExecution( TRUE, &timeout );
+
+    /* NtGetContextThread returns volatile FPU registers from the previous NtSetContextThread call */
+
+    context->ContextFlags = CONTEXT_FLOATING_POINT | CONTEXT_XSTATE;
+    NtGetContextThread( GetCurrentThread(), context );
+
+    _fxrstor( initial );
+
+#ifdef _WIN64
+    todo_wine
+    ok( xmm[0].Low == 11, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    todo_wine
+    ok( xmm[0].High == 12, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 17, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 18, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#else
+    ok( xmm[0].Low == 5, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    ok( xmm[0].High == 6, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 7, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 8, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#endif
+    ok( ymm[0].Low == 1000, "got context ymm0:lo %#I64x\n", ymm[0].Low );
+    ok( ymm[0].High == 1001, "got context ymm0:hi %#I64x\n", ymm[0].High );
+    ok( ymm[6].Low == 1002, "got context ymm6:lo %#I64x\n", ymm[6].Low );
+    ok( ymm[6].High == 1003, "got context ymm6:hi %#I64x\n", ymm[6].High );
+
+    free( buffer );
+    return 0;
+}
+
+static void test_NtGetThreadContext(void)
+{
+    HANDLE thread = CreateThread( NULL, 0, test_context_thread, 0, 0, NULL );
+    CONTEXT *context;
+    M128A *xmm, *ymm;
+    void *buffer;
+    DWORD size;
+    BOOL ret;
+
+    ret = InitializeContext( NULL, CONTEXT_ALL | CONTEXT_FLOATING_POINT | CONTEXT_XSTATE, NULL, &size );
+    ok( !ret && GetLastError() == ERROR_INSUFFICIENT_BUFFER, "InitializeContext failed, error %lu\n", GetLastError() );
+    buffer = malloc( size );
+    ok( !!buffer, "malloc failed\n" );
+    ret = InitializeContext( buffer, CONTEXT_ALL | CONTEXT_FLOATING_POINT | CONTEXT_XSTATE, &context, &size );
+    ok( ret, "InitializeContext failed, error %lu\n", GetLastError() );
+    ret = SetXStateFeaturesMask( context, XSTATE_MASK_GSSE );
+    ok( ret, "SetXStateFeaturesMask failed, error %lu\n", GetLastError() );
+    xmm = (M128A *)LocateXStateFeature( context, XSTATE_LEGACY_SSE, NULL );
+    ok( !!xmm, "LocateXStateFeature XSTATE_LEGACY_SSE failed, error %lu\n", GetLastError() );
+    ymm = (M128A *)LocateXStateFeature( context, XSTATE_AVX, NULL );
+    ok( !!ymm, "LocateXStateFeature XSTATE_AVX failed, error %lu\n", GetLastError() );
+
+    while (!InterlockedOr( &test_context_signal, 0 )) YieldProcessor();
+
+    /* NtGetContextThread context while it's in user space captures volatile FPU registers */
+
+    context->ContextFlags = CONTEXT_FLOATING_POINT | CONTEXT_XSTATE;
+    NtGetContextThread( thread, context );
+#ifdef _WIN64
+    ok( xmm[0].Low == 11, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    ok( xmm[0].High == 12, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 13, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 14, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#else
+    ok( xmm[0].Low == 0, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    ok( xmm[0].High == 0, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 0, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 0, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#endif
+    ok( ymm[0].Low == 0, "got context ymm0:lo %#I64x\n", ymm[0].Low );
+    ok( ymm[0].High == 0, "got context ymm0:hi %#I64x\n", ymm[0].High );
+    ok( ymm[6].Low == 0, "got context ymm6:lo %#I64x\n", ymm[6].Low );
+    ok( ymm[6].High == 0, "got context ymm6:hi %#I64x\n", ymm[6].High );
+
+    InterlockedDecrement( &test_context_signal );
+    while (!InterlockedOr( &test_context_signal, 0 )) YieldProcessor();
+    Sleep( 10 ); /* leave some time for the thread to enter the syscall */
+
+    /* NtGetContextThread context while in a syscall returns outdated volatile FPU registers */
+
+    context->ContextFlags = CONTEXT_FLOATING_POINT | CONTEXT_XSTATE;
+    NtGetContextThread( thread, context );
+#ifdef _WIN64
+    todo_wine
+    ok( xmm[0].Low == 11, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    todo_wine
+    ok( xmm[0].High == 12, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 17, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 18, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#else
+    ok( xmm[0].Low == 0, "got context xmm0:lo %#I64x\n", xmm[0].Low );
+    ok( xmm[0].High == 0, "got context xmm0:hi %#I64x\n", xmm[0].High );
+    ok( xmm[6].Low == 0, "got context xmm6:lo %#I64x\n", xmm[6].Low );
+    ok( xmm[6].High == 0, "got context xmm6:hi %#I64x\n", xmm[6].High );
+#endif
+    ok( ymm[0].Low == 0, "got context ymm0:lo %#I64x\n", ymm[0].Low );
+    ok( ymm[0].High == 0, "got context ymm0:hi %#I64x\n", ymm[0].High );
+    ok( ymm[6].Low == 0, "got context ymm6:lo %#I64x\n", ymm[6].Low );
+    ok( ymm[6].High == 0, "got context ymm6:hi %#I64x\n", ymm[6].High );
+
+    WaitForSingleObject( thread, 1000 );
+    CloseHandle( thread );
+
+    free( buffer );
+}
+#endif /* defined(__i386__) || defined(__x86_64__) */
+
 START_TEST(thread)
 {
     init_function_pointers();
 
     test_dbg_hidden_thread_creation();
     test_unique_teb();
+#if defined(__i386__) || defined(__x86_64__)
+    test_NtGetThreadContext();
+#endif /* defined(__i386__) || defined(__x86_64__) */
 }
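The `fpu_buf[20]` and `fpu_buf[32]` indices in the test follow from the FXSAVE memory layout, in which the XMM registers start at byte offset 160 and take 16 bytes each. A minimal sketch of that arithmetic, viewing the area as an array of 64-bit words as the test does (the helper and macro names are hypothetical, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of XMM0 within the 512-byte FXSAVE area. */
#define FXSAVE_XMM_OFFSET 160u

/* Index of the low (high == 0) or high (high != 0) 64-bit half of xmm<reg>
 * when the FXSAVE area is viewed as an array of 64-bit words.
 * Hypothetical helper for illustration only. */
static size_t fxsave_xmm_index( unsigned int reg, int high )
{
    assert( reg < 16 );
    return (FXSAVE_XMM_OFFSET + reg * 16u) / 8u + (high ? 1u : 0u);
}
```

With this layout xmm0 maps to words 20 and 21, and xmm6 to words 32 and 33, matching the constants used above.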
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=125718
Your paranoid android.
=== build (build log) ===
/usr/lib/gcc/i686-w64-mingw32/6.3-win32/include/fxsrintrin.h:46:1: error: inlining failed in call to always_inline '_fxrstor': target specific option mismatch
/usr/lib/gcc/i686-w64-mingw32/6.3-win32/include/fxsrintrin.h:39:1: error: inlining failed in call to always_inline '_fxsave': target specific option mismatch
Makefile:120782: recipe for target 'dlls/ntdll/tests/i386-windows/thread.o' failed
Task: The exe32 Wine build failed
=== debian11 (build log) ===
/usr/lib/gcc/i686-w64-mingw32/10-win32/include/fxsrintrin.h:46:1: error: inlining failed in call to 'always_inline' '_fxrstor': target specific option mismatch
/usr/lib/gcc/i686-w64-mingw32/10-win32/include/fxsrintrin.h:39:1: error: inlining failed in call to 'always_inline' '_fxsave': target specific option mismatch
Task: The win32 Wine build failed
=== debian11b (build log) ===
/usr/lib/gcc/i686-w64-mingw32/10-win32/include/fxsrintrin.h:46:1: error: inlining failed in call to 'always_inline' '_fxrstor': target specific option mismatch
/usr/lib/gcc/i686-w64-mingw32/10-win32/include/fxsrintrin.h:39:1: error: inlining failed in call to 'always_inline' '_fxsave': target specific option mismatch
Task: The wow32 Wine build failed
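These errors appear to come from GCC's `_fxsave` / `_fxrstor` intrinsics being `always_inline` functions that require the `fxsr` target feature, which the 32-bit build does not seem to enable. One possible workaround, sketched here under the assumption of a GCC-compatible compiler, is to emit the instructions through inline assembly. The helper names are hypothetical and not part of the patch:

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical replacements for _fxsave / _fxrstor, not part of the patch.
 * buf must point to a 512-byte area aligned on 16 bytes. */
static inline void test_fxsave( void *buf )
{
    __asm__ __volatile__( "fxsave %0" : "=m" (*(char (*)[512])buf) );
}

static inline void test_fxrstor( const void *buf )
{
    __asm__ __volatile__( "fxrstor %0" : : "m" (*(const char (*)[512])buf) );
}
#endif
```

Since the compiler is not told which registers `fxrstor` changes, this is only suitable for test code that deliberately pokes at the FPU state, as the test above does.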
Also, the syscalls are ms_abi, so they already save the non-volatile XMM registers whenever they call some sysv ABI functions. This means that currently we essentially save them twice on every syscall -- in addition to the volatile state, which is the biggest problem even though it's only saved once.
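A minimal sketch of the situation described above, assuming GCC or Clang on x86-64: an `ms_abi` function that calls a `sysv_abi` function must treat XMM6-XMM15 as clobbered by the callee, so the compiler may spill them in the `ms_abi` function's prologue. Both function names are hypothetical stand-ins, not Wine code:

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__x86_64__)
/* Hypothetical stand-in for a Unix-side (System V ABI) helper. */
static __attribute__((sysv_abi, noinline)) int unix_side_add( int a, int b )
{
    return a + b;
}

/* Hypothetical stand-in for an ms_abi syscall entry point. XMM6-XMM15 are
 * callee-saved under ms_abi but caller-saved under sysv_abi, so the compiler
 * may have to save and restore them around the call below. */
static __attribute__((ms_abi, noinline)) int pe_side_add( int a, int b )
{
    return unix_side_add( a, b );
}
#endif
```

Inspecting the generated assembly for `pe_side_add` (for example with `-S`) is one way to see whether those saves are emitted for a given compiler and optimization level.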
Some tests fail for me in VirtualBox running on AMD CPU:
```
thread.c:230: Test failed: got xmm0:lo 0
thread.c:231: Test failed: got xmm0:hi 0
thread.c:277: Test failed: got xmm0:lo 0
thread.c:278: Test failed: got xmm0:hi 0
```
But yes, if we can skip full context store, it would be nice. I've been thinking about skipping it for `__wine_unix_call` syscall, but skipping it for more syscalls would be even nicer.
I don't remember details, but full context store was needed to pass existing ntdll AVX tests. It's possible that they depend on triggering some 'slow' code path one way or another, I guess we will find out when we try to implement this.
I think that the plan is to stop using ms_abi for syscalls and depend on syscall dispatcher to deal with ms_abi->sysv conversion, but for that we need to get rid of remaining direct calls first.
On 11/4/22 05:25, Jacek Caban (@jacek) wrote:
> Some tests fail for me in VirtualBox running on AMD CPU:
>
> thread.c:230: Test failed: got xmm0:lo 0
> thread.c:231: Test failed: got xmm0:hi 0
> thread.c:277: Test failed: got xmm0:lo 0
> thread.c:278: Test failed: got xmm0:hi 0
>
> But yes, if we can skip full context store, it would be nice. I've been thinking about skipping it for `__wine_unix_call` syscall, but skipping it for more syscalls would be even nicer.
>
> I don't remember details, but full context store was needed to pass existing ntdll AVX tests. It's possible that they depend on triggering some 'slow' code path one way or another, I guess we will find out when we try to implement this.
>
> I think that the plan is to stop using ms_abi for syscalls and depend on syscall dispatcher to deal with ms_abi->sysv conversion, but for that we need to get rid of remaining direct calls first.
I suspect that the majority of the xsavec overhead is not in saving the volatile XMM registers. Most of the time they are in the init state (that is, all zero) and nothing is actually saved, but xsavec saves more than that. IMO the only feasible way to solve this issue is to have a lighter `__wine_unix_call` which skips any FPU state save/restore, and possibly something else, with delayed processing of NtGetContextThread (should that happen during that `__wine_unix_call`), so NtGetContextThread still works correctly and doesn't break DRMs and debuggers.
On 11/4/22 10:25, Paul Gofman wrote:
> I suspect that the majority of the xsavec overhead is not in saving the volatile XMM registers. Most of the time they are in the init state (that is, all zero) and nothing is actually saved, but xsavec saves more than that. IMO the only feasible way to solve this issue is to have a lighter `__wine_unix_call` which skips any FPU state save/restore, and possibly something else, with delayed processing of NtGetContextThread (should that happen during that `__wine_unix_call`), so NtGetContextThread still works correctly and doesn't break DRMs and debuggers.
Regarding the init state, I mean the upper halves of the YMM registers, of course. The XMM registers are saved, but that is going to happen anyway at some point of the ABI change, and saving half of those was already generated by the compiler, as separate instructions for each register, in every WINAPI -> SYSV call.
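The init-state optimization mentioned here is visible in the XSAVE header: the `XSTATE_BV` field, stored right after the 512-byte legacy area, has a bit set only for the state components that were actually written. A minimal sketch of reading it, assuming a little-endian x86 host; the helper and macro names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HEADER_OFFSET 512u /* the XSAVE header follows the legacy FXSAVE area */
#define XSTATE_BIT_X87      0u
#define XSTATE_BIT_SSE      1u
#define XSTATE_BIT_AVX      2u   /* upper halves of the YMM registers */

/* Returns 1 if the given state component is marked as saved in XSTATE_BV,
 * 0 if it was left out (for example because it was in its init state).
 * Hypothetical helper for illustration only. */
static int xsave_component_saved( const void *xsave_area, unsigned int bit )
{
    uint64_t xstate_bv;

    assert( bit < 64 );
    memcpy( &xstate_bv, (const uint8_t *)xsave_area + XSAVE_HEADER_OFFSET, sizeof(xstate_bv) );
    return (int)((xstate_bv >> bit) & 1);
}
```

A thread that never touched AVX would typically show the AVX bit clear, which is the case where little is written for the upper YMM halves.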
> Some tests fail for me in VirtualBox running on AMD CPU:

Hmm... weird. The tests probably need some work, but I think there are some clues that it should be possible.

> But yes, if we can skip full context store, it would be nice. I've been thinking about skipping it for `__wine_unix_call` syscall, but skipping it for more syscalls would be even nicer.

Yes, we could have a special and lighter dispatcher for `__wine_unix_call`, possibly with an option to make it a full dispatcher if need be. I have already found some games that take a bad performance hit with the GL conversion.
---
FWIW, regarding the dispatcher overhead, I have noted a few things in addition to the FPU state that could be nice to keep in mind for a lighter dispatcher:
If we save the FPU state only partially, the next most costly items would be xsave.MxCsr / xsave.ControlWord. I'm not sure whether we need to save those, and I'm having trouble with them for some reason.
Then the next overhead comes from the save and restore of rflags. As far as I could see, syscalls are not keeping all the flags untouched (obviously, as they still do a few comparisons), but some (NT, ID, DF) seem to be saved and restored by NtDelayExecution. I'm sure at least the NT flag was causing some issues with some applications.
It's not much overhead but I think pushf disrupts the CPU pipeline and skipping those could be nice. I have some ideas to take a few shortcuts and avoid popf, but I don't see how to avoid the pushf to read the flags. If we can be sure nothing will rely on them for `__wine_unix_call` maybe we can simply zero the flags.
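For reference, the three flags named above sit at fixed bit positions in EFLAGS/RFLAGS: DF is bit 10, NT is bit 14 and ID is bit 21. A minimal sketch of a merge that keeps only those flags from the value seen at syscall entry follows; the policy and the names are hypothetical, not what the Wine dispatcher does:

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_DF 0x00000400u /* direction flag, bit 10 */
#define EFLAGS_NT 0x00004000u /* nested task flag, bit 14 */
#define EFLAGS_ID 0x00200000u /* CPUID detection flag, bit 21 */
#define EFLAGS_PRESERVED (EFLAGS_DF | EFLAGS_NT | EFLAGS_ID)

/* Hypothetical policy: take DF, NT and ID from the flags captured at syscall
 * entry, and every other bit from the flags at syscall exit. */
static uint32_t merge_flags( uint32_t entry_flags, uint32_t exit_flags )
{
    return (entry_flags & EFLAGS_PRESERVED) | (exit_flags & ~EFLAGS_PRESERVED);
}
```

Zeroing the flags instead, as suggested for `__wine_unix_call`, would amount to skipping the `entry_flags` capture altogether.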
The 32-bit dispatcher also suffers from `rep movsl`. Copying a fixed number of arguments with `pushl` instead, and falling back to `rep movsl` when there are more, seems to make a good difference.
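The same idea can be sketched in C as an unrolled fast path for small argument counts with a generic copy as fallback. This is only an analogue of the assembly change described above; the threshold and the function name are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAST_ARG_COUNT 4 /* hypothetical threshold, not taken from Wine */

/* Copy syscall arguments: unrolled stores for small counts (the analogue of
 * a few pushl instructions), generic copy otherwise (the analogue of rep movsl). */
static void copy_syscall_args( uint32_t *dst, const uint32_t *src, size_t count )
{
    if (count <= FAST_ARG_COUNT)
    {
        switch (count)
        {
        case 4: dst[3] = src[3]; /* fall through */
        case 3: dst[2] = src[2]; /* fall through */
        case 2: dst[1] = src[1]; /* fall through */
        case 1: dst[0] = src[0]; /* fall through */
        case 0: break;
        }
        return;
    }
    memcpy( dst, src, count * sizeof(*dst) );
}
```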
Then `__wine_unix_call` as a function also has a bit of overhead as it saves frame pointer (and currently XMM registers), where it could just be `movq %r8,%rdi; jmp *(%rcx,%rdx,8)`.
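In C terms, the two instructions quoted above amount to an indexed call through a table of function pointers, with the argument pointer moved from the third ms_abi register (`r8`) to the first System V one (`rdi`). A minimal sketch follows; the type and function names are hypothetical and only mirror the shape of the assembly:

```c
#include <assert.h>

/* Hypothetical entry point type: one opaque argument block per call. */
typedef long (*unix_entry_t)( void *args );

/* Equivalent of `jmp *(%rcx,%rdx,8)`: table is the handle, code the index. */
static long unix_call_sketch( const unix_entry_t *table, unsigned int code, void *args )
{
    return table[code]( args );
}

/* Hypothetical Unix-side function used to exercise the dispatch. */
static long example_entry( void *args )
{
    return *(int *)args + 1;
}
```

A compiler would usually turn `unix_call_sketch` into a tail call, which is essentially the `jmp` form without frame pointer setup.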
Lastly, I think that on the PE side, having the `__wine_unix_call` import in `winecrt0` also hurts a bit, with some unnecessary branches and indirection. I have opened https://gitlab.winehq.org/wine/wine/-/merge_requests/1201 for that.
This merge request was closed by Rémi Bernon.