I believe this performs as well as, if not better than, direct calls previously did. It is possible to make things even better with tail calls on the PE side, but that is going to be a little bit harder, and I'll make another MR for it later.
FWIW, except for the vkoverhead benchmark, I have yet to see a real-world scenario where it makes a difference, though I think this is straightforward enough.
CC @mbriar
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1763
Follow GCC recommendations for getting rid of it.
MingW/GCC 12 complains with:
/home/eric/work/wine/dlls/d3dx9_36/tests/effect.c: In function 'test_effect_parameter_value':
/home/eric/work/wine/dlls/d3dx9_36/tests/effect.c:1838:71: warning: expression does not compute the number of elements in this array; element type is 'DWORD' {aka 'long unsigned int'}, not 'D3DXMATRIX' {aka 'struct _D3DMATRIX'} [-Wsizeof-array-div]
1838 | const D3DXMATRIX *matrix_pointer_array[sizeof(input_value)/sizeof(D3DXMATRIX)];
| ^
/home/eric/work/wine/dlls/d3dx9_36/tests/effect.c:1838:71: note: add parentheses around the second 'sizeof' to silence this warning
/home/eric/work/wine/dlls/d3dx9_36/tests/effect.c:1835:19: note: array 'input_value' declared here
1835 | DWORD input_value[EFFECT_PARAMETER_VALUE_ARRAY_SIZE];
| ^~~~~~~~~~~
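For reference, here is a minimal sketch of the pattern GCC is flagging and of the parenthesized form its note recommends. The type names with a trailing underscore and the array size of 48 are stand-ins assumed for illustration; they are not the real Windows/D3DX definitions, and this is not the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions) for the types named in the warning. */
typedef uint32_t DWORD_;                        /* 4 bytes  */
typedef struct { float m[4][4]; } D3DXMATRIX_;  /* 64 bytes */

/* Assumed value standing in for EFFECT_PARAMETER_VALUE_ARRAY_SIZE. */
#define ARRAY_SIZE_ 48

DWORD_ input_value[ARRAY_SIZE_];

/* The array holds DWORD_ elements but is deliberately divided by the size
 * of a different type. Parenthesizing the second sizeof, as GCC's note
 * suggests, tells the compiler the mismatch is intentional. */
enum { MATRIX_COUNT = sizeof(input_value) / (sizeof(D3DXMATRIX_)) };

const D3DXMATRIX_ *matrix_pointer_array[MATRIX_COUNT];

/* Number of matrix pointers that fit the raw input buffer. */
size_t matrix_count(void)
{
    return sizeof(matrix_pointer_array) / sizeof(matrix_pointer_array[0]);
}
```

With these assumed sizes, 48 DWORD_ values (192 bytes) cover three 64-byte matrices.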
Signed-off-by: Eric Pouech <eric.pouech(a)gmail.com>
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1766
Valgrind support requires a fork, which I've published to https://gitlab.winehq.org/rbernon/valgrind. The fork implements loading DWARF debug info from PE files, instead of the old and broken upstream PDB support. I tried to upstream these changes a long time ago but didn't receive any feedback.
I think we could maybe consider keeping a fork, which I'm happy to maintain, as the changes aren't too large. We may want to investigate adding 32-on-64 support, which may require a few more changes (to VEX specifically, because its amd64 guest doesn't support segment register manipulation).
The changes here are not all related to Valgrind, and I'll create separate MRs for those which may make sense independently of Valgrind / GDB.
Also included is a suppression file to silence some annoying false positives, many of which come from the cross-stack accesses during syscalls, which confuse Valgrind's stack heuristics. One can try this out with something like:
`WINELOADERNOEXEC=1 valgrind --suppressions=tools/valgrind.supp wine64/loader/wine64 wine64/programs/winecfg/winecfg.exe`
--
v5: ntdll: Avoid writing to invalid memory in i386 unix dispatcher.
ntdll: Set %rsp before args in x86_64 call_user_mode_callback.
ntdll: Fix incorrect i386 call_user_mode_callback CFI.
ntdll: Fix valgrind notifications from ntdll.so.
ntdll: Import valgrind headers for PE side ntdll.
ntdll: Allocate a truly separate stack for the kernel stack.
ntdll: Maintain a PE module link map and expose it to GDB.
ntdll: Pass a UNICODE_STRING to load_builtin and virtual_map_image.
loader: Expose a shadow copy of ld.so link map to GDB.
ntdll: Add .cfi_signal_frame to __wine_syscall_dispatcher.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1074
Valgrind support requires a fork, which I've published to https://gitlab.winehq.org/rbernon/valgrind. The fork implements loading DWARF debug info from PE files, instead of the old and broken upstream PDB support. I tried to upstream these changes a long time ago but didn't receive any feedback.
I think we could maybe consider keeping a fork, which I'm happy to maintain, as the changes aren't too large. We may want to investigate adding 32-on-64 support, which may require a few more changes (to VEX specifically, because its amd64 guest doesn't support segment register manipulation).
The changes here are not all related to Valgrind, and I'll create separate MRs for those which may make sense independently of Valgrind / GDB.
Also included is a suppression file to silence some annoying false positives, many of which come from the cross-stack accesses during syscalls, which confuse Valgrind's stack heuristics. One can try this out with something like:
`WINELOADERNOEXEC=1 valgrind --suppressions=tools/valgrind.supp wine64/loader/wine64 wine64/programs/winecfg/winecfg.exe`
--
v4: ntdll: Avoid writing to invalid memory in i386 unix dispatcher.
ntdll: Set %rsp before args in x86_64 call_user_mode_callback.
ntdll: Fix incorrect i386 call_user_mode_callback CFI.
ntdll: Fix valgrind notifications from ntdll.so.
ntdll: Import valgrind headers for PE side ntdll.
ntdll: Allocate a truly separate stack for the kernel stack.
ntdll: Maintain a PE module link map and expose it to GDB.
ntdll: Pass a UNICODE_STRING to load_builtin and virtual_map_image.
loader: Expose a shadow copy of ld.so link map to GDB.
ntdll: Add .cfi_signal_frame to __wine_syscall_dispatcher.
configure: Quiet recheck and Werror.
d3dx9_36/tests: Fix an array size warning.
ntdll: Initialize unix_pid and unix_tid.
rpcrt4: Fix partial allocation warnings.
winegstreamer: Use QWORD instead of uint64_t for length.
shell32: Avoid undefined behavior with partially allocated structs.
user32: Avoid undefined behavior with partially allocated structs.
jscript: Use flexible array member to avoid a warning.
devenum: Avoid undefined behavior with uninitialized value.
webservices: Explicitly cast WS_XML_READER_ENCODING_TYPE enum.
msvcp60: Use an explicit type to avoid a warning.
krnl386.exe16: Avoid undefined behavior.
krnl386.exe16: Avoid passing NULL to lstrcpyA.
winedbg: Avoid undefined behavior with printing NULL string.
winex11: Avoid undefined behavior with uninitialized value.
scrobj: Avoid depending on uninitialized hres value.
gdiplus: Initialize ofs.
ole32/tests: Avoid fixed pointer usage.
ntdll: Avoid use of uninitialized ch.
kernelbase: Avoid returning uninitialized HMODULE.
comctl32: Always initialize dtFlags.
mciseq: Print data only when successfully read.
dwrite/tests: Silent a todo_wine identation warning.
kernelbase/tests: Initialize input buffer to silent a warning.
ntoskrnl.exe: Make USD pointers volatile to silent a warning.
user.exe16: Initialize RECT to silent a warning.
shlwapi/tests: Initialize input buffer to silent a warning.
ddraw: Use designated initializer to avoid implicit enum cast.
d3d8: Use designated initializer to avoid implicit enum cast.
d3d11: Avoid implicit enum value cast.
d2d1: Avoid implicit D2D1_FACTORY_TYPE enum cast.
d2d1: Remove unnecessary cast.
include: Support running all tests at once with --all.
gitlab: Avoid unnecessary winetest re-link on incremental builds.
configure.ac: Check for git presence and use $(GIT) variable.
configure.ac: Use ; rather than && to chain variable and comparison.
Revert "gitlab: Run lengthy tests in a separate job."
Revert "gitlab: Run short tests on 32-bit too."
WIP: build: Prefer local gitlab runner.
gitlab: Run lengthy tests in a separate job.
gitlab: Add a job to run some tests as 64-bit.
gitlab: Run related tests on every commit.
gitlab: Run some tests with nulldrv driver.
WIP: build: Add a github actions workflow.
This merge request has too many patches to be relayed via email.
Please visit the URL below to see the contents of the merge request.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1074
In this patch set, GetOutputType() currently fails for NV11 and the following RGB types because MFCalculateImageSize() fails for them. I'll fix MFCalculateImageSize() for them soon (but not in this patch set).
--
v3: winegstreamer: Implement GetOutputType for WMV decoder.
mf/tests: Test info headers returned by GetOutputType for WMV decoder.
mf/tests: Test GetOutputType for WMV decoder.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1707
If after a unix call `frame->restore_flags` was not 0 but did not include either `CONTEXT_FLOATING_POINT` or `CONTEXT_XSTATE`, xmm6-xmm15 were not restored to their previous values.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1772
This removes 20 `movaps` instructions from every syscall that calls a sysv_abi function, plus an `and` for stack alignment and some other instructions depending on the function.
In `NtAllocateLocallyUniqueId` for example this reduces the number of instructions from 63 to 36.
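For context on where those `movaps` come from: in the Microsoft x64 calling convention xmm6-xmm15 are callee-saved, while in the System V ABI every xmm register is caller-saved. An ms_abi function that calls a sysv_abi one therefore has to preserve those ten registers around the call (ten saves plus ten restores). A minimal sketch, assuming GCC or Clang attributes; the function names are hypothetical and not taken from the patches:

```c
/* The ABI attributes only exist on x86-64 GCC/Clang; elsewhere the macros
 * expand to nothing so the sketch still compiles. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
# define MS_ABI   __attribute__((ms_abi))
# define SYSV_ABI __attribute__((sysv_abi))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Hypothetical Unix-side helper using the System V convention. */
SYSV_ABI long unix_helper(long a, long b)
{
    return a + b;
}

/* Hypothetical syscall implementation kept as ms_abi: unless the compiler
 * can prove otherwise, it must assume the sysv_abi callee clobbers
 * xmm6-xmm15 and spill/reload them around the call. */
MS_ABI long syscall_as_ms_abi(long a, long b)
{
    return unix_helper(a, b);
}

/* The same implementation declared sysv_abi: caller and callee agree on
 * which registers are volatile, so no extra xmm spills are required. */
SYSV_ABI long syscall_as_sysv_abi(long a, long b)
{
    return unix_helper(a, b);
}
```

Comparing the disassembly of the two wrappers (for example with `objdump -d`) is one way to see the difference, though the exact output depends on compiler and optimization level.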
I don't entirely understand the llvm-mca output but here are the before and after stats that it outputs for that function:
Before
Iterations: 100
Instructions: 6300
Total Cycles: 3335
Total uOps: 6300
Dispatch Width: 6
uOps Per Cycle: 1.89
IPC: 1.89
Block RThroughput: 15.0
After
Iterations: 100
Instructions: 3600
Total Cycles: 1514
Total uOps: 3600
Dispatch Width: 6
uOps Per Cycle: 2.38
IPC: 2.38
Block RThroughput: 6.0
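As a reading aid for the numbers above: the IPC line is the instruction count divided by the total cycles, so 6300 / 3335 is about 1.89 and 3600 / 1514 is about 2.38, and the simulated cycle count drops by a factor of roughly 2.2. These are llvm-mca static simulation figures rather than measurements. The helper names below are made up for illustration:

```c
/* Instructions per cycle, as reported on llvm-mca's "IPC" line. */
double ipc(unsigned instructions, unsigned cycles)
{
    return (double)instructions / cycles;
}

/* Ratio of simulated cycles before and after the change. */
double cycle_ratio(unsigned cycles_before, unsigned cycles_after)
{
    return (double)cycles_before / cycles_after;
}
```

With the figures quoted above, `ipc(6300, 3335)` and `ipc(3600, 1514)` reproduce the two IPC lines, and `cycle_ratio(3335, 1514)` gives the overall reduction.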
This currently depends on the stack being aligned by the syscall dispatcher, which afaict is the case if `sizeof(struct syscall_frame) % 16 == 0`. If that is not good enough I can add an `andq $~15,%rsp` somewhere.
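The alignment assumption can be sketched in C as follows. The frame layout is a hypothetical stand-in (the real `struct syscall_frame` differs), and `align_down_16` is simply the C equivalent of `andq $~15,%rsp`:

```c
#include <stdint.h>

/* Hypothetical frame layout, for illustration only. */
struct example_frame
{
    uint64_t regs[16];
    uint64_t flags;
    uint64_t pad;      /* keeps the size a multiple of 16 */
};

/* Compile-time check of the property the dispatcher relies on: if the frame
 * size is a multiple of 16, carving the frame out of an aligned stack
 * leaves the stack aligned. */
_Static_assert(sizeof(struct example_frame) % 16 == 0,
               "frame size must keep the stack 16-byte aligned");

/* Round a stack pointer value down to a 16-byte boundary. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}
```

A static assertion like this would catch a later change to the frame layout at build time, whereas the explicit `and` costs one instruction per syscall.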
One question I have is whether we want to continue supporting CDECL syscalls (only `wine_server_call`, `wine_server_fd_to_handle` and `wine_server_handle_to_fd`)?
If we do, this adds a bit of complexity to the syscall dispatcher, see the commit "FIXUP ntdll: Support CDECL syscalls."
If we don't, and make those syscalls WINAPI instead, then for every call to those functions on x86 it seems to either change nothing or add one `add` instruction. However, we would of course lose the ability to make CDECL syscalls.
--
v2: Revert "ntdll: Make CDECL syscalls WINAPI instead."
FIXUP ntdll: Support CDECL syscalls.
ntdll: Make syscall functions sysv_abi on x64.
ntdll: Make CDECL syscalls WINAPI instead.
win32u: Make syscalls use the SYSCALL calling convention.
ntdll: Make syscalls use the SYSCALL calling convention.
include: Add SYSCALL calling convention.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1752