If, after a Unix call, `frame->restore_flags` was nonzero but included neither `CONTEXT_FLOATING_POINT` nor `CONTEXT_XSTATE`, xmm6-xmm15 were not restored to their previous values.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1772
This removes 20 `movaps` instructions from every syscall that calls a sysv_abi function, plus an `and` for stack alignment and some other instructions depending on the function.
In `NtAllocateLocallyUniqueId`, for example, this reduces the number of instructions from 63 to 36.
I don't entirely understand the llvm-mca output, but here are the before and after statistics it reports for that function:
| Metric | Before | After |
|---|---|---|
| Iterations | 100 | 100 |
| Instructions | 6300 | 3600 |
| Total Cycles | 3335 | 1514 |
| Total uOps | 6300 | 3600 |
| Dispatch Width | 6 | 6 |
| uOps Per Cycle | 1.89 | 2.38 |
| IPC | 1.89 | 2.38 |
| Block RThroughput | 15.0 | 6.0 |
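For context on where the 20 `movaps` come from: in the Microsoft x64 calling convention xmm6-xmm15 are callee-saved, while in the System V AMD64 ABI every xmm register is caller-saved. An ms_abi function that calls a sysv_abi function therefore has to save and restore those ten registers itself (10 stores plus 10 loads), which would account for the 20 instructions. The sketch below, assuming GCC or Clang on x86-64, only shows the per-function attributes involved; the `EXAMPLE_*` macros and the `example_*` functions are hypothetical and are not Wine's actual SYSCALL definition.

```c
#include <assert.h>

/* Hypothetical macros for illustration; not Wine's actual SYSCALL definition.
 * On x86-64, GCC and Clang let each function pick its calling convention. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
# define EXAMPLE_MS_ABI   __attribute__((ms_abi))
# define EXAMPLE_SYSV_ABI __attribute__((sysv_abi))
#else
# define EXAMPLE_MS_ABI
# define EXAMPLE_SYSV_ABI
#endif

/* Stand-in for a Unix-side helper. Under sysv_abi every xmm register is
 * caller-saved, so this function may clobber xmm6-xmm15 freely. */
static EXAMPLE_SYSV_ABI double example_unix_helper(double a, double b)
{
    return a * b;
}

/* Stand-in for a syscall compiled as ms_abi: it must preserve xmm6-xmm15 for
 * its caller, so (unless the call gets inlined) calling a sysv_abi function
 * makes the compiler save and restore those ten registers here. Declaring
 * this function EXAMPLE_SYSV_ABI instead would remove that obligation. */
static EXAMPLE_MS_ABI double example_syscall(double a, double b)
{
    return example_unix_helper(a, b);
}

/* Usage: assert(example_syscall(2.0, 3.0) == 6.0); */
```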
This currently depends on the stack being aligned by the syscall dispatcher, which, as far as I can tell, is the case if `sizeof(struct syscall_frame) % 16 == 0`. If that is not good enough, I can add an `andq $~15,%rsp` somewhere.
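A minimal C sketch of the two options mentioned above, assuming a 16-byte alignment requirement: a compile-time check on the frame size, and the C equivalent of `andq $~15,%rsp`. The reasoning behind the size check is that if the frame starts on a 16-byte boundary and its size is a multiple of 16, the stack just below it is 16-byte aligned as well. `struct example_frame` and the helper names are hypothetical stand-ins; the real `struct syscall_frame` is much larger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct syscall_frame (the real layout differs). */
struct example_frame
{
    uint64_t rip;
    uint64_t rsp;
    uint64_t rflags;
    uint64_t restore_flags;
};

/* Compile-time check: a frame size that is a multiple of 16 keeps the stack
 * below an aligned frame 16-byte aligned too. */
_Static_assert(sizeof(struct example_frame) % 16 == 0,
               "frame size must be a multiple of 16");

/* C equivalent of `andq $~15,%rsp`: clear the low four bits, rounding the
 * address down to the nearest multiple of 16. */
static uintptr_t example_align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}

/* Returns nonzero if addr is 16-byte aligned. */
static int example_is_aligned_16(uintptr_t addr)
{
    return (addr & 15) == 0;
}

/* Usage: assert(example_align_down_16(0x1008) == 0x1000); */
```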
One open question is whether we want to continue supporting CDECL syscalls (currently only `wine_server_call`, `wine_server_fd_to_handle` and `wine_server_handle_to_fd`).
If we do, this adds a bit of complexity to the syscall dispatcher; see the commit "FIXUP ntdll: Support CDECL syscalls."
If we don't, and make those syscalls WINAPI instead, then each call to those functions on x86 seems to either stay the same or gain one `add` instruction. However, we would of course lose the ability to make CDECL syscalls.
--
v2: Revert "ntdll: Make CDECL syscalls WINAPI instead."
FIXUP ntdll: Support CDECL syscalls.
ntdll: Make syscall functions sysv_abi on x64.
ntdll: Make CDECL syscalls WINAPI instead.
win32u: Make syscalls use the SYSCALL calling convention.
ntdll: Make syscalls use the SYSCALL calling convention.
include: Add SYSCALL calling convention.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1752
When .so module initialization was moved from ntdll to winecrt0 with
commit bef09697227c29f53bb0ad95232399cbba5c9c6b we lost a number of
include files.
This broke FreeBSD-specific code that used BOOL, TRUE, and FALSE.
Fix that by using poor man's int, 1, and 0 instead.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1769
When .so module initialization was moved from ntdll to winecrt0 with
commit bef09697227c29f53bb0ad95232399cbba5c9c6b we lost a number of
include files.
This broke FreeBSD-specific code that used BOOL, TRUE, and FALSE.
Fix that by using poor man's int, 1, and 0 instead.
--
This merge request has too many patches to be relayed via email.
Please visit the URL below to see the contents of the merge request.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1767
If an hlsl_ir_load loads a variable whose components are stored by different
instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which is currently the case
for value constructors), the load can be replaced with a constant value, which
is what the first patch of this series does.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
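The idea behind the first patch can be sketched roughly as follows. This is a heavily simplified illustration, not vkd3d-shader's code: the `example_*` types are hypothetical, each component source is treated as a scalar constant, and the swizzle handling the real pass needs is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified IR: not vkd3d-shader's real data structures. */
enum example_node_type
{
    EXAMPLE_NODE_CONSTANT,
    EXAMPLE_NODE_OTHER,
};

struct example_node
{
    enum example_node_type type;
    float value; /* only meaningful for EXAMPLE_NODE_CONSTANT */
};

/* For each component of a variable, the instruction that last stored to it,
 * as tracked by copy propagation (NULL if unknown). */
struct example_var_state
{
    const struct example_node *sources[4];
    unsigned int component_count;
};

/* If every component of the loaded variable was stored by a constant, write
 * the resulting constant vector to out[] and return true: the load can then
 * be replaced with that constant. Otherwise return false and keep the load. */
static bool example_fold_load(const struct example_var_state *state, float out[4])
{
    assert(state->component_count <= 4);
    for (unsigned int i = 0; i < state->component_count; ++i)
    {
        const struct example_node *src = state->sources[i];

        if (!src || src->type != EXAMPLE_NODE_CONSTANT)
            return false;
        out[i] = src->value;
    }
    return true;
}
```

In the shader above, both components of `<constructor-2>` are stored from float constants, so a pass along these lines would turn the load into the `float2` constant seen in the second dump.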
Replacing such loads with constants is required to write texel_offsets as aoffimmi modifiers in the sm4 backend, since the backend expects the texel_offset arguments to be hlsl_ir_constant.
This series also:
* Allows Gather() methods to use aoffimmi modifiers, when possible, instead of an additional source register (aoffimmi modifiers are the only option allowed in shader model 4.1).
* Adds support for texel_offsets in the Load() method via aoffimmi modifiers (the only method allowed).
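As a rough illustration of why the offsets must be known at compile time: an immediate offset is encoded directly into the instruction, and each component is assumed here to be a signed 4-bit value in [-8, 7] (the D3D10 `D3D10_COMMONSHADER_TEXEL_OFFSET_MAX_NEGATIVE` / `D3D10_COMMONSHADER_TEXEL_OFFSET_MAX_POSITIVE` limits). The sketch checks that range and packs the offsets as consecutive 4-bit two's-complement fields. The function names and the `shift` parameter are hypothetical; the exact sm4 token layout is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed range of one immediate texel offset component (signed 4-bit). */
#define EXAMPLE_OFFSET_MIN (-8)
#define EXAMPLE_OFFSET_MAX 7

/* Returns true if all constant offsets fit an immediate modifier. If not, the
 * compiler must fall back to another mechanism (e.g. an extra offset source
 * register for Gather()) or reject the shader. */
static bool example_offsets_fit_immediate(const int *offsets, unsigned int count)
{
    for (unsigned int i = 0; i < count; ++i)
        if (offsets[i] < EXAMPLE_OFFSET_MIN || offsets[i] > EXAMPLE_OFFSET_MAX)
            return false;
    return true;
}

/* Packs up to three offsets as 4-bit two's-complement fields starting at bit
 * `shift` (a hypothetical parameter; consult the real token layout). */
static uint32_t example_pack_offsets(const int *offsets, unsigned int count, unsigned int shift)
{
    uint32_t token = 0;

    assert(count <= 3 && example_offsets_fit_immediate(offsets, count));
    for (unsigned int i = 0; i < count; ++i)
        token |= ((uint32_t)offsets[i] & 0xfu) << (shift + 4 * i);
    return token;
}
```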
--
v3: vkd3d-shader/hlsl: Propagate swizzle chains in copy propagation.
vkd3d-shader/hlsl: Replace swizzles with constants in copy prop.
tests: Test constant propagation through swizzles.
vkd3d-shader/hlsl: Support offset argument for the texture Load() method.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/51