This removes 20 `movaps` instructions from every syscall that calls a sysv_abi function, plus an `and` for stack alignment and some other instructions depending on the function.
In `NtAllocateLocallyUniqueId` for example this reduces the number of instructions from 63 to 36.
I don't entirely understand the llvm-mca output but here are the before and after stats that it outputs for that function:
Before
Iterations: 100
Instructions: 6300
Total Cycles: 3335
…
[View More]Total uOps: 6300
Dispatch Width: 6
uOps Per Cycle: 1.89
IPC: 1.89
Block RThroughput: 15.0
After
Iterations: 100
Instructions: 3600
Total Cycles: 1514
Total uOps: 3600
Dispatch Width: 6
uOps Per Cycle: 2.38
IPC: 2.38
Block RThroughput: 6.0
This currently depends on the stack being aligned by the syscall dispatcher, which afaict is the case if `sizeof(struct syscall_frame) % 16 == 0`. If that is not good enough I can add an `andq $~15,%rsp` somewhere.
One question I have is whether we want to continue supporting CDECL syscalls (only `wine_server_call`, `wine_server_fd_to_handle` and `wine_server_handle_to_fd`)?
If we do, this adds a bit of complexity to the syscall dispatcher, see the commit "FIXUP ntdll: Support CDECL syscalls."
If we don't, and make those syscalls WINAPI instead, then for every call to those functions on x86 it seems to either change nothing or add one `add` instruction. However we of course lose the ability to make CDECL syscalls.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1752
[View Less]
--
v2: wined3d: Enable long types in texture.c.
wined3d: Reduce usage of long integral types in texture.c.
wined3d: Change stencil parameter type in blitter_clear() method.
wined3d: Get/set texture's level_count and lod as unsigned int.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1727
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of an
uint 0, and thus, relying on complex broadcasts.
This seems to be the behaviour of the the …
[View More]native compiler, since it retrieves
the following error on a shader that lacks an initializer on a data type with
object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
--
v4: vkd3d-shader/hlsl: Allow uninitialized static objects.
vkd3d-shader/hlsl: Validate that non-uniform objects are not referenced.
tests: Add additional object references tests.
tests: Test proper initialization of static structs to zero.
vkd3d-shader/hlsl: Initialize static variables to 0 by default.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/54
[View Less]
Implement a basic GC based on the mark-and-sweep algorithm, without requiring manually specifying "roots", which vastly simplifies the management. For now, it is triggered every 30 seconds since it last finished, on a new object initialization. Better heuristics could be used in the future.
The comments in the code should hopefully understand the high level logic of this approach without boilerplate details. I've tested it on FFXIV launcher (along with other patches from Proton to have it work)…
[View More] and it stops the massive memory leak successfully by itself, so at least it does its job properly. The last patch in the MR is just an optimization for a *very* common case.
For artificial testing, one could use something like:
```javascript
function leak() {
var a = {}, b = {};
a.b = b;
b.a = a;
}
```
which creates a circular ref and will leak when the function returns.
It also introduces and makes use of a "heap_stack", which prevents stack overflows on long chains.
--
v6: jscript: Create the source function's 'prototype' prop object on demand.
jscript: Run the garbage collector every 30 seconds on a new object
jscript: Implement CollectGarbage().
jscript: Implement a Garbage Collector to deal with circular references.
jscript: Use a jsdisp to hold refs for scopes.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1635
[View Less]
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of an
uint 0, and thus, relying on complex broadcasts.
This seems to be the behaviour of the the …
[View More]native compiler, since it retrieves
the following error on a shader that lacks an initializer on a data type with
object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
--
v3: vkd3d-shader/hlsl: Allow uninitialized static objects.
vkd3d-shader/hlsl: Validate that non-uniform objects are not referenced.
tests: Add additional object references tests.
vkd3d-shader/hlsl: Initialize static variables to 0 by default.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/54
[View Less]
This implements the _RunAndWait member function and the constructor of the _StructuredTaskCollection, which enables the tests for that class to run.
Also adds a throw of a missing_wait exception to the destructor when chores are scheduled and _RunAndWait was not called.
Remaining stuff after this:
- Task collection cancelling + IsCancelling function (next MR)
- Cancellation token support (requires RE'ing the _CancellationTokenState class, currently in progress)
--
v4: msvcr120: Throw …
[View More]exception in ~_StructuredTaskCollection if _RunAndWait was not called.
msvcr100: Implement missing_wait exception.
msvcr110: Implement _StructuredTaskCollection constructor.
msvcr100: Implement exception passing from chore threads to _RunAndWait.
msvcr100: Factor out EXCEPTION_RECORD to exception_ptr conversion.
msvcr100: Move exception_ptr functions to a separate file.
msvcr100: Implement _StructuredTaskCollection::_RunAndWait.
https://gitlab.winehq.org/wine/wine/-/merge_requests/906
[View Less]
Implement a basic GC based on the mark-and-sweep algorithm, without requiring manually specifying "roots", which vastly simplifies the management. For now, it is triggered every 30 seconds since it last finished, on a new object initialization. Better heuristics could be used in the future.
The comments in the code should hopefully understand the high level logic of this approach without boilerplate details. I've tested it on FFXIV launcher (along with other patches from Proton to have it work)…
[View More] and it stops the massive memory leak successfully by itself, so at least it does its job properly. The last patch in the MR is just an optimization for a *very* common case.
For artificial testing, one could use something like:
```javascript
function leak() {
var a = {}, b = {};
a.b = b;
b.a = a;
}
```
which creates a circular ref and will leak when the function returns.
It also introduces and makes use of a "heap_stack", which prevents stack overflows on long chains.
--
v5: jscript: Create the source function's 'prototype' prop object on demand.
jscript: Run the garbage collector every 30 seconds on a new object
jscript: Implement CollectGarbage().
jscript: Implement a Garbage Collector to deal with circular references.
jscript: Use a jsdisp to hold refs for scopes.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1635
[View Less]
This implements the _RunAndWait member function and the constructor of the _StructuredTaskCollection, which enables the tests for that class to run.
Also adds a throw of a missing_wait exception to the destructor when chores are scheduled and _RunAndWait was not called.
Remaining stuff after this:
- Task collection cancelling + IsCancelling function (next MR)
- Cancellation token support (requires RE'ing the _CancellationTokenState class, currently in progress)
--
v3: msvcr120: Throw …
[View More]exception in _StructuredTaskCollection_dtor if _RunAndWait was not called.
msvcr100: Implement missing_wait exception.
msvcr110: Implement _StructuredTaskCollection constructor.
msvcr100: Implement exception passing from chore threads to _RunAndWait.
msvcr100: Factor out EXCEPTION_RECORD to exception_ptr conversion.
msvcr100: Move exception_ptr functions to a separate file.
msvcr100: Implement _StructuredTaskCollection::_RunAndWait.
https://gitlab.winehq.org/wine/wine/-/merge_requests/906
[View Less]
> (The TlsIndex field in the LDR_DATA_TABLE_ENTRY structure appears to be unused except as a flag that the module has TLS (being always set to -1), at least as far back as Windows XP. It is worth mentioning that the WINE implementation of implicit TLS incorrectly uses TlsIndex as the real module TLS index, so it may be unreliable to assume that it is always -1 if you care about working on WINE.)
>
> \- http://www.nynaeve.net/?p=186
and the "links to that article but still doesn't …
[View More]work in wine" award goes to... [the D runtime](https://github.com/dlang/dmd/blob/6bf60ea0eb174631ede0074a77d3898d…! (Admittedly, there aren't too many ways to do what they're trying to do.)
With this, the D runtime will now work in Wine, even if in a dll loaded into an exe with no tls (which gets it the tls index 0)
The changes to the debugger are a bit icky, a possible alternative is to find some other easily-debugger-accessible place to stuff the tls index.
--
v4: ntdll: TlsIndex should not actually contain tls indices
https://gitlab.winehq.org/wine/wine/-/merge_requests/1578
[View Less]