This removes 20 `movaps` instructions from every syscall that calls a sysv_abi function, plus an `and` for stack alignment and some other instructions depending on the function.
In `NtAllocateLocallyUniqueId` for example this reduces the number of instructions from 63 to 36.
I don't entirely understand the llvm-mca output but here are the before and after stats that it outputs for that function:
Before
Iterations: 100
Instructions: 6300
Total Cycles: 3335
Total uOps: 6300
Dispatch Width: 6
uOps Per Cycle: 1.89
IPC: 1.89
Block RThroughput: 15.0
After
Iterations: 100
Instructions: 3600
Total Cycles: 1514
Total uOps: 3600
Dispatch Width: 6
uOps Per Cycle: 2.38
IPC: 2.38
Block RThroughput: 6.0
This currently depends on the stack being aligned by the syscall dispatcher, which afaict is the case if `sizeof(struct syscall_frame) % 16 == 0`. If that is not good enough I can add an `andq $~15,%rsp` somewhere.
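As a rough illustration of that alignment condition (a sketch only: `struct syscall_frame_sketch`, `align_stack_down` and `check_stack_aligned` are made-up names, and the field layout is invented rather than taken from ntdll), the check and the fallback could look like this in C:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the real frame; the layout here is invented. */
struct syscall_frame_sketch
{
    uint64_t regs[16];  /* saved general-purpose registers */
    uint64_t rip;
    uint64_t eflags;
};

/* Compile-time form of the condition `sizeof(frame) % 16 == 0`. */
_Static_assert(sizeof(struct syscall_frame_sketch) % 16 == 0,
               "frame size must preserve 16-byte stack alignment");

/* C equivalent of `andq $~15,%rsp`: round a stack pointer down to 16 bytes. */
static inline uintptr_t align_stack_down(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}

/* Debug-time check that a stack pointer is already 16-byte aligned. */
static inline void check_stack_aligned(uintptr_t sp)
{
    assert((sp & 15) == 0);
}
```

If the frame size is a multiple of 16 and the stack was aligned before the frame was pushed, the explicit `and` is redundant, which is the assumption the series relies on.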
One question I have is whether we want to continue supporting CDECL syscalls (only `wine_server_call`, `wine_server_fd_to_handle` and `wine_server_handle_to_fd`)?
If we do, this adds a bit of complexity to the syscall dispatcher, see the commit "FIXUP ntdll: Support CDECL syscalls."
If we don't, and make those syscalls WINAPI instead, then for every call to those functions on x86 it seems to either change nothing or add one `add` instruction. However we of course lose the ability to make CDECL syscalls.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1752
--
v2: wined3d: Enable long types in texture.c.
wined3d: Reduce usage of long integral types in texture.c.
wined3d: Change stencil parameter type in blitter_clear() method.
wined3d: Get/set texture's level_count and lod as unsigned int.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1727
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of a
uint 0, thus relying on complex broadcasts.
This seems to be the behaviour of the native compiler, since it reports
the following error on a shader that lacks an initializer on a data type with
object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
--
v4: vkd3d-shader/hlsl: Allow uninitialized static objects.
vkd3d-shader/hlsl: Validate that non-uniform objects are not referenced.
tests: Add additional object references tests.
tests: Test proper initialization of static structs to zero.
vkd3d-shader/hlsl: Initialize static variables to 0 by default.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/54
Implement a basic GC based on the mark-and-sweep algorithm, without requiring manually specifying "roots", which vastly simplifies the management. For now, it is triggered every 30 seconds since it last finished, on a new object initialization. Better heuristics could be used in the future.
The comments in the code should hopefully explain the high-level logic of this approach without boilerplate details. I've tested it on the FFXIV launcher (along with other patches from Proton needed to make it work) and it stops the massive memory leak by itself, so at least it does its job properly. The last patch in the MR is just an optimization for a *very* common case.
For artificial testing, one could use something like:
```javascript
function leak() {
var a = {}, b = {};
a.b = b;
b.a = a;
}
```
which creates a circular ref and will leak when the function returns.
It also introduces and makes use of a "heap_stack", which prevents stack overflows on long chains.
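For readers unfamiliar with root-less collection, here is a minimal C sketch of one common way to do it. It is not taken from the MR: `struct obj`, `find_garbage` and `MAX_REFS` are hypothetical, and the code assumes every reference points at an object in the tracked array. References coming from other tracked objects are subtracted from each refcount, whatever is left over is treated as an implicit root, and marking uses an explicit heap-allocated stack instead of recursion so long chains cannot overflow the call stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_REFS 4

/* Hypothetical object type; not jscript's real object structure. */
struct obj
{
    int ref;                     /* total refcount (internal + external) */
    int gc_ref;                  /* scratch: refs not explained by tracked objects */
    bool marked;
    struct obj *refs[MAX_REFS];  /* outgoing references, NULL when unused */
};

/* Returns the number of unreachable objects in objs[0..n-1], or -1 on OOM. */
static int find_garbage(struct obj **objs, int n)
{
    struct obj **stack = malloc((n + 1) * sizeof(*stack));  /* mark stack on the heap */
    int i, j, top = 0, garbage = 0;

    if (!stack) return -1;

    /* 1. Subtract every reference held by another tracked object. */
    for (i = 0; i < n; i++)
    {
        objs[i]->gc_ref = objs[i]->ref;
        objs[i]->marked = false;
    }
    for (i = 0; i < n; i++)
        for (j = 0; j < MAX_REFS; j++)
            if (objs[i]->refs[j]) objs[i]->refs[j]->gc_ref--;

    /* 2. Objects still referenced from outside act as implicit roots. */
    for (i = 0; i < n; i++)
        if (objs[i]->gc_ref > 0)
        {
            objs[i]->marked = true;
            stack[top++] = objs[i];
        }

    /* 3. Mark everything reachable from the roots, iteratively. */
    while (top)
    {
        struct obj *o = stack[--top];
        for (j = 0; j < MAX_REFS; j++)
            if (o->refs[j] && !o->refs[j]->marked)
            {
                assert(top < n);  /* each object is pushed at most once */
                o->refs[j]->marked = true;
                stack[top++] = o->refs[j];
            }
    }

    /* 4. Sweep: a real collector would unlink and free these objects. */
    for (i = 0; i < n; i++)
        if (!objs[i]->marked) garbage++;

    free(stack);
    return garbage;
}
```

With two objects that reference only each other (the `leak()` case above), both end up with a `gc_ref` of 0, neither is marked, and both are reported as garbage.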
--
v6: jscript: Create the source function's 'prototype' prop object on demand.
jscript: Run the garbage collector every 30 seconds on a new object
jscript: Implement CollectGarbage().
jscript: Implement a Garbage Collector to deal with circular references.
jscript: Use a jsdisp to hold refs for scopes.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1635
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of a
uint 0, thus relying on complex broadcasts.
This seems to be the behaviour of the native compiler, since it reports
the following error on a shader that lacks an initializer on a data type with
object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
--
v3: vkd3d-shader/hlsl: Allow uninitialized static objects.
vkd3d-shader/hlsl: Validate that non-uniform objects are not referenced.
tests: Add additional object references tests.
vkd3d-shader/hlsl: Initialize static variables to 0 by default.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/54
This implements the _RunAndWait member function and the constructor of the _StructuredTaskCollection, which enables the tests for that class to run.
The destructor now also throws a missing_wait exception when chores are scheduled and _RunAndWait was not called.
Remaining stuff after this:
- Task collection cancelling + IsCancelling function (next MR)
- Cancellation token support (requires RE'ing the _CancellationTokenState class, currently in progress)
--
v4: msvcr120: Throw exception in ~_StructuredTaskCollection if _RunAndWait was not called.
msvcr100: Implement missing_wait exception.
msvcr110: Implement _StructuredTaskCollection constructor.
msvcr100: Implement exception passing from chore threads to _RunAndWait.
msvcr100: Factor out EXCEPTION_RECORD to exception_ptr conversion.
msvcr100: Move exception_ptr functions to a separate file.
msvcr100: Implement _StructuredTaskCollection::_RunAndWait.
https://gitlab.winehq.org/wine/wine/-/merge_requests/906
Implement a basic GC based on the mark-and-sweep algorithm, without requiring manually specifying "roots", which vastly simplifies the management. For now, it is triggered every 30 seconds since it last finished, on a new object initialization. Better heuristics could be used in the future.
The comments in the code should hopefully explain the high-level logic of this approach without boilerplate details. I've tested it on the FFXIV launcher (along with other patches from Proton needed to make it work) and it stops the massive memory leak by itself, so at least it does its job properly. The last patch in the MR is just an optimization for a *very* common case.
For artificial testing, one could use something like:
```javascript
function leak() {
var a = {}, b = {};
a.b = b;
b.a = a;
}
```
which creates a circular ref and will leak when the function returns.
It also introduces and makes use of a "heap_stack", which prevents stack overflows on long chains.
--
v5: jscript: Create the source function's 'prototype' prop object on demand.
jscript: Run the garbage collector every 30 seconds on a new object
jscript: Implement CollectGarbage().
jscript: Implement a Garbage Collector to deal with circular references.
jscript: Use a jsdisp to hold refs for scopes.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1635
This implements the _RunAndWait member function and the constructor of the _StructuredTaskCollection, which enables the tests for that class to run.
The destructor now also throws a missing_wait exception when chores are scheduled and _RunAndWait was not called.
Remaining stuff after this:
- Task collection cancelling + IsCancelling function (next MR)
- Cancellation token support (requires RE'ing the _CancellationTokenState class, currently in progress)
--
v3: msvcr120: Throw exception in _StructuredTaskCollection_dtor if _RunAndWait was not called.
msvcr100: Implement missing_wait exception.
msvcr110: Implement _StructuredTaskCollection constructor.
msvcr100: Implement exception passing from chore threads to _RunAndWait.
msvcr100: Factor out EXCEPTION_RECORD to exception_ptr conversion.
msvcr100: Move exception_ptr functions to a separate file.
msvcr100: Implement _StructuredTaskCollection::_RunAndWait.
https://gitlab.winehq.org/wine/wine/-/merge_requests/906
> (The TlsIndex field in the LDR_DATA_TABLE_ENTRY structure appears to be unused except as a flag that the module has TLS (being always set to -1), at least as far back as Windows XP. It is worth mentioning that the WINE implementation of implicit TLS incorrectly uses TlsIndex as the real module TLS index, so it may be unreliable to assume that it is always -1 if you care about working on WINE.)
>
> \- http://www.nynaeve.net/?p=186
and the "links to that article but still doesn't work in wine" award goes to... [the D runtime](https://github.com/dlang/dmd/blob/6bf60ea0eb174631ede0074a77d3898d…! (Admittedly, there aren't too many ways to do what they're trying to do.)
With this, the D runtime will now work in Wine, even when it lives in a DLL loaded into an exe with no TLS (which gives the DLL TLS index 0).
The changes to the debugger are a bit icky; a possible alternative is to find some other easily-debugger-accessible place to stuff the TLS index.
--
v4: ntdll: TlsIndex should not actually contain tls indices
https://gitlab.winehq.org/wine/wine/-/merge_requests/1578
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=53176
1) The async test is broken on Windows 10 1507. This appears to be a trend among WinRT dlls. I'm thinking that we can add macro definitions for each testbot VM to avoid having to skip tests that would otherwise be working fine, and to gain greater control over tests in general.
2) The provider file that contains `interface IWineAsyncInfoImpl` is likely going to be reused again in the future. Perhaps a new file can be added in the include/wine folder to prevent duplicate code? I can do this in a separate merge request and clean up the existing provider files.
3) All the check_bool_async tests return a random async_id so I skipped them. Testbot example: https://testbot.winehq.org/JobDetails.pl?Key=127232&f208=exe32.report#k208
--
v2: cryptowinrt: Implement IKeyCredentialManagerStatics_IsSupportedAsync.
cryptowinrt/tests: Add IKeyCredentialManagerStatics_IsSupportedAsync tests.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1714
It looks like there was a change on the websocket test server that causes our
send operations to block. Reducing the size of the send buffer works around the
test failures. The tests still succeed on Windows, which suggests that native
sends large buffers in smaller chunks.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1741
GCC 12.2 warns about dereferencing a pointer to RpcPktHdr
when the object was allocated with the size of only one of the
packet types (hence smaller than the union in some cases).
/home/eric/work/wine/dlls/rpcrt4/rpc_message.c:111:26: warning: array subscript 'RpcPktHdr[0]' is partly outside array bounds of 'unsigned char[24]' [-Warray-bounds]
111 | Header->common.rpc_ver = RPC_VER_MAJOR;
This patch fixes the warnings by accessing the created object
through a pointer to its type (and not through the union).
Notes:
- the 'max(sizeof(...), FIELD_OFFSET(...))' thingie in
RPCRT4_BuildBindNackHeader also avoids a warning, as the
FIELD_OFFSET() can be smaller than the size of the structure.
This could be avoided by using a flexible array member for
the 'protocols' field instead of 'protocols[ANYSIZE_ARRAY]'.
- I only changed the allocation routines when the allocated size
is smaller than the union.
If the strategy is validated, one could consider applying the
same allocation strategy to all helpers for symmetry reasons.
Hence the draft status for now, waiting for feedback.
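To illustrate the first note, here is a small C sketch of the `max(sizeof, offset)` sizing idea. It is an assumption-laden illustration, not the rpcrt4 code: `struct nack_hdr_sketch`, `struct proto_sketch`, `nack_hdr_size` and `alloc_nack_hdr` are hypothetical names, and standard `offsetof` stands in for `FIELD_OFFSET`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

struct proto_sketch { unsigned char major, minor; };

/* Hypothetical header with an ANYSIZE_ARRAY-style trailing array. */
struct nack_hdr_sketch
{
    unsigned short reject_reason;
    unsigned char protocols_count;
    struct proto_sketch protocols[1];
};

/* Bytes to allocate for `count` trailing entries. For small counts the
 * offset-based size is below sizeof(struct), so take the maximum to keep
 * every access through the struct pointer inside the allocation. */
static size_t nack_hdr_size(unsigned char count)
{
    size_t used = offsetof(struct nack_hdr_sketch, protocols)
                  + count * sizeof(struct proto_sketch);
    return MAX(sizeof(struct nack_hdr_sketch), used);
}

/* Allocates a zeroed header; the caller releases it with free(). */
static struct nack_hdr_sketch *alloc_nack_hdr(unsigned char count)
{
    struct nack_hdr_sketch *hdr;

    assert(nack_hdr_size(count) >= sizeof(*hdr));
    hdr = calloc(1, nack_hdr_size(count));
    if (hdr) hdr->protocols_count = count;
    return hdr;
}
```

With a C99 flexible array member (`protocols[]`) the `MAX` would no longer be needed to satisfy the compiler, which is the alternative the note mentions.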
Signed-off-by: Eric Pouech <eric.pouech(a)gmail.com>
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1545
In this patch set, GetOutputType() currently fails for NV11 and the following RGB types because MFCalculateImageSize() fails for them. I'll fix MFCalculateImageSize() for them soon (but not in this patch set).
--
v2: winegstreamer: Implement GetOutputType for WMV decoder.
winegstreamer: Add RGB8, IYUV, NV11 to wg_video_format.
mf/tests: Test info headers returned by GetOutputType for WMV decoder.
mf/tests: Test GetOutputType for WMV decoder.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1707
Implement a basic GC based on the mark-and-sweep algorithm, without requiring manually specifying "roots", which vastly simplifies the management. For now, it is triggered every 30 seconds since it last finished, on a new object initialization. Better heuristics could be used in the future.
The comments in the code should hopefully explain the high-level logic of this approach without boilerplate details. I've tested it on the FFXIV launcher (along with other patches from Proton needed to make it work) and it stops the massive memory leak by itself, so at least it does its job properly. The last patch in the MR is just an optimization for a *very* common case.
For artificial testing, one could use something like:
```javascript
function leak() {
var a = {}, b = {};
a.b = b;
b.a = a;
}
```
which creates a circular ref and will leak when the function returns.
It also introduces and makes use of a "heap_stack", which prevents stack overflows on long chains.
--
v4: jscript: Create the source function's 'prototype' prop object on demand.
jscript: Run the garbage collector every 30 seconds on a new object
jscript: Implement CollectGarbage().
jscript: Implement a Garbage Collector to deal with circular references.
jscript: Use a jsdisp to hold refs for scopes.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1635
MR !607 was trying to fix an issue with Life Is Strange Remastered, but
although it fixed some race conditions with presentation end, the issue
it was trying to fix is still there.
The game calls IMFMediaSession_Stop while the presentation is ending, expects
that command to quickly execute, interrupting the presentation end and
emitting a MESessionStopped event instead of the MESessionEnded.
Delaying the Stop command and emitting the MESessionEnded event breaks
the game's assumptions, and it crashes.
--
v3: mf: Discard end of presentation on IMFMediaSession_Stop.
mf/tests: Test IMFMediaSession_Stop command near presentation end.
mf/tests: Test Start / Pause / Stop IMFMEdiaSession events.
mf/tests: Split wait_media_event helper into wait_next_media_event.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1710
--
v3: include/msvcrt: Add __WINE_(ALLOC_SIZE|DEALLOC|MALLOC) attributes to _aligned_malloc functions.
include/msvcrt: Add __WINE_(DEALLOC|MALLOC) attributes to _strdup and _wcsdup.
msvcp140_atomic_wait: Don't use the reserved variable name 'environ'.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1686
Manual tests on Windows 10 show that calling Sleep(0) or NtDelayExecution() with zero timeout in a
loop does consume 100% of a CPU core, which is closer to the behavior of NtYieldExecution() than
usleep(0). usleep(0) gives up the remaining timeslices even if there are no other threads to switch
to, causing low utilization of CPU and performance issues.
The original patch is b1a79c6 and the idea was to use usleep(0) to avoid a thread taking 100% of a
CPU core for StarCraft 2 and Shadow of the Tomb Raider. However, with wine-7.22, reverting the
usleep(0) patch causes no behavior changes. For Shadow of the Tomb Raider, the 100% CPU issue is
gone with or without the patch. For StarCraft 2, there is always a thread taking 100% CPU even with
the patch. After discussing with Matteo, we decided it's better to revert the patch.
This fixes the Mortal Kombat X performance drop during tower selection and the poor performance of Ragnarok Online.
This reverts commit e86b4015ff405d4c054b8a5bc855ee655e1a833c.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=53327
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1715
MR !607 was trying to fix an issue with Life Is Strange Remastered, but
although it fixed some race conditions with presentation end, the issue
it was trying to fix is still there.
The game calls IMFMediaSession_Stop while the presentation is ending, expects
that command to quickly execute, interrupting the presentation end and
emitting a MESessionStopped event instead of the MESessionEnded.
Delaying the Stop command and emitting the MESessionEnded event breaks
the game's assumptions, and it crashes.
--
v2: mf: Discard end of presentation on IMFMediaSession_Stop.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1710
On Wed Dec 7 03:23:03 2022 +0000, Zebediah Figura wrote:
> > This is probably fine for simplicity, but note the existence of
> WINED3DPMISCCAPS_INDEPENDENTWRITEMASKS (and "NumSimultaneousRTs" more
> generally); this is not strictly a requirement for D3D 9.3.
> Is it not? [1] at least states it is. For d3d9 we'd presumably want to
> set WINED3DPMISCCAPS_INDEPENDENTWRITEMASKS appropriately (which,
> granted, I don't have a patch for yet) but in terms of setting feature
> levels it seems right?
> [1] https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct…
Oh, I was looking at the required features for Direct3D 9, not Direct3D 11 downlevel devices. For the most part that shouldn't matter too much, because we'll happily create a device with a lower feature level in d3d9, but we do restrict the maximum shader version based on the feature level (shader_max_version_from_feature_level()). I.e., AFAIK in d3d9 it should be possible to use shader model 3 shaders without having D3DPMISCCAPS_INDEPENDENTWRITEMASKS.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1693#note_18956
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=53176
1) The async test is broken on Windows 10 1507. This appears to be a trend among WinRT dlls. I'm thinking that we can add macro definitions for each testbot VM to avoid having to skip tests that would otherwise be working fine, and to gain greater control over tests in general.
2) The provider file that contains `interface IWineAsyncInfoImpl` is likely going to be reused again in the future. Perhaps a new file can be added in the include/wine folder to prevent duplicate code? I can do this in a separate merge request and clean up the existing provider files.
3) All the check_bool_async tests return a random async_id so I skipped them. Testbot example: https://testbot.winehq.org/JobDetails.pl?Key=127232&f208=exe32.report#k208
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1714