Zebediah Figura (@zfigura) commented about dlls/ntoskrnl.exe/ntoskrnl.c:
> irp->Tail.Overlay.Thread = (PETHREAD)KeGetCurrentThread();
> irp->Tail.Overlay.OriginalFileObject = file;
> irp->RequestorMode = UserMode;
> + HeapFree( GetProcessHeap(), 0, context->in_buff );
> context->in_buff = NULL;
I don't think we need to deallocate the input buffer; we're not using it. Rather, we should just remove the assignment to NULL.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2439#note_29633
--
v2: winepulse: Use mmdevdrv structs from mmdevapi.
wineoss: Use mmdevdrv structs from mmdevapi.
winecoreaudio: Use mmdevdrv structs from mmdevapi.
winealsa: Move common mmdevdrv structs into mmdevapi.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2626
This fixes a bug where the session topology contains an invalid
source, which makes the session thread hang and stop executing
commands.
--
v6: mf/session: Handle error when a source fails to start.
mf/session: Handle errors when subscribing to source's events.
mf/tests: Test media session error handling.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2496
Today, the test scenario "ACTCTX_FLAG_HMODULE_VALID but hModule is not
set" is broken and unreliable. This problem is not evident in WineHQ
batch test runs; rather, the test failure seems to only be triggered
when the kernel32:actctx test is run in isolation.
When the flag ACTCTX_FLAG_HMODULE_VALID is specified in ACTCTX but
hModule is set to NULL, CreateActCtxW() may encounter different failure
modes depending on the test executable file. Error codes observed so
far include ERROR_SXS_CANT_GEN_ACTCTX and ERROR_SXS_MANIFEST_TOO_BIG.
It appears that the inconsistent failure was caused by Windows trying to
interpret the main executable file of the current process as an XML
manifest file. This fails due to one or more of the following reasons:
- A valid PE executable that starts with the "MZ" signature is not a
valid XML file.
- The executable's size may exceed the limit imposed by the manifest
parser. This is much more likely for binaries with debugging symbols.
Meanwhile, winetest.exe bundles a stripped version of the test
executable (kernel32_test-stripped.exe), which is often smaller than
the original, unstripped executable. This probably explains why the
problem was not visible in batch test runs.
Fix this by changing the FullDllName of the main executable module's
LDR_DATA_TABLE_ENTRY to the pathname of a temporary manifest file (valid
or invalid) before testing. The testing is performed in a child
process, since "corrupting" the internal state of the main test
process is not desirable for deterministic and reliable tests.
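As a rough sketch of the mechanism described above (these are toy struct definitions standing in for the real winternl.h ones, reduced to the fields involved, and set_module_path is a made-up helper, not code from the MR):

```c
#include <stddef.h>
#include <wchar.h>

/* Toy mirrors of the Windows loader structures; the real definitions live in
 * winternl.h.  Only the field the test manipulates is reproduced here. */
typedef struct
{
    unsigned short Length;        /* length in bytes, excluding terminator */
    unsigned short MaximumLength; /* buffer capacity in bytes */
    wchar_t *Buffer;
} UNICODE_STRING;

typedef struct
{
    UNICODE_STRING FullDllName;   /* path the loader recorded for the module */
} LDR_DATA_TABLE_ENTRY;

/* Repoint the module entry's FullDllName at a replacement path, the way the
 * fixed test redirects the main executable's entry to a temporary manifest
 * file before calling CreateActCtxW(). */
static void set_module_path( LDR_DATA_TABLE_ENTRY *mod, wchar_t *path )
{
    mod->FullDllName.Buffer = path;
    mod->FullDllName.Length = (unsigned short)(wcslen( path ) * sizeof(wchar_t));
    mod->FullDllName.MaximumLength = mod->FullDllName.Length + sizeof(wchar_t);
}
```

On Windows the real entry would be found by walking the PEB's loader module list; the sketch only shows the field swap itself.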
Blocks !2555.
--
v4: kernel32/tests: Fix test for ACTCTX_FLAG_HMODULE_VALID with hModule = NULL case.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2617
Currently, the free list consists of a "small list" for sizes below 256,
which are linearly spaced, and a "large list" which is manually split
into a few chunks.
This patch replaces it with a single log-linear policy, while expanding
the range the large list covers.
The old implementation had issues when a lot of large allocations
happened. In this case, all the allocations went into the last catch-all
bucket in the "large list", with two consequences:
1. The linked list grew in size over time, causing search cost to
skyrocket.
2. With the first-fit allocation policy, fragmentation was also making
the problem worse.
The new bucketing covers the entire range up until we start allocating
large blocks, which will not enter the free list. It also makes the
allocation policy closer to best-fit (although not exactly), reducing
fragmentation.
The increased number of free lists does incur some cost when empty
lists need to be skipped over, but the improvement in allocation
performance outweighs it.
For future work, these ideas (mostly from glibc) might or might not
benefit performance:
- Use an exact best-fit allocation policy.
- Add a bitmap for the free lists, allowing empty lists to be skipped
with a single bit scan.
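As an illustration of the log-linear scheme (not the actual Wine code; every identifier here except FREE_LIST_LINEAR_BITS is made up for the sketch), the mapping from block size to bucket index might look like:

```c
#include <assert.h>
#include <stddef.h>

#define FREE_LIST_LINEAR_BITS 3  /* 2^3 = 8 sub-buckets per power-of-two range */

/* Map a block size to a log-linear bucket index.  The power-of-two range
 * containing the size picks the coarse bucket; the FREE_LIST_LINEAR_BITS bits
 * just below the top bit subdivide that range linearly.  Sizes small enough
 * to fit in one sub-range each get their own bucket.  The mapping is
 * monotonic, so a near-best-fit search can simply walk buckets upwards. */
static unsigned int bucket_index( size_t size )
{
    unsigned int log, linear;

    if (size < (1u << FREE_LIST_LINEAR_BITS)) return (unsigned int)size;
    log = 63 - __builtin_clzll( (unsigned long long)size );  /* floor(log2(size)) */
    linear = (size >> (log - FREE_LIST_LINEAR_BITS)) & ((1u << FREE_LIST_LINEAR_BITS) - 1);
    return ((log - FREE_LIST_LINEAR_BITS + 1) << FREE_LIST_LINEAR_BITS) + linear;
}
```

With FREE_LIST_LINEAR_BITS = 3 this gives the "8 buckets per doubling" figure quoted below; the bitmap idea from the list above would map each bucket index to one bit and use a count-trailing-zeros scan to find the next non-empty list.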
For the benchmark, this drastically improves initial shader loading performance in Overwatch 2. In this workload 78k shaders are passed to DXVK for DXBC -> SPIRV translation, and for each shader a few allocations happen in the 4K – 100K range for the staging buffer.
Before this patch, malloc accounted for a whopping 43% of the overhead. The overhead with log-linear bucketing is drastically lower, resulting in a ~2x improvement in loading time.
Overhead for each `FREE_LIST_LINEAR_BITS` is as below:
- 0: 7.7%
- 1: 2.9%
- 2: 1.3%
- 3: 0.6%
Since performance seems to scale linearly with the number of buckets (up to the point I have tested), I've opted for 3 (8 buckets per doubling) in the current revision of the patch.
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki(a)gmail.com>
--
v6: ntdll: Use log-linear bucketing for free lists.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2622
--
v5: ntdll: Use log-linear bucketing for free lists.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2622
--
v2: imm32: Use INPUTCONTEXT directly in ImmSetConversionStatus.
imm32: Use INPUTCONTEXT directly in ImmGetConversionStatus.
imm32: Compare open status values in ImmSetOpenStatus.
imm32: Cache INPUTCONTEXT values for every IME.
imm32: Use INPUTCONTEXT directly in ImmSetOpenStatus.
imm32: Use INPUTCONTEXT directly in ImmGetOpenStatus.
imm32: Serialize ImeInquire / ImeDestroy calls.
imm32/tests: Cleanup the cross thread IMC tests.
imm32/tests: Reduce the number of IME installations.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2627