Hi all!
This is a heap implementation based on thread-local structures that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios, and to see whether it could help performance in some games.
The good news is that this implementation performs well according to third-party heap micro-benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I could still see improved loading times, and less stalling as well.
As I've been tweaking it lately, and now that it passes the heap tests, I believe it could still be useful for others, or interesting for anyone eager to try it. It could also be used for comparison, to determine whether the heap is the bottleneck for some use case.
The implementation details are in patch #9 for anyone interested. The thread-local heaps are enabled by HeapSetInformation calls, when the call requests the LFH, but I also added a WINELFH environment variable to guard it, so that it can be disabled globally.
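For anyone wanting to try it from an application, opting in looks like the following sketch (the heap handle and error handling here are illustrative; 2 is the standard HeapCompatibilityInformation value that requests the LFH):

```c
#include <windows.h>

int main(void)
{
    HANDLE heap = HeapCreate(0, 0, 0);
    ULONG hci = 2; /* HeapCompatibilityInformation: 2 requests the LFH */

    /* With this series applied, a successful call here routes subsequent
     * allocations on this heap to the thread-local heaps (as long as the
     * WINELFH gate allows it). */
    if (!HeapSetInformation(heap, HeapCompatibilityInformation,
                            &hci, sizeof(hci)))
        return 1;

    HeapFree(heap, 0, HeapAlloc(heap, 0, 64));
    HeapDestroy(heap);
    return 0;
}
```

With the HACK patch applied, the switch only actually takes effect when the environment gate is set, e.g. something like `WINELFH=1 wine app.exe`.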
There are multiple issues that I'm aware of, but they didn't seem to cause much trouble for the applications I tried:
* Block headers are different from the default ones, but FWIW the default block headers don't seem to match recent Windows either, where they also vary depending on the Windows version.
* HeapWalk will not show the blocks allocated from the thread-local heaps. It would be possible to also walk the current thread-local heap, but because of an implementation detail some blocks would still not show up (fully used arenas are detached from their heap until a block is freed).
* HeapDestroy doesn't automatically free the blocks that were allocated from the thread-local heaps, and they can be lost forever for the same reason as above. Although it would be possible to keep track of fully used arenas, the same thread-local heaps are shared between heaps, so there's currently no way to tell which blocks should be freed on destroy.
* Freed blocks are deferred to the thread that allocated them, but in general they are quickly reused. Some validation is done, but a use-after-free could still cause issues. This should be easy to fix by adding a buffer, but as I didn't see any game needing it, it didn't seem worth the trouble for now.
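To check which mode a given heap actually ended up in, the same information class can be read back with HeapQueryInformation, as the new tests in patch #2 do; a minimal sketch:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE heap = HeapCreate(0, 0, 0);
    ULONG hci = 0;
    SIZE_T ret_size = 0;

    /* 0 = standard heap, 1 = look-aside lists, 2 = LFH */
    if (HeapQueryInformation(heap, HeapCompatibilityInformation,
                             &hci, sizeof(hci), &ret_size))
        printf("HeapCompatibilityInformation: %lu\n", (unsigned long)hci);

    HeapDestroy(heap);
    return 0;
}
```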
Cheers,
Rémi Bernon (11):
  kernel32: Catch page faults in GlobalSize.
  kernel32/tests: Add HeapSetInformation and LFH tests.
  ntdll: Split standard heap functions.
  ntdll: Add thread destroy notification function.
  ntdll: Add extended heap type and LFH stubs.
  ntdll: Implement RtlSetHeapInformation for LFH.
  ntdll: Move undocumented flags to ntdll_misc.h.
  HACK: ntdll: Conditionally enable LFH.
  ntdll: Implement Low Fragmentation Heap.
  ntdll: Enable LFH for process heap.
  msvcrt: Enable LFH for internal heaps.
 dlls/kernel32/heap.c       |   78 +--
 dlls/kernel32/tests/heap.c |   66 +++
 dlls/msvcrt/heap.c         |    4 +
 dlls/ntdll/Makefile.in     |    1 +
 dlls/ntdll/heap.c          |  220 ++++--
 dlls/ntdll/heap_lfh.c      | 1057 ++++++++++++++++++++++++++++++++++++
 dlls/ntdll/loader.c        |    3 +
 dlls/ntdll/ntdll_misc.h    |   28 +
 dlls/ntdll/thread.c        |    1 +
 9 files changed, 1379 insertions(+), 79 deletions(-)
 create mode 100644 dlls/ntdll/heap_lfh.c
In the same way GlobalFree does already.
---
 dlls/kernel32/heap.c | 78 +++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 34 deletions(-)
diff --git a/dlls/kernel32/heap.c b/dlls/kernel32/heap.c index 6c5219b624f9..08139db4be98 100644 --- a/dlls/kernel32/heap.c +++ b/dlls/kernel32/heap.c @@ -329,41 +329,51 @@ SIZE_T WINAPI GlobalSize(HGLOBAL hmem) return 0; }
- if(ISPOINTER(hmem)) - { - retval=HeapSize(GetProcessHeap(), 0, hmem); - - if (retval == ~0ul) /* It might be a GMEM_MOVEABLE data pointer */ - { - retval = HeapSize(GetProcessHeap(), 0, (char*)hmem - HGLOBAL_STORAGE); - if (retval != ~0ul) retval -= HGLOBAL_STORAGE; - } - } - else - { - RtlLockHeap(GetProcessHeap()); - pintern=HANDLE_TO_INTERN(hmem); + RtlLockHeap(GetProcessHeap()); + __TRY + { + if(ISPOINTER(hmem)) + { + retval=HeapSize(GetProcessHeap(), 0, hmem); + + if (retval == ~0ul) /* It might be a GMEM_MOVEABLE data pointer */ + { + retval = HeapSize(GetProcessHeap(), 0, (char*)hmem - HGLOBAL_STORAGE); + if (retval != ~0ul) retval -= HGLOBAL_STORAGE; + } + } + else + { + pintern=HANDLE_TO_INTERN(hmem); + + if(pintern->Magic==MAGIC_GLOBAL_USED) + { + if (!pintern->Pointer) /* handle case of GlobalAlloc( ??,0) */ + retval = 0; + else + { + retval = HeapSize(GetProcessHeap(), 0, (char *)pintern->Pointer - HGLOBAL_STORAGE ); + if (retval != ~0ul) retval -= HGLOBAL_STORAGE; + } + } + else + { + WARN("invalid handle %p (Magic: 0x%04x)\n", hmem, pintern->Magic); + SetLastError(ERROR_INVALID_HANDLE); + retval=0; + } + } + } + __EXCEPT_PAGE_FAULT + { + SetLastError( ERROR_INVALID_HANDLE ); + retval = 0; + } + __ENDTRY + RtlUnlockHeap(GetProcessHeap());
- if(pintern->Magic==MAGIC_GLOBAL_USED) - { - if (!pintern->Pointer) /* handle case of GlobalAlloc( ??,0) */ - retval = 0; - else - { - retval = HeapSize(GetProcessHeap(), 0, (char *)pintern->Pointer - HGLOBAL_STORAGE ); - if (retval != ~0ul) retval -= HGLOBAL_STORAGE; - } - } - else - { - WARN("invalid handle %p (Magic: 0x%04x)\n", hmem, pintern->Magic); - SetLastError(ERROR_INVALID_HANDLE); - retval=0; - } - RtlUnlockHeap(GetProcessHeap()); - } - if (retval == ~0ul) retval = 0; - return retval; + if (retval == ~0ul) retval = 0; + return retval; }
---
 dlls/kernel32/tests/heap.c | 68 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
diff --git a/dlls/kernel32/tests/heap.c b/dlls/kernel32/tests/heap.c index bc57498bf91e..550d7190f2ca 100644 --- a/dlls/kernel32/tests/heap.c +++ b/dlls/kernel32/tests/heap.c @@ -39,6 +39,7 @@ #define HEAP_VALIDATE_PARAMS 0x40000000
static BOOL (WINAPI *pHeapQueryInformation)(HANDLE, HEAP_INFORMATION_CLASS, PVOID, SIZE_T, PSIZE_T); +static BOOL (WINAPI *pHeapSetInformation)(HANDLE, HEAP_INFORMATION_CLASS, PVOID, SIZE_T); static BOOL (WINAPI *pGetPhysicallyInstalledSystemMemory)(ULONGLONG *); static ULONG (WINAPI *pRtlGetNtGlobalFlags)(void);
@@ -531,6 +532,8 @@ static void test_HeapCreate(void) UINT i; BOOL error; DWORD dwSize; + ULONG hci; + SIZE_T size;
/* Retrieve the page size for this system */ GetSystemInfo(&sysInfo); @@ -627,6 +630,71 @@ static void test_HeapCreate(void)
/* Check that HeapDestroy works */ ok(HeapDestroy(heap),"HeapDestroy failed\n"); + + + if (!(pHeapQueryInformation = (void *)GetProcAddress(GetModuleHandleA("kernel32.dll"), "HeapQueryInformation")) || + !(pHeapSetInformation = (void *)GetProcAddress(GetModuleHandleA("kernel32.dll"), "HeapSetInformation"))) + win_skip("HeapQueryInformation / HeapSetInformation not available\n"); + else + { + heap = HeapCreate(0, 0, 0); + ok(!!heap, "HeapCreate failed\n"); + + mem1 = HeapAlloc(heap, 0, 16); + mem2 = HeapAlloc(heap, 0, 16); + + ok(pHeapQueryInformation(heap, HeapCompatibilityInformation, &hci, sizeof(hci), &size), + "HeapQueryInformation failed\n"); + trace("HeapQueryInformation returned %d\n", hci); + + hci = 2; + ok(pHeapSetInformation(heap, HeapCompatibilityInformation, &hci, sizeof(hci)), + "HeapSetInformation failed\n"); + ok(pHeapQueryInformation(heap, HeapCompatibilityInformation, &hci, sizeof(hci), &size), + "HeapQueryInformation failed\n"); + trace("HeapQueryInformation returned %d\n", hci); + + hci = 1; + SetLastError(0xdeadbeef); + todo_wine + ok(!pHeapSetInformation(heap, HeapCompatibilityInformation, &hci, sizeof(hci)), + "HeapSetInformation succeeded\n"); + todo_wine + ok(GetLastError() == ERROR_GEN_FAILURE, + "expected ERROR_GEN_FAILURE, got %u\n", GetLastError()); + + mem3 = HeapAlloc(heap, 0, 16); + + ok(HeapValidate(heap, 0, NULL), "HeapValidate failed\n"); + + SetLastError(0xdeadbeef); + dwSize = HeapSize(heap, 0, mem1); + ok(dwSize == 16, "HeapSize failed\n"); + ok(GetLastError() == 0xdeadbeef, "GetLastError failed: %u\n", GetLastError()); + mem1 = HeapReAlloc(heap, 0, mem1, 1024); + ok(mem1 != NULL, "HeapReAlloc failed\n"); + + dwSize = HeapSize(heap, 0, mem1); + ok(dwSize == 1024, "HeapSize failed\n"); + + dwSize = HeapSize(heap, 0, mem2); + ok(dwSize == 16, "HeapSize failed\n"); + ok(GetLastError() == 0xdeadbeef, "GetLastError failed: %u\n", GetLastError()); + + dwSize = HeapSize(heap, 0, mem3); + ok(dwSize == 16, "HeapSize failed\n"); + ok(GetLastError() == 0xdeadbeef, "GetLastError failed: %u\n", GetLastError()); + + ok(HeapValidate(heap, 0, NULL), "HeapValidate failed\n"); + + ok(HeapFree(heap, 0, mem1), "HeapFree failed\n"); + ok(HeapFree(heap, 0, mem2), "HeapFree failed\n"); + ok(HeapFree(heap, 0, mem3), "HeapFree failed\n"); + + ok(HeapValidate(heap, 0, NULL), "HeapValidate failed\n"); + + ok(HeapDestroy(heap),"HeapDestroy failed\n"); + } }
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=71287
Your paranoid android.
=== debiant (32 bit WoW report) ===
kernel32:
change.c:320: Test failed: should be ready
---
 dlls/ntdll/heap.c       | 113 +++++++++++++++++++++++++++-------------
 dlls/ntdll/ntdll_misc.h |   7 +++
 2 files changed, 84 insertions(+), 36 deletions(-)
diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c index 0080a89fddb6..ac0a1c800163 100644 --- a/dlls/ntdll/heap.c +++ b/dlls/ntdll/heap.c @@ -1335,8 +1335,6 @@ static BOOL HEAP_IsRealArena( HEAP *heapPtr, /* [in] ptr to the heap */ BOOL ret = FALSE; const ARENA_LARGE *large_arena;
- flags &= HEAP_NO_SERIALIZE; - flags |= heapPtr->flags; /* calling HeapLock may result in infinite recursion, so do the critsect directly */ if (!(flags & HEAP_NO_SERIALIZE)) RtlEnterCriticalSection( &heapPtr->critSection ); @@ -1644,18 +1642,27 @@ HANDLE WINAPI RtlDestroyHeap( HANDLE heap ) * This call does not SetLastError(). */ void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_T size ) +{ + HEAP *heapPtr; + + if (!(heapPtr = HEAP_GetPtr( heap ))) + return NULL; + + flags &= HEAP_GENERATE_EXCEPTIONS | HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY; + flags |= heapPtr->flags; + + return HEAP_std_allocate( heapPtr, flags, size ); +} + +void * HEAP_std_allocate( HEAP *heapPtr, ULONG flags, SIZE_T size ) { ARENA_FREE *pArena; ARENA_INUSE *pInUse; SUBHEAP *subheap; - HEAP *heapPtr = HEAP_GetPtr( heap ); SIZE_T rounded_size;
/* Validate the parameters */
- if (!heapPtr) return NULL; - flags &= HEAP_GENERATE_EXCEPTIONS | HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY; - flags |= heapPtr->flags; rounded_size = ROUND_SIZE(size) + HEAP_TAIL_EXTRA_SIZE( flags ); if (rounded_size < size) /* overflow */ { @@ -1668,10 +1675,10 @@ void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_
if (rounded_size >= HEAP_MIN_LARGE_BLOCK_SIZE && (flags & HEAP_GROWABLE)) { - void *ret = allocate_large_block( heap, flags, size ); + void *ret = allocate_large_block( heapPtr, flags, size ); if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); if (!ret && (flags & HEAP_GENERATE_EXCEPTIONS)) RtlRaiseStatus( STATUS_NO_MEMORY ); - TRACE("(%p,%08x,%08lx): returning %p\n", heap, flags, size, ret ); + TRACE("(%p,%08x,%08lx): returning %p\n", heapPtr, flags, size, ret ); return ret; }
@@ -1680,7 +1687,7 @@ void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_ if (!(pArena = HEAP_FindFreeBlock( heapPtr, rounded_size, &subheap ))) { TRACE("(%p,%08x,%08lx): returning NULL\n", - heap, flags, size ); + heapPtr, flags, size ); if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); if (flags & HEAP_GENERATE_EXCEPTIONS) RtlRaiseStatus( STATUS_NO_MEMORY ); return NULL; @@ -1709,7 +1716,7 @@ void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_
if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection );
- TRACE("(%p,%08x,%08lx): returning %p\n", heap, flags, size, pInUse + 1 ); + TRACE("(%p,%08x,%08lx): returning %p\n", heapPtr, flags, size, pInUse + 1 ); return pInUse + 1; }
@@ -1730,16 +1737,11 @@ void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_ */ BOOLEAN WINAPI DECLSPEC_HOTPATCH RtlFreeHeap( HANDLE heap, ULONG flags, void *ptr ) { - ARENA_INUSE *pInUse; - SUBHEAP *subheap; HEAP *heapPtr;
- /* Validate the parameters */ - if (!ptr) return TRUE; /* freeing a NULL ptr isn't an error in Win2k */
- heapPtr = HEAP_GetPtr( heap ); - if (!heapPtr) + if (!(heapPtr = HEAP_GetPtr( heap ))) { RtlSetLastWin32ErrorAndNtStatusFromNtStatus( STATUS_INVALID_HANDLE ); return FALSE; @@ -1747,6 +1749,17 @@ BOOLEAN WINAPI DECLSPEC_HOTPATCH RtlFreeHeap( HANDLE heap, ULONG flags, void *pt
flags &= HEAP_NO_SERIALIZE; flags |= heapPtr->flags; + + return HEAP_std_free( heapPtr, flags, ptr ); +} + +BOOLEAN HEAP_std_free( HEAP *heapPtr, ULONG flags, void *ptr ) +{ + ARENA_INUSE *pInUse; + SUBHEAP *subheap; + + /* Validate the parameters */ + if (!(flags & HEAP_NO_SERIALIZE)) RtlEnterCriticalSection( &heapPtr->critSection );
/* Inform valgrind we are trying to free memory, so it can throw up an error message */ @@ -1762,13 +1775,13 @@ BOOLEAN WINAPI DECLSPEC_HOTPATCH RtlFreeHeap( HANDLE heap, ULONG flags, void *pt HEAP_MakeInUseBlockFree( subheap, pInUse );
if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); - TRACE("(%p,%08x,%p): returning TRUE\n", heap, flags, ptr ); + TRACE("(%p,%08x,%p): returning TRUE\n", heapPtr, flags, ptr ); return TRUE;
error: if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); RtlSetLastWin32ErrorAndNtStatusFromNtStatus( STATUS_INVALID_PARAMETER ); - TRACE("(%p,%08x,%p): returning FALSE\n", heap, flags, ptr ); + TRACE("(%p,%08x,%p): returning FALSE\n", heapPtr, flags, ptr ); return FALSE; }
@@ -1790,24 +1803,33 @@ error: */ PVOID WINAPI RtlReAllocateHeap( HANDLE heap, ULONG flags, PVOID ptr, SIZE_T size ) { - ARENA_INUSE *pArena; HEAP *heapPtr; - SUBHEAP *subheap; - SIZE_T oldBlockSize, oldActualSize, rounded_size; - void *ret;
- if (!ptr) return NULL; + if (!ptr) + return NULL; + if (!(heapPtr = HEAP_GetPtr( heap ))) { RtlSetLastWin32ErrorAndNtStatusFromNtStatus( STATUS_INVALID_HANDLE ); return NULL; }
- /* Validate the parameters */ - flags &= HEAP_GENERATE_EXCEPTIONS | HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY | HEAP_REALLOC_IN_PLACE_ONLY; flags |= heapPtr->flags; + + return HEAP_std_reallocate( heapPtr, flags, ptr, size ); +} + +void *HEAP_std_reallocate( HEAP *heapPtr, ULONG flags, void *ptr, SIZE_T size ) +{ + ARENA_INUSE *pArena; + SUBHEAP *subheap; + SIZE_T oldBlockSize, oldActualSize, rounded_size; + void *ret; + + /* Validate the parameters */ + if (!(flags & HEAP_NO_SERIALIZE)) RtlEnterCriticalSection( &heapPtr->critSection );
rounded_size = ROUND_SIZE(size) + HEAP_TAIL_EXTRA_SIZE(flags); @@ -1903,20 +1925,20 @@ PVOID WINAPI RtlReAllocateHeap( HANDLE heap, ULONG flags, PVOID ptr, SIZE_T size ret = pArena + 1; done: if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); - TRACE("(%p,%08x,%p,%08lx): returning %p\n", heap, flags, ptr, size, ret ); + TRACE("(%p,%08x,%p,%08lx): returning %p\n", heapPtr, flags, ptr, size, ret ); return ret;
oom: if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); if (flags & HEAP_GENERATE_EXCEPTIONS) RtlRaiseStatus( STATUS_NO_MEMORY ); RtlSetLastWin32ErrorAndNtStatusFromNtStatus( STATUS_NO_MEMORY ); - TRACE("(%p,%08x,%p,%08lx): returning NULL\n", heap, flags, ptr, size ); + TRACE("(%p,%08x,%p,%08lx): returning NULL\n", heapPtr, flags, ptr, size ); return NULL;
error: if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection ); RtlSetLastWin32ErrorAndNtStatusFromNtStatus( STATUS_INVALID_PARAMETER ); - TRACE("(%p,%08x,%p,%08lx): returning NULL\n", heap, flags, ptr, size ); + TRACE("(%p,%08x,%p,%08lx): returning NULL\n", heapPtr, flags, ptr, size ); return NULL; }
@@ -2005,18 +2027,26 @@ BOOLEAN WINAPI RtlUnlockHeap( HANDLE heap ) */ SIZE_T WINAPI RtlSizeHeap( HANDLE heap, ULONG flags, const void *ptr ) { - SIZE_T ret; - const ARENA_INUSE *pArena; - SUBHEAP *subheap; - HEAP *heapPtr = HEAP_GetPtr( heap ); + HEAP *heapPtr;
- if (!heapPtr) + if (!(heapPtr = HEAP_GetPtr( heap ))) { RtlSetLastWin32ErrorAndNtStatusFromNtStatus( STATUS_INVALID_HANDLE ); return ~0UL; } + flags &= HEAP_NO_SERIALIZE; flags |= heapPtr->flags; + + return HEAP_std_get_allocated_size( heapPtr, flags, ptr ); +} + +SIZE_T HEAP_std_get_allocated_size( HEAP *heapPtr, ULONG flags, const void *ptr ) +{ + SIZE_T ret; + const ARENA_INUSE *pArena; + SUBHEAP *subheap; + if (!(flags & HEAP_NO_SERIALIZE)) RtlEnterCriticalSection( &heapPtr->critSection );
pArena = (const ARENA_INUSE *)ptr - 1; @@ -2036,7 +2066,7 @@ SIZE_T WINAPI RtlSizeHeap( HANDLE heap, ULONG flags, const void *ptr ) } if (!(flags & HEAP_NO_SERIALIZE)) RtlLeaveCriticalSection( &heapPtr->critSection );
- TRACE("(%p,%08x,%p): returning %08lx\n", heap, flags, ptr, ret ); + TRACE("(%p,%08x,%p): returning %08lx\n", heapPtr, flags, ptr, ret ); return ret; }
@@ -2057,8 +2087,19 @@ SIZE_T WINAPI RtlSizeHeap( HANDLE heap, ULONG flags, const void *ptr ) */ BOOLEAN WINAPI RtlValidateHeap( HANDLE heap, ULONG flags, LPCVOID ptr ) { - HEAP *heapPtr = HEAP_GetPtr( heap ); - if (!heapPtr) return FALSE; + HEAP *heapPtr; + + if (!(heapPtr = HEAP_GetPtr( heap ))) + return FALSE; + + flags &= HEAP_NO_SERIALIZE; + flags |= heapPtr->flags; + + return HEAP_std_validate( heapPtr, flags, ptr ); +} + +BOOLEAN HEAP_std_validate( HEAP *heapPtr, ULONG flags, const void *ptr ) +{ return HEAP_IsRealArena( heapPtr, flags, ptr, QUIET ); }
diff --git a/dlls/ntdll/ntdll_misc.h b/dlls/ntdll/ntdll_misc.h index 13771d230a88..fc78a24dc9d3 100644 --- a/dlls/ntdll/ntdll_misc.h +++ b/dlls/ntdll/ntdll_misc.h @@ -283,6 +283,13 @@ static inline int get_unix_exit_code( NTSTATUS status ) return status; }
+struct tagHEAP; +void *HEAP_std_allocate( struct tagHEAP *heap, ULONG flags, SIZE_T size ); +BOOLEAN HEAP_std_free( struct tagHEAP *heap, ULONG flags, void *ptr ); +void *HEAP_std_reallocate( struct tagHEAP *heap, ULONG flags, void *ptr, SIZE_T size ); +SIZE_T HEAP_std_get_allocated_size( struct tagHEAP *heap, ULONG flags, const void *ptr ); +BOOLEAN HEAP_std_validate( struct tagHEAP *heap, ULONG flags, const void *ptr ); + extern mode_t FILE_umask DECLSPEC_HIDDEN; extern HANDLE keyed_event DECLSPEC_HIDDEN; extern SYSTEM_CPU_INFORMATION cpu_info DECLSPEC_HIDDEN;
This will be used in LFH to recycle the thread local data.
---
 dlls/ntdll/heap.c       | 4 ++++
 dlls/ntdll/loader.c     | 1 +
 dlls/ntdll/ntdll_misc.h | 2 ++
 dlls/ntdll/thread.c     | 1 +
 4 files changed, 8 insertions(+)
diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c index ac0a1c800163..2d6c8e0ca19f 100644 --- a/dlls/ntdll/heap.c +++ b/dlls/ntdll/heap.c @@ -2293,3 +2293,7 @@ NTSTATUS WINAPI RtlSetHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS info_ FIXME("%p %d %p %ld stub\n", heap, info_class, info, size); return STATUS_SUCCESS; } + +void HEAP_notify_thread_destroy( BOOLEAN last ) +{ +} diff --git a/dlls/ntdll/loader.c b/dlls/ntdll/loader.c index b5091e2b5627..a5b86f289f47 100644 --- a/dlls/ntdll/loader.c +++ b/dlls/ntdll/loader.c @@ -3730,6 +3730,7 @@ void WINAPI RtlExitUserProcess( DWORD status ) RtlAcquirePebLock(); NtTerminateProcess( 0, status ); LdrShutdownProcess(); + HEAP_notify_thread_destroy(TRUE); NtTerminateProcess( GetCurrentProcess(), status ); exit( get_unix_exit_code( status )); } diff --git a/dlls/ntdll/ntdll_misc.h b/dlls/ntdll/ntdll_misc.h index fc78a24dc9d3..3369eb980a5f 100644 --- a/dlls/ntdll/ntdll_misc.h +++ b/dlls/ntdll/ntdll_misc.h @@ -290,6 +290,8 @@ void *HEAP_std_reallocate( struct tagHEAP *heap, ULONG flags, void *ptr, SIZE_ SIZE_T HEAP_std_get_allocated_size( struct tagHEAP *heap, ULONG flags, const void *ptr ); BOOLEAN HEAP_std_validate( struct tagHEAP *heap, ULONG flags, const void *ptr );
+void HEAP_notify_thread_destroy( BOOLEAN last ); + extern mode_t FILE_umask DECLSPEC_HIDDEN; extern HANDLE keyed_event DECLSPEC_HIDDEN; extern SYSTEM_CPU_INFORMATION cpu_info DECLSPEC_HIDDEN; diff --git a/dlls/ntdll/thread.c b/dlls/ntdll/thread.c index b25f87e43755..f223ca9fb5f9 100644 --- a/dlls/ntdll/thread.c +++ b/dlls/ntdll/thread.c @@ -358,6 +358,7 @@ void WINAPI RtlExitUserThread( ULONG status )
LdrShutdownThread(); RtlFreeThreadActivationContextStack(); + HEAP_notify_thread_destroy(FALSE);
pthread_sigmask( SIG_BLOCK, &server_block_set, NULL );
---
 dlls/ntdll/Makefile.in  |  1 +
 dlls/ntdll/heap.c       | 68 +++++++++++++++++++++++++++++++++----
 dlls/ntdll/heap_lfh.c   | 55 +++++++++++++++++++++++++++++++++
 dlls/ntdll/ntdll_misc.h | 12 ++++++++
 4 files changed, 130 insertions(+), 6 deletions(-)
 create mode 100644 dlls/ntdll/heap_lfh.c
diff --git a/dlls/ntdll/Makefile.in b/dlls/ntdll/Makefile.in index 7971ef98cf0f..5ec734a045e5 100644 --- a/dlls/ntdll/Makefile.in +++ b/dlls/ntdll/Makefile.in @@ -21,6 +21,7 @@ C_SRCS = \ file.c \ handletable.c \ heap.c \ + heap_lfh.c \ large_int.c \ loader.c \ loadorder.c \ diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c index 2d6c8e0ca19f..ca3d05450e71 100644 --- a/dlls/ntdll/heap.c +++ b/dlls/ntdll/heap.c @@ -164,6 +164,7 @@ typedef struct tagHEAP ARENA_INUSE **pending_free; /* Ring buffer for pending free requests */ RTL_CRITICAL_SECTION critSection; /* Critical section for serialization */ FREE_LIST_ENTRY *freeList; /* Free lists */ + int extended_type; /* Extended heap type */ } HEAP;
#define HEAP_MAGIC ((DWORD)('H' | ('E'<<8) | ('A'<<16) | ('P'<<24))) @@ -1511,6 +1512,8 @@ void heap_set_debug_flags( HANDLE handle ) MAX_FREE_PENDING * sizeof(*heap->pending_free) ); heap->pending_pos = 0; } + + HEAP_lfh_set_debug_flags( flags ); }
@@ -1554,11 +1557,13 @@ HANDLE WINAPI RtlCreateHeap( ULONG flags, PVOID addr, SIZE_T totalSize, SIZE_T c HEAP *heapPtr = subheap->heap; RtlEnterCriticalSection( &processHeap->critSection ); list_add_head( &processHeap->entry, &heapPtr->entry ); + heapPtr->extended_type = HEAP_STD; RtlLeaveCriticalSection( &processHeap->critSection ); } else if (!addr) { processHeap = subheap->heap; /* assume the first heap we create is the process main heap */ + processHeap->extended_type = HEAP_STD; list_init( &processHeap->entry ); }
@@ -1644,6 +1649,7 @@ HANDLE WINAPI RtlDestroyHeap( HANDLE heap ) void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_T size ) { HEAP *heapPtr; + void *ptr;
if (!(heapPtr = HEAP_GetPtr( heap ))) return NULL; @@ -1651,7 +1657,15 @@ void * WINAPI DECLSPEC_HOTPATCH RtlAllocateHeap( HANDLE heap, ULONG flags, SIZE_ flags &= HEAP_GENERATE_EXCEPTIONS | HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY; flags |= heapPtr->flags;
- return HEAP_std_allocate( heapPtr, flags, size ); + switch (heapPtr->extended_type) + { + case HEAP_LFH: + if ((ptr = HEAP_lfh_allocate( heapPtr, flags, size ))) + return ptr; + /* fallthrough */ + default: + return HEAP_std_allocate( heapPtr, flags, size ); + } }
void * HEAP_std_allocate( HEAP *heapPtr, ULONG flags, SIZE_T size ) @@ -1750,7 +1764,15 @@ BOOLEAN WINAPI DECLSPEC_HOTPATCH RtlFreeHeap( HANDLE heap, ULONG flags, void *pt flags &= HEAP_NO_SERIALIZE; flags |= heapPtr->flags;
- return HEAP_std_free( heapPtr, flags, ptr ); + switch (heapPtr->extended_type) + { + case HEAP_LFH: + if (HEAP_lfh_validate( heapPtr, flags, ptr )) + return HEAP_lfh_free( heapPtr, flags, ptr ); + /* fallthrough */ + default: + return HEAP_std_free( heapPtr, flags, ptr ); + } }
BOOLEAN HEAP_std_free( HEAP *heapPtr, ULONG flags, void *ptr ) @@ -1818,7 +1840,15 @@ PVOID WINAPI RtlReAllocateHeap( HANDLE heap, ULONG flags, PVOID ptr, SIZE_T size HEAP_REALLOC_IN_PLACE_ONLY; flags |= heapPtr->flags;
- return HEAP_std_reallocate( heapPtr, flags, ptr, size ); + switch (heapPtr->extended_type) + { + case HEAP_LFH: + if (HEAP_lfh_validate( heapPtr, flags, ptr )) + return HEAP_lfh_reallocate( heapPtr, flags, ptr, size ); + /* fallthrough */ + default: + return HEAP_std_reallocate( heapPtr, flags, ptr, size ); + } }
void *HEAP_std_reallocate( HEAP *heapPtr, ULONG flags, void *ptr, SIZE_T size ) @@ -2038,7 +2068,15 @@ SIZE_T WINAPI RtlSizeHeap( HANDLE heap, ULONG flags, const void *ptr ) flags &= HEAP_NO_SERIALIZE; flags |= heapPtr->flags;
- return HEAP_std_get_allocated_size( heapPtr, flags, ptr ); + switch (heapPtr->extended_type) + { + case HEAP_LFH: + if (HEAP_lfh_validate( heapPtr, flags, ptr )) + return HEAP_lfh_get_allocated_size( heapPtr, flags, ptr ); + /* fallthrough */ + default: + return HEAP_std_get_allocated_size( heapPtr, flags, ptr ); + } }
SIZE_T HEAP_std_get_allocated_size( HEAP *heapPtr, ULONG flags, const void *ptr ) @@ -2095,7 +2133,17 @@ BOOLEAN WINAPI RtlValidateHeap( HANDLE heap, ULONG flags, LPCVOID ptr ) flags &= HEAP_NO_SERIALIZE; flags |= heapPtr->flags;
- return HEAP_std_validate( heapPtr, flags, ptr ); + switch (heapPtr->extended_type) + { + case HEAP_LFH: + if (!HEAP_lfh_validate( heapPtr, flags, ptr )) + return FALSE; + /* only fallback to std heap if pointer is NULL or didn't validate */ + if (ptr) return TRUE; + /* fallthrough */ + default: + return HEAP_std_validate( heapPtr, flags, ptr ); + } }
BOOLEAN HEAP_std_validate( HEAP *heapPtr, ULONG flags, const void *ptr ) @@ -2268,6 +2316,13 @@ ULONG WINAPI RtlGetProcessHeaps( ULONG count, HANDLE *heaps ) NTSTATUS WINAPI RtlQueryHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS info_class, PVOID info, SIZE_T size_in, PSIZE_T size_out) { + HEAP *heapPtr; + + TRACE("%p %d %p %ld\n", heap, info_class, info, size_in); + + if (!(heapPtr = HEAP_GetPtr( heap ))) + return STATUS_INVALID_PARAMETER; + switch (info_class) { case HeapCompatibilityInformation: @@ -2276,7 +2331,7 @@ NTSTATUS WINAPI RtlQueryHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS inf if (size_in < sizeof(ULONG)) return STATUS_BUFFER_TOO_SMALL;
- *(ULONG *)info = 0; /* standard heap */ + *(ULONG *)info = heapPtr->extended_type; return STATUS_SUCCESS;
default: @@ -2296,4 +2351,5 @@ NTSTATUS WINAPI RtlSetHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS info_
void HEAP_notify_thread_destroy( BOOLEAN last ) { + HEAP_lfh_notify_thread_destroy( last ); } diff --git a/dlls/ntdll/heap_lfh.c b/dlls/ntdll/heap_lfh.c new file mode 100644 index 000000000000..8dc3a3914839 --- /dev/null +++ b/dlls/ntdll/heap_lfh.c @@ -0,0 +1,55 @@ +/* + * Wine Low Fragmentation Heap + * + * Copyright 2020 Remi Bernon for CodeWeavers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +#include "ntdll_misc.h" + +void *HEAP_lfh_allocate(struct tagHEAP *std_heap, ULONG flags, SIZE_T size) +{ + return NULL; +} + +BOOLEAN HEAP_lfh_free(struct tagHEAP *std_heap, ULONG flags, void *ptr) +{ + return FALSE; +} + +void *HEAP_lfh_reallocate(struct tagHEAP *std_heap, ULONG flags, void *ptr, SIZE_T size) +{ + return NULL; +} + +SIZE_T HEAP_lfh_get_allocated_size(struct tagHEAP *std_heap, ULONG flags, const void *ptr) +{ + return ~(SIZE_T)0; +} + +BOOLEAN HEAP_lfh_validate(struct tagHEAP *std_heap, ULONG flags, const void *ptr) +{ + if (ptr) return FALSE; + else return TRUE; +} + +void HEAP_lfh_notify_thread_destroy(BOOLEAN last) +{ +} + +void HEAP_lfh_set_debug_flags(ULONG flags) +{ +} diff --git a/dlls/ntdll/ntdll_misc.h b/dlls/ntdll/ntdll_misc.h index 3369eb980a5f..fbf01b729df0 100644 --- a/dlls/ntdll/ntdll_misc.h +++ b/dlls/ntdll/ntdll_misc.h @@ -283,6 +283,10 @@ 
static inline int get_unix_exit_code( NTSTATUS status ) return status; }
+#define HEAP_STD 0 +#define HEAP_LAL 1 +#define HEAP_LFH 2 + struct tagHEAP; void *HEAP_std_allocate( struct tagHEAP *heap, ULONG flags, SIZE_T size ); BOOLEAN HEAP_std_free( struct tagHEAP *heap, ULONG flags, void *ptr ); @@ -290,7 +294,15 @@ void *HEAP_std_reallocate( struct tagHEAP *heap, ULONG flags, void *ptr, SIZE_ SIZE_T HEAP_std_get_allocated_size( struct tagHEAP *heap, ULONG flags, const void *ptr ); BOOLEAN HEAP_std_validate( struct tagHEAP *heap, ULONG flags, const void *ptr );
+void *HEAP_lfh_allocate( struct tagHEAP *std_heap, ULONG flags, SIZE_T size ); +BOOLEAN HEAP_lfh_free( struct tagHEAP *std_heap, ULONG flags, void *ptr ); +void *HEAP_lfh_reallocate( struct tagHEAP *std_heap, ULONG flags, void *ptr, SIZE_T size ); +SIZE_T HEAP_lfh_get_allocated_size( struct tagHEAP *std_heap, ULONG flags, const void *ptr ); +BOOLEAN HEAP_lfh_validate( struct tagHEAP *std_heap, ULONG flags, const void *ptr ); + void HEAP_notify_thread_destroy( BOOLEAN last ); +void HEAP_lfh_notify_thread_destroy( BOOLEAN last ); +void HEAP_lfh_set_debug_flags( ULONG flags );
extern mode_t FILE_umask DECLSPEC_HIDDEN; extern HANDLE keyed_event DECLSPEC_HIDDEN;
---
 dlls/kernel32/tests/heap.c |  2 --
 dlls/ntdll/heap.c          | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/dlls/kernel32/tests/heap.c b/dlls/kernel32/tests/heap.c index 550d7190f2ca..6e468536dd75 100644 --- a/dlls/kernel32/tests/heap.c +++ b/dlls/kernel32/tests/heap.c @@ -656,10 +656,8 @@ static void test_HeapCreate(void)
hci = 1; SetLastError(0xdeadbeef); - todo_wine ok(!pHeapSetInformation(heap, HeapCompatibilityInformation, &hci, sizeof(hci)), "HeapSetInformation succeeded\n"); - todo_wine ok(GetLastError() == ERROR_GEN_FAILURE, "expected ERROR_GEN_FAILURE, got %u\n", GetLastError());
diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c index ca3d05450e71..a80bc56ccb52 100644 --- a/dlls/ntdll/heap.c +++ b/dlls/ntdll/heap.c @@ -2345,8 +2345,36 @@ NTSTATUS WINAPI RtlQueryHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS inf */ NTSTATUS WINAPI RtlSetHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS info_class, PVOID info, SIZE_T size) { - FIXME("%p %d %p %ld stub\n", heap, info_class, info, size); - return STATUS_SUCCESS; + HEAP *heapPtr; + + TRACE("%p %d %p %ld stub\n", heap, info_class, info, size); + + if (!(heapPtr = HEAP_GetPtr( heap ))) + return STATUS_INVALID_PARAMETER; + + switch (info_class) + { + case HeapCompatibilityInformation: + if (size < sizeof(ULONG)) + return STATUS_BUFFER_TOO_SMALL; + + if (heapPtr->extended_type != HEAP_STD) + return STATUS_UNSUCCESSFUL; + + if (*(ULONG *)info != HEAP_STD && + *(ULONG *)info != HEAP_LFH) + { + FIXME("unimplemented HeapCompatibilityInformation %d\n", *(ULONG *)info); + return STATUS_SUCCESS; + } + + heapPtr->extended_type = *(ULONG *)info; + return STATUS_SUCCESS; + + default: + FIXME("Unknown heap information class %u\n", info_class); + return STATUS_INVALID_INFO_CLASS; + } }
void HEAP_notify_thread_destroy( BOOLEAN last )
---
 dlls/ntdll/heap.c       | 6 ------
 dlls/ntdll/ntdll_misc.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c index a80bc56ccb52..6b5fc45d381e 100644 --- a/dlls/ntdll/heap.c +++ b/dlls/ntdll/heap.c @@ -173,12 +173,6 @@ typedef struct tagHEAP #define COMMIT_MASK 0xffff /* bitmask for commit/decommit granularity */ #define MAX_FREE_PENDING 1024 /* max number of free requests to delay */
-/* some undocumented flags (names are made up) */ -#define HEAP_PAGE_ALLOCS 0x01000000 -#define HEAP_VALIDATE 0x10000000 -#define HEAP_VALIDATE_ALL 0x20000000 -#define HEAP_VALIDATE_PARAMS 0x40000000 - static HEAP *processHeap; /* main process heap */
static BOOL HEAP_IsRealArena( HEAP *heapPtr, DWORD flags, LPCVOID block, BOOL quiet ); diff --git a/dlls/ntdll/ntdll_misc.h b/dlls/ntdll/ntdll_misc.h index fbf01b729df0..4dd642cd7842 100644 --- a/dlls/ntdll/ntdll_misc.h +++ b/dlls/ntdll/ntdll_misc.h @@ -287,6 +287,12 @@ static inline int get_unix_exit_code( NTSTATUS status ) #define HEAP_LAL 1 #define HEAP_LFH 2
+/* some undocumented flags (names are made up) */ +#define HEAP_PAGE_ALLOCS 0x01000000 +#define HEAP_VALIDATE 0x10000000 +#define HEAP_VALIDATE_ALL 0x20000000 +#define HEAP_VALIDATE_PARAMS 0x40000000 + struct tagHEAP; void *HEAP_std_allocate( struct tagHEAP *heap, ULONG flags, SIZE_T size ); BOOLEAN HEAP_std_free( struct tagHEAP *heap, ULONG flags, void *ptr );
---
 dlls/ntdll/heap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/dlls/ntdll/heap.c b/dlls/ntdll/heap.c
index 6b5fc45d381e..f3c378a927d2 100644
--- a/dlls/ntdll/heap.c
+++ b/dlls/ntdll/heap.c
@@ -2339,8 +2339,12 @@ NTSTATUS WINAPI RtlQueryHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS inf
  */
 NTSTATUS WINAPI RtlSetHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS info_class, PVOID info, SIZE_T size)
 {
+    static int enable_winelfh = -1;
     HEAP *heapPtr;

+    if (enable_winelfh == -1 && (enable_winelfh = getenv("WINELFH") && atoi(getenv("WINELFH"))))
+        MESSAGE("wine: enabling low fragmentation heaps\n");
+
     TRACE("%p %d %p %ld\n", heap, info_class, info, size);

     if (!(heapPtr = HEAP_GetPtr( heap )))
@@ -2362,6 +2366,9 @@ NTSTATUS WINAPI RtlSetHeapInformation( HANDLE heap, HEAP_INFORMATION_CLASS info_
             return STATUS_SUCCESS;
         }

+        if (!enable_winelfh)
+            return STATUS_SUCCESS;
+
         heapPtr->extended_type = *(ULONG *)info;
         return STATUS_SUCCESS;
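The WINELFH guard above parses the environment only once and caches the result in a function-local static. The same pattern can be sketched stand-alone, with the Wine-specific MESSAGE macro replaced by fprintf, and getenv called only once as a small cleanup (winelfh_enabled is a made-up name for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

/* Cached one-time parse of the WINELFH switch: -1 means "not read yet";
 * any non-zero integer value in the environment enables the feature. */
static int winelfh_enabled(void)
{
    static int enable_winelfh = -1;
    if (enable_winelfh == -1)
    {
        const char *env = getenv("WINELFH");
        enable_winelfh = env && atoi(env);
        if (enable_winelfh)
            fprintf(stderr, "wine: enabling low fragmentation heaps\n");
    }
    return enable_winelfh;
}
```

Because the result is cached, later changes to the environment have no effect; the variable must be set before the first HeapSetInformation call.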
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=71293
Your paranoid android.
=== debiant (32 bit report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit Chinese:China report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (64 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
This is a high-performance multithreaded heap implementation that also tries to minimize memory fragmentation.
It takes inspiration from rpmalloc / tcmalloc and other thread-local heap implementations, while avoiding the complexity of a cache.
The low fragmentation part is achieved by using two layers of pools, or arenas, classified by block size:
* The first, coarse-grained layer of pools is called "large" arenas; each is allocated directly by mapping 4MiB of virtual memory, which is then split into blocks of fixed size. The large arena classes are configured to support block sizes in a range from (64KiB - hs) up to (2MiB - hs), increasing in 64KiB steps, where hs is the arena header size.
* The second pool layer, made of "small" and "medium" arenas, is built on top of the first, using the exact same mechanism (and code). Each pool is allocated by acquiring a block of (64KiB - hs) size from an arena of the first "large" class. The "small" arena classes are configured for block sizes in a range from 32 to 2048 bytes, increasing in 32B steps. The "medium" arena classes are configured for block sizes in a range from 2048 bytes up to ((64KiB - hs) - hs) / 2, increasing in 512B steps.
Any memory allocation that is bigger than what "large" arenas can provide will be directly mapped from virtual memory.
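Concretely, the class sizing described above boils down to rounding a requested size up to the next step of its layer, which is the same arithmetic the patch uses in LFH_class_initialize / LFH_heap_get_class. A stand-alone sketch (class_rounded_size is a made-up helper; the large classes and the (64KiB - hs) header adjustments are left out for brevity):

```c
#include <stddef.h>

/* Size-class rounding for "small" (32B steps up to 2048B) and
 * "medium" (512B steps above that) arenas, constants as in the patch. */
#define SMALL_CLASS_STEP  0x20
#define SMALL_CLASS_MAX   0x800
#define MEDIUM_CLASS_STEP (16 * SMALL_CLASS_STEP) /* 0x200 */

static size_t class_rounded_size(size_t size)
{
    if (size <= SMALL_CLASS_STEP) return SMALL_CLASS_STEP;
    if (size <= SMALL_CLASS_MAX)  /* round up to the next 32B step */
        return (size + SMALL_CLASS_STEP - 1) & ~(size_t)(SMALL_CLASS_STEP - 1);
    /* round the part above 2048B up to the next 512B step */
    return SMALL_CLASS_MAX + ((size - SMALL_CLASS_MAX + MEDIUM_CLASS_STEP - 1)
                              / MEDIUM_CLASS_STEP) * MEDIUM_CLASS_STEP;
}
```

The fine steps at small sizes keep per-block waste below 32 bytes where allocations are most frequent, while the coarser 512B steps keep the total class count (and per-heap bookkeeping) small.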
The multithreaded part is achieved by keeping thread-local heap structures to hold the currently allocated classified arenas:
* Whenever a thread needs it, a new thread-local heap is acquired from a global orphan list - an interlocked singly linked list of unused heaps - or allocated from virtual memory. Whenever a thread terminates, it releases its thread-local heap to the global orphan list.
* Every allocation goes through the current thread's heap, taking a new block from its arenas. The virtual memory mapping that may eventually be required is already thread-safe and does not need additional locking.
* Every free is deferred to the thread that allocated the block, using an interlocked singly linked list.
* Every time a thread allocates a new block, it first cleans up its deferred free block list.
A thread-local heap may not always be associated with a live thread, which means that deferred blocks may have to wait for the orphan heap to be adopted by a new thread before they are actually released.
---
 dlls/ntdll/heap_lfh.c   | 1006 ++++++++++++++++++++++++++++++++++++++-
 dlls/ntdll/ntdll_misc.h |    1 +
 2 files changed, 1005 insertions(+), 2 deletions(-)
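The interlocked singly linked list mentioned in the last two points is the core synchronization primitive of the patch (LFH_slist in heap_lfh.c): remote threads push freed blocks with a CAS loop, and the owning thread drains the whole list with a single atomic exchange. A minimal stand-alone version, using the same GCC atomic builtins as the patch:

```c
#include <stddef.h>

struct slist { struct slist *next; };

/* Push one entry; safe against concurrent pushers and the flushing owner.
 * No ABA problem: other threads only ever replace the head with a
 * different entry or NULL, never re-insert the same node concurrently. */
static void slist_push(struct slist **list, struct slist *entry)
{
    entry->next = __atomic_load_n(list, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(list, &entry->next, entry, 0,
                                        __ATOMIC_RELEASE, __ATOMIC_ACQUIRE));
}

/* Detach the whole list at once; only the owning thread calls this. */
static struct slist *slist_flush(struct slist **list)
{
    if (!__atomic_load_n(list, __ATOMIC_RELAXED)) return NULL;
    return __atomic_exchange_n(list, NULL, __ATOMIC_ACQUIRE);
}
```

The flushed list comes back in LIFO order, which is why the deferred-free path simply walks entry->next until NULL before servicing a new allocation.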
diff --git a/dlls/ntdll/heap_lfh.c b/dlls/ntdll/heap_lfh.c index 8dc3a3914839..0900deae2b5d 100644 --- a/dlls/ntdll/heap_lfh.c +++ b/dlls/ntdll/heap_lfh.c @@ -18,38 +18,1040 @@ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA */
+#include "ntstatus.h" +#define WIN32_NO_STATUS #include "ntdll_misc.h" +#include "wine/list.h" +#include "wine/debug.h" + +WINE_DEFAULT_DEBUG_CHANNEL(heap); + +typedef struct LFH_ptr LFH_ptr; +typedef struct LFH_block LFH_block; +typedef enum LFH_block_type LFH_block_type; +typedef struct LFH_arena LFH_arena; +typedef struct LFH_class LFH_class; +typedef struct LFH_heap LFH_heap; +typedef struct LFH_slist LFH_slist; + +#define ARENA_HEADER_SIZE (sizeof(LFH_arena)) + +#define LARGE_ARENA_SIZE 0x400000 /* 4MiB */ +#define LARGE_ARENA_MASK (LARGE_ARENA_SIZE - 1) + +#define BLOCK_ARENA_SIZE 0x10000 /* 64kiB */ +#define BLOCK_ARENA_MASK (BLOCK_ARENA_SIZE - 1) + +#define SMALL_CLASS_STEP 0x20 +#define SMALL_CLASS_MASK (SMALL_CLASS_STEP - 1) +#define SMALL_CLASS_MIN_SIZE SMALL_CLASS_STEP +#define SMALL_CLASS_MAX_SIZE 0x800 +#define SMALL_CLASS_COUNT ((SMALL_CLASS_MAX_SIZE - SMALL_CLASS_MIN_SIZE) / SMALL_CLASS_STEP + 1) +#define SMALL_CLASS_FIRST 0 +#define SMALL_CLASS_LAST (SMALL_CLASS_FIRST + SMALL_CLASS_COUNT - 1) + +#define MEDIUM_CLASS_STEP (16 * SMALL_CLASS_STEP) +#define MEDIUM_CLASS_MASK (MEDIUM_CLASS_STEP - 1) +#define MEDIUM_CLASS_MIN_SIZE SMALL_CLASS_MAX_SIZE +#define MEDIUM_CLASS_MAX_SIZE ((BLOCK_ARENA_SIZE - ARENA_HEADER_SIZE - ARENA_HEADER_SIZE) / 2) +#define MEDIUM_CLASS_COUNT ((MEDIUM_CLASS_MAX_SIZE - MEDIUM_CLASS_MIN_SIZE + MEDIUM_CLASS_MASK) / MEDIUM_CLASS_STEP + 1) +#define MEDIUM_CLASS_FIRST (SMALL_CLASS_LAST + 1) +#define MEDIUM_CLASS_LAST (MEDIUM_CLASS_FIRST + MEDIUM_CLASS_COUNT - 1) + +#define LARGE_CLASS_STEP BLOCK_ARENA_SIZE +#define LARGE_CLASS_MASK (LARGE_CLASS_STEP - 1) +#define LARGE_CLASS_MIN_SIZE (BLOCK_ARENA_SIZE - ARENA_HEADER_SIZE) +#define LARGE_CLASS_MAX_SIZE (LARGE_ARENA_SIZE / 2 - ARENA_HEADER_SIZE) /* we need an arena header for every large block */ +#define LARGE_CLASS_COUNT ((LARGE_CLASS_MAX_SIZE - LARGE_CLASS_MIN_SIZE) / LARGE_CLASS_STEP + 1) +#define LARGE_CLASS_FIRST 0 +#define LARGE_CLASS_LAST (LARGE_CLASS_FIRST + 
LARGE_CLASS_COUNT - 1) + +#define TOTAL_BLOCK_CLASS_COUNT (MEDIUM_CLASS_LAST + 1) +#define TOTAL_LARGE_CLASS_COUNT (LARGE_CLASS_LAST + 1) + +struct LFH_slist +{ + LFH_slist *next; +}; + +static void LFH_slist_push(LFH_slist **list, LFH_slist *entry) +{ + /* There will be no ABA issue here, other threads can only replace + * list->next with a different entry, or NULL. */ + entry->next = __atomic_load_n(list, __ATOMIC_RELAXED); + while (!__atomic_compare_exchange_n(list, &entry->next, entry, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE)); +} + +static LFH_slist *LFH_slist_flush(LFH_slist **list) +{ + if (!__atomic_load_n(list, __ATOMIC_RELAXED)) return NULL; + return __atomic_exchange_n(list, NULL, __ATOMIC_ACQUIRE); +} + +/* be sure to keep these different from ARENA_INUSE magic */ +enum LFH_block_type +{ + LFH_block_type_used = 0xa55a5aa5a55a5aa5ul, + LFH_block_type_free = 0xc33c3cc3c33c3cc3ul, +}; + +struct DECLSPEC_ALIGN(16) LFH_block +{ + union + { + LFH_block *entry_free; + LFH_slist entry_defer; + size_t alloc_size; + }; + + LFH_block_type type; +}; + +C_ASSERT(sizeof(LFH_block) == 0x10); +C_ASSERT(offsetof(LFH_block, entry_defer) == 0); + +struct DECLSPEC_ALIGN(16) LFH_arena +{ + LFH_block *list_free; + LFH_arena *class_entry; + + union + { + LFH_arena *parent; + LFH_class *class; + }; + + union + { + size_t huge_size; + size_t used_count; + }; +}; + +#ifdef _WIN64 +C_ASSERT(sizeof(LFH_arena) == 0x20); +#else +C_ASSERT(sizeof(LFH_arena) == 0x10); +#endif + +struct LFH_class +{ + LFH_arena *next; + size_t size; +}; + +struct LFH_heap +{ + LFH_slist *list_defer; + + LFH_class block_class[TOTAL_BLOCK_CLASS_COUNT]; + LFH_class large_class[TOTAL_LARGE_CLASS_COUNT]; + + SLIST_ENTRY entry_orphan; +}; + +C_ASSERT(TOTAL_BLOCK_CLASS_COUNT == 0x7d); +C_ASSERT(TOTAL_LARGE_CLASS_COUNT == 0x20); + +/* arena->class/arena->parent pointer low bits are used to discriminate between the two */ +C_ASSERT(offsetof(LFH_heap, block_class[0]) > 0); +C_ASSERT(offsetof(LFH_heap, 
large_class[TOTAL_LARGE_CLASS_COUNT]) < BLOCK_ARENA_SIZE); + +/* helpers to retrieve parent arena from a child, or class pointer from a large or block arena */ +#define LFH_parent_from_arena(arena) (((arena)->parent && !((UINT_PTR)(arena)->parent & BLOCK_ARENA_MASK)) \ + ? (arena)->parent : (arena)) +#define LFH_class_from_arena(arena) (((UINT_PTR)LFH_parent_from_arena(arena)->class & BLOCK_ARENA_MASK) \ + ? LFH_parent_from_arena(arena)->class : NULL) + +/* helper to retrieve the heap from an arena, using its class pointer */ +#define LFH_heap_from_arena(arena) ((LFH_heap *)((UINT_PTR)LFH_class_from_arena(arena) & ~BLOCK_ARENA_MASK)) + +/* helpers to retrieve block pointers to the containing block or large (maybe child) arena */ +#define LFH_large_arena_from_block(block) ((LFH_arena *)((UINT_PTR)(block) & ~BLOCK_ARENA_MASK)) +#define LFH_block_arena_from_block(block) (LFH_large_arena_from_block(block) + 1) +#define LFH_arena_from_block(block) (LFH_block_arena_from_block(block) == ((LFH_arena *)(block)) \ + ? LFH_large_arena_from_block(block) : LFH_block_arena_from_block(block)) + +/* helpers to translate between data pointer and LFH_block header */ +#define LFH_block_from_ptr(ptr) ((LFH_block *)(ptr) - 1) +#define LFH_ptr_from_block(block) ((LFH_ptr *)((block) + 1)) + +static size_t LFH_block_get_class_size(const LFH_block *block) +{ + const LFH_arena *arena = LFH_arena_from_block(block); + const LFH_class *class = LFH_class_from_arena(arena); + return class ? class->size : arena->huge_size; +} + +static size_t LFH_block_get_alloc_size(const LFH_block *block, ULONG flags) +{ + return block->alloc_size; +} + +static size_t LFH_get_class_size(ULONG flags, size_t size) +{ + size_t extra = sizeof(LFH_block) + ((flags & HEAP_TAIL_CHECKING_ENABLED) ? 
16 : 0); + if (size + extra < size) return ~(size_t)0; + return size + extra; +} + +static void *LFH_memory_allocate(size_t size) +{ + void *addr = NULL; + SIZE_T alloc_size = size; + + if (NtAllocateVirtualMemory(NtCurrentProcess(), (void **)&addr, 0, &alloc_size, + MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN, PAGE_READWRITE)) + return NULL; + + return addr; +} + +static BOOLEAN LFH_memory_deallocate(void *addr, size_t size) +{ + SIZE_T release_size = 0; + + if (NtFreeVirtualMemory(NtCurrentProcess(), &addr, &release_size, MEM_RELEASE)) + return FALSE; + + return TRUE; +} + +static LFH_block *LFH_arena_get_block(LFH_arena *arena, size_t offset) +{ + return (LFH_block *)((UINT_PTR)arena + offset); +} + +static void LFH_arena_push_block(LFH_arena *arena, LFH_block *block) +{ + block->type = LFH_block_type_free; + block->entry_free = arena->list_free; + arena->list_free = block; + arena->used_count--; +} + +static LFH_block *LFH_arena_pop_block(LFH_arena *arena) +{ + LFH_block *block = arena->list_free; + arena->list_free = block->entry_free; + arena->used_count++; + return block; +} + +static int LFH_arena_is_empty(LFH_arena *arena) +{ + return arena->list_free == NULL; +} + +static int LFH_arena_is_used(LFH_arena *arena) +{ + return arena->used_count > 0; +} + +static int LFH_class_is_block(LFH_heap *heap, LFH_class *class) +{ + return class >= heap->block_class && class < (heap->block_class + TOTAL_BLOCK_CLASS_COUNT); +} + +static void LFH_class_initialize(LFH_heap *heap, LFH_class *class, size_t index) +{ + class->next = NULL; + + if (LFH_class_is_block(heap, class)) + { + if (index <= SMALL_CLASS_LAST) + class->size = min(SMALL_CLASS_MIN_SIZE + SMALL_CLASS_STEP * (index - SMALL_CLASS_FIRST), SMALL_CLASS_MAX_SIZE); + else + class->size = min(MEDIUM_CLASS_MIN_SIZE + MEDIUM_CLASS_STEP * (index - MEDIUM_CLASS_FIRST), MEDIUM_CLASS_MAX_SIZE); + } + else + { + class->size = min(LARGE_CLASS_MIN_SIZE + LARGE_CLASS_STEP * (index - LARGE_CLASS_FIRST), LARGE_CLASS_MAX_SIZE); + 
} +} + +static LFH_arena *LFH_class_pop_arena(LFH_class *class) +{ + LFH_arena *arena = class->next; + if (!arena) return NULL; + class->next = arena->class_entry; + return arena; +} + +static void LFH_class_remove_arena(LFH_class *class, LFH_arena *arena) +{ + LFH_arena **next = &class->next; + while (*next != arena) next = &(*next)->class_entry; + *next = arena->class_entry; +} + +static LFH_arena *LFH_class_peek_arena(LFH_class *class) +{ + return class->next; +} + +static void LFH_class_push_arena(LFH_class *class, LFH_arena *arena) +{ + arena->class_entry = class->next; + class->next = arena; +} + +static LFH_class *LFH_heap_get_class(LFH_heap *heap, size_t size) +{ + if (size == 0) + return &heap->block_class[0]; + else if (size <= SMALL_CLASS_MAX_SIZE) + return &heap->block_class[SMALL_CLASS_FIRST + (size + SMALL_CLASS_MASK - SMALL_CLASS_MIN_SIZE) / SMALL_CLASS_STEP]; + else if (size <= MEDIUM_CLASS_MAX_SIZE) + return &heap->block_class[MEDIUM_CLASS_FIRST + (size + MEDIUM_CLASS_MASK - MEDIUM_CLASS_MIN_SIZE) / MEDIUM_CLASS_STEP]; + else if (size <= LARGE_CLASS_MAX_SIZE) + return &heap->large_class[LARGE_CLASS_FIRST + (size + LARGE_CLASS_MASK - LARGE_CLASS_MIN_SIZE) / LARGE_CLASS_STEP]; + else + return NULL; +} + +static void LFH_arena_initialize(LFH_heap *heap, LFH_class *class, LFH_arena *arena, LFH_arena *parent, size_t huge_size) +{ + LFH_arena *large_arena = LFH_large_arena_from_block(arena); + size_t offset; + + arena->parent = NULL; + arena->class = class; + arena->list_free = NULL; + + if (class == NULL) + { + arena->huge_size = huge_size; + } + else if (parent != NULL) + { + arena->parent = parent; + } + else if (large_arena == arena) + { + for (offset = ARENA_HEADER_SIZE; offset + class->size <= LARGE_ARENA_SIZE; offset += (ARENA_HEADER_SIZE + class->size)) + { + LFH_block *block = LFH_arena_get_block(arena, offset); + LFH_arena *child = LFH_large_arena_from_block(block); + if (child != arena) + LFH_arena_initialize(heap, class, child, arena, 0); + 
LFH_arena_push_block(arena, block); + } + arena->used_count = 0; + } + else + { + LFH_class *large_class = LFH_class_from_arena(large_arena); + + for (offset = ARENA_HEADER_SIZE; offset + class->size <= large_class->size; offset += class->size) + { + LFH_block *block = LFH_arena_get_block(arena, offset); + LFH_arena_push_block(arena, block); + } + arena->used_count = 0; + } +} + +static LFH_arena *LFH_acquire_arena(LFH_heap *heap, LFH_class *class); +static BOOLEAN LFH_release_arena(LFH_heap *heap, LFH_arena *arena); + +static LFH_block *LFH_allocate_block(LFH_heap *heap, LFH_class *class, LFH_arena *arena); +static BOOLEAN LFH_deallocate_block(LFH_heap *heap, LFH_arena *arena, LFH_block *block); + +static BOOLEAN LFH_deallocate_deferred_blocks(LFH_heap *heap) +{ + LFH_slist *entry = LFH_slist_flush(&heap->list_defer); + + while (entry) + { + LFH_block *block = LIST_ENTRY(entry, LFH_block, entry_defer); + entry = entry->next; + + if (!LFH_deallocate_block(heap, LFH_arena_from_block(block), block)) + return FALSE; + } + + return TRUE; +} + +static LFH_arena *LFH_allocate_huge_arena(LFH_heap *heap, size_t size) +{ + LFH_arena *arena; + size_t alloc_size = (ARENA_HEADER_SIZE + size + BLOCK_ARENA_MASK) & ~BLOCK_ARENA_MASK; + if (alloc_size < size) return NULL; + + if ((arena = LFH_memory_allocate(alloc_size))) + LFH_arena_initialize(heap, NULL, arena, NULL, size); + + return arena; +} + +static LFH_arena *LFH_allocate_large_arena(LFH_heap *heap, LFH_class *class) +{ + LFH_arena *arena; + + if ((arena = LFH_memory_allocate(LARGE_ARENA_SIZE))) + { + LFH_arena_initialize(heap, class, arena, NULL, 0); + LFH_class_push_arena(class, arena); + } + + return arena; +} + +static LFH_arena *LFH_allocate_block_arena(LFH_heap *heap, LFH_class *large_class, LFH_class *block_class) +{ + LFH_arena *large_arena; + LFH_arena *arena = NULL; + + if ((large_arena = LFH_acquire_arena(heap, large_class))) + { + arena = (LFH_arena *)LFH_allocate_block(heap, large_class, large_arena); + 
LFH_arena_initialize(heap, block_class, arena, NULL, 0); + LFH_class_push_arena(block_class, arena); + } + + return arena; +} + +static LFH_arena *LFH_acquire_arena(LFH_heap *heap, LFH_class *class) +{ + LFH_arena *arena; + + if (!(arena = LFH_class_peek_arena(class))) + { + if (LFH_class_is_block(heap, class)) + arena = LFH_allocate_block_arena(heap, &heap->large_class[0], class); + else + arena = LFH_allocate_large_arena(heap, class); + } + + return arena; +} + +static BOOLEAN LFH_release_arena(LFH_heap *heap, LFH_arena *arena) +{ + LFH_arena *large_arena = LFH_large_arena_from_block(arena); + if (arena == large_arena) + return LFH_memory_deallocate(arena, LARGE_ARENA_SIZE); + else + return LFH_deallocate_block(heap, large_arena, (LFH_block *)arena); +} + +static LFH_block *LFH_allocate_block(LFH_heap *heap, LFH_class *class, LFH_arena *arena) +{ + LFH_block *block = LFH_arena_pop_block(arena); + if (LFH_arena_is_empty(arena)) + LFH_class_pop_arena(class); + return block; +} + +static BOOLEAN LFH_deallocate_block(LFH_heap *heap, LFH_arena *arena, LFH_block *block) +{ + LFH_class *class = LFH_class_from_arena(arena); + + arena = LFH_parent_from_arena(arena); + if (LFH_arena_is_empty(arena)) + LFH_class_push_arena(class, arena); + + LFH_arena_push_block(arena, block); + if (LFH_arena_is_used(arena)) + return TRUE; + + LFH_class_remove_arena(class, arena); + return LFH_release_arena(heap, arena); +} + +static SLIST_HEADER *LFH_orphan_list(void) +{ + static SLIST_HEADER *header; + SLIST_HEADER *ptr, *expected = NULL; + + if ((ptr = __atomic_load_n(&header, __ATOMIC_RELAXED))) + return ptr; + + if (!(ptr = LFH_memory_allocate(sizeof(*header)))) + return NULL; + + RtlInitializeSListHead(ptr); + if (__atomic_compare_exchange_n(&header, &expected, ptr, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE)) + return ptr; + + LFH_memory_deallocate(ptr, sizeof(*header)); + return expected; +} + +static void LFH_heap_initialize(LFH_heap *heap) +{ + size_t i; + + for (i = 0; i < 
TOTAL_LARGE_CLASS_COUNT; ++i) + LFH_class_initialize(heap, &heap->large_class[i], i); + for (i = 0; i < TOTAL_BLOCK_CLASS_COUNT; ++i) + LFH_class_initialize(heap, &heap->block_class[i], i); + + heap->list_defer = NULL; +} + +static void LFH_heap_finalize(LFH_heap *heap) +{ + LFH_arena *arena; + + LFH_deallocate_deferred_blocks(heap); + + for (size_t i = 0; i < TOTAL_BLOCK_CLASS_COUNT; ++i) + { + while ((arena = LFH_class_pop_arena(&heap->block_class[i]))) + { + WARN("block arena %p still has used blocks\n", arena); + LFH_release_arena(heap, arena); + } + } + + for (size_t i = 0; i < TOTAL_LARGE_CLASS_COUNT; ++i) + { + while ((arena = LFH_class_pop_arena(&heap->large_class[i]))) + { + WARN("large arena %p still has used blocks\n", arena); + LFH_memory_deallocate(arena, LARGE_ARENA_SIZE); + } + } +} + +static LFH_heap *LFH_heap_allocate(void) +{ + void *addr; + addr = LFH_memory_allocate(sizeof(LFH_heap)); + if (!addr) + return NULL; + + LFH_heap_initialize(addr); + return addr; +} + +static void LFH_heap_deallocate(LFH_heap *heap) +{ + LFH_heap_finalize(heap); + LFH_memory_deallocate(heap, sizeof(*heap)); +} + +static LFH_heap *LFH_thread_heap(BOOL create) +{ + SLIST_ENTRY *entry; + + LFH_heap *heap = ntdll_get_thread_data()->heap; + if (!heap && create) + { + if ((entry = RtlInterlockedPopEntrySList(LFH_orphan_list()))) + heap = LIST_ENTRY(entry, LFH_heap, entry_orphan); + else + heap = LFH_heap_allocate(); + + ntdll_get_thread_data()->heap = heap; + } + + return heap; +} + +static void LFH_dump_arena(LFH_heap *heap, LFH_class *class, LFH_arena *arena) +{ + LFH_arena *large_arena = LFH_large_arena_from_block(arena); + LFH_arena *block_arena = LFH_block_arena_from_block(arena); + + if (arena == block_arena) + WARN(" block arena: %p-%p", arena, (void *)((UINT_PTR)large_arena + BLOCK_ARENA_SIZE - 1)); + else if (arena == large_arena) + WARN(" large arena: %p-%p", arena, (void *)((UINT_PTR)large_arena + LARGE_ARENA_SIZE - 1)); + + WARN(" heap: %p class: %p parent: %p 
free: %p used: %zd\n", + LFH_heap_from_arena(arena), LFH_class_from_arena(arena), LFH_parent_from_arena(arena), arena->list_free, arena->used_count); +} + +static void LFH_dump_class(LFH_heap *heap, LFH_class *class) +{ + LFH_arena *arena = class->next; + if (!arena) return; + + if (LFH_class_is_block(heap, class)) + WARN(" block class: %p size: %zx\n", class, class->size); + else + WARN(" large class: %p size: %zx\n", class, class->size); + + while (arena) + { + LFH_dump_arena(heap, class, arena); + arena = arena->class_entry; + } +} + +static void LFH_dump_heap(LFH_heap *heap) +{ + size_t i; + + WARN("heap: %p\n", heap); + + for (i = 0; i < TOTAL_BLOCK_CLASS_COUNT; ++i) + LFH_dump_class(heap, &heap->block_class[i]); + + for (i = 0; i < TOTAL_LARGE_CLASS_COUNT; ++i) + LFH_dump_class(heap, &heap->large_class[i]); +} + +static BOOLEAN LFH_validate_block(ULONG flags, const LFH_block *block); +static BOOLEAN LFH_validate_arena(ULONG flags, const LFH_arena *arena); +static BOOLEAN LFH_validate_heap(ULONG flags, const LFH_heap *heap); + +static BOOLEAN LFH_validate_block(ULONG flags, const LFH_block *block) +{ + const LFH_arena *arena = LFH_arena_from_block(block); + const LFH_arena *large_arena = LFH_large_arena_from_block(block); + const LFH_arena *block_arena = LFH_block_arena_from_block(block); + const char *err = NULL; + + if (flags & HEAP_VALIDATE) + return LFH_validate_arena(flags, arena); + + if (!arena) + err = "invalid arena"; + else if (arena != LFH_large_arena_from_block(arena) && + arena != (LFH_large_arena_from_block(arena) + 1)) + err = "invalid arena alignment"; + else if (arena == block_arena) + { + if ((UINT_PTR)block < (UINT_PTR)block_arena + ARENA_HEADER_SIZE || + ((UINT_PTR)block & (sizeof(*block) - 1)) != 0) + err = "invalid block alignment"; + else if (!LFH_class_from_arena(block_arena)) + err = "invalid block arena class"; + } + else + { + if (arena != large_arena) + err = "large/huge arena mismatch"; + else if ((UINT_PTR)block != 
(UINT_PTR)block_arena) + err = "invalid block for large/huge arena"; + } + + if (err) WARN("%08x %p: %s\n", flags, block, err); + return err == NULL; +} + +static BOOLEAN LFH_validate_free_block(ULONG flags, const LFH_block *block) +{ + const char *err = NULL; + + if (!LFH_validate_block(flags, block)) + return FALSE; + if (block->type != LFH_block_type_free) + err = "invalid free block type"; + + if (err) WARN("%08x %p: %s\n", flags, block, err); + return err == NULL; +} + +static BOOLEAN LFH_validate_defer_block(ULONG flags, const LFH_block *block) +{ + const char *err = NULL; + + if (!LFH_validate_block(flags, block)) + return FALSE; + if (block->type != LFH_block_type_free) + err = "invalid defer block type"; + else if (flags & HEAP_FREE_CHECKING_ENABLED) + { + const unsigned int *data = (const unsigned int *)LFH_ptr_from_block(block); + size_t class_size = LFH_block_get_class_size(block); + for (size_t i = 0; i < class_size / 4 - (data - (const unsigned int *)block) && !err; ++i) + if (data[i] != 0xfeeefeee) err = "invalid free filler"; + } + + if (err) WARN("%08x %p: %s\n", flags, block, err); + return err == NULL; +} + +static BOOLEAN LFH_validate_used_block(ULONG flags, const LFH_block *block) +{ + const char *err = NULL; + + if (!LFH_validate_block(flags, block)) + return FALSE; + if (block->type != LFH_block_type_used) + err = "invalid used block type"; + else if (flags & HEAP_TAIL_CHECKING_ENABLED) + { + const unsigned char *data = (const unsigned char *)LFH_ptr_from_block(block); + size_t alloc_size = LFH_block_get_alloc_size(block, flags); + size_t class_size = LFH_block_get_class_size(block); + for (size_t i = alloc_size; i < class_size - (data - (const unsigned char *)block) && !err; ++i) + if (data[i] != 0xab) err = "invalid tail filler"; + } + + if (err) WARN("%08x %p: %s\n", flags, block, err); + return err == NULL; +} + +static BOOLEAN LFH_validate_arena_free_blocks(ULONG flags, const LFH_arena *arena) +{ + LFH_block *block = arena->list_free; + 
while (block) + { + if (!LFH_validate_free_block(flags, block)) + return FALSE; + + block = block->entry_free; + } + + return TRUE; +} + +static BOOLEAN LFH_validate_arena(ULONG flags, const LFH_arena *arena) +{ + const char *err = NULL; + const LFH_arena *parent; + + if (flags & HEAP_VALIDATE) + return LFH_validate_heap(flags, LFH_heap_from_arena(arena)); + + if (arena != LFH_large_arena_from_block(arena) && + arena != (LFH_large_arena_from_block(arena) + 1)) + err = "invalid arena alignment"; + else if (arena == LFH_block_arena_from_block(arena)) + { + if (!LFH_validate_block(flags, (LFH_block *)arena)) + err = "invalid block arena"; + else if (!LFH_validate_arena_free_blocks(flags, arena)) + err = "invalid block arena free list"; + } + else if (arena == LFH_large_arena_from_block(arena) && !LFH_class_from_arena(arena)) + { + if (arena->huge_size <= LARGE_CLASS_MAX_SIZE) + err = "invalid huge arena size"; + else if (arena->list_free) + err = "invalid huge arena"; + } + else if (arena == LFH_large_arena_from_block(arena) && + (parent = LFH_parent_from_arena(arena)) != arena) + { + if (arena > parent || LFH_large_arena_from_block(parent) != parent) + err = "invalid child arena parent"; + else if (arena->list_free) + err = "invalid child arena"; + } + else + { + if (!LFH_validate_arena_free_blocks(flags, arena)) + err = "invalid large arena free list"; + } + + if (err) WARN("%08x %p: %s\n", flags, arena, err); + return err == NULL; +} + +static BOOLEAN LFH_validate_class_arenas(ULONG flags, const LFH_class *class) +{ + LFH_arena *arena = class->next; + while (arena) + { + if (!LFH_validate_arena(flags, arena)) + return FALSE; + + arena = arena->class_entry; + } + + return TRUE; +} + +static BOOLEAN LFH_validate_heap_defer_blocks(ULONG flags, const LFH_heap *heap) +{ + const LFH_slist *entry = heap->list_defer; + + while (entry) + { + const LFH_block *block = LIST_ENTRY(entry, LFH_block, entry_defer); + if (!LFH_validate_defer_block(flags, block)) + return FALSE; + 
entry = entry->next; + } + + return TRUE; +} + +static BOOLEAN LFH_validate_heap(ULONG flags, const LFH_heap *heap) +{ + const char *err = NULL; + UINT i; + + flags &= ~HEAP_VALIDATE; + + if (heap != LFH_thread_heap(FALSE)) + err = "unable to validate foreign heap"; + else if (!LFH_validate_heap_defer_blocks(flags, heap)) + err = "invalid heap defer blocks"; + else + { + for (i = 0; err == NULL && i < TOTAL_BLOCK_CLASS_COUNT; ++i) + { + if (!LFH_validate_class_arenas(flags, &heap->block_class[i])) + return FALSE; + } + + for (i = 0; err == NULL && i < TOTAL_LARGE_CLASS_COUNT; ++i) + { + if (!LFH_validate_class_arenas(flags, &heap->large_class[i])) + return FALSE; + } + } + + if (err) WARN("%08x %p: %s\n", flags, heap, err); + return err == NULL; +} + +static void LFH_block_initialize(LFH_block *block, ULONG flags, size_t old_size, size_t new_size, size_t class_size) +{ + char *data = (char *)LFH_ptr_from_block(block); + + TRACE("%p %x %zx %zx %zx, %p\n", block, flags, old_size, new_size, class_size, data); + + if ((flags & HEAP_ZERO_MEMORY) && new_size > old_size) + memset(data + old_size, 0, new_size - old_size); + else if ((flags & HEAP_FREE_CHECKING_ENABLED) && new_size > old_size && class_size < BLOCK_ARENA_SIZE) + memset(data + old_size, 0x55, new_size - old_size); + + if ((flags & HEAP_TAIL_CHECKING_ENABLED)) + memset(data + new_size, 0xab, class_size - new_size - (data - (char *)block)); + + block->type = LFH_block_type_used; + block->alloc_size = new_size; +} + +static LFH_ptr *LFH_allocate(ULONG flags, size_t size) +{ + LFH_block *block = NULL; + LFH_class *class; + LFH_arena *arena; + LFH_heap *heap = LFH_thread_heap(TRUE); + size_t class_size = LFH_get_class_size(flags, size); + + if (!LFH_deallocate_deferred_blocks(heap)) + return NULL; + + if (class_size == ~(size_t)0) + return NULL; + + if ((class = LFH_heap_get_class(heap, class_size))) + { + arena = LFH_acquire_arena(heap, class); + if (arena) block = LFH_allocate_block(heap, class, arena); + if 
(block) LFH_block_initialize(block, flags, 0, size, LFH_block_get_class_size(block)); + } + else + { + arena = LFH_allocate_huge_arena(heap, class_size); + if (arena) block = LFH_arena_get_block(arena, ARENA_HEADER_SIZE); + if (block) LFH_block_initialize(block, flags, 0, size, LFH_block_get_class_size(block)); + } + + if (!block) return NULL; + return LFH_ptr_from_block(block); +} + +static BOOLEAN LFH_free(ULONG flags, LFH_ptr *ptr) +{ + LFH_block *block = LFH_block_from_ptr(ptr); + LFH_arena *arena = LFH_arena_from_block(block); + LFH_heap *heap = LFH_heap_from_arena(arena); + + if (!LFH_class_from_arena(arena)) + return LFH_memory_deallocate(arena, LFH_block_get_class_size(block)); + + if (flags & HEAP_FREE_CHECKING_ENABLED) + { + unsigned int *data = (unsigned int *)LFH_ptr_from_block(block); + size_t class_size = LFH_block_get_class_size(block); + for (size_t i = 0; i < class_size / 4 - (data - (const unsigned int *)block); ++i) + data[i] = 0xfeeefeee; + } + + block->type = LFH_block_type_free; + LFH_slist_push(&heap->list_defer, &block->entry_defer); + + return TRUE; +} + +static LFH_ptr *LFH_reallocate(ULONG flags, LFH_ptr *old_ptr, size_t new_size) +{ + LFH_block *block = LFH_block_from_ptr(old_ptr); + size_t old_size = LFH_block_get_alloc_size(block, flags); + size_t old_class_size = LFH_block_get_class_size(block); + size_t new_class_size = LFH_get_class_size(flags, new_size); + LFH_ptr *new_ptr = NULL; + + if (new_class_size == ~(size_t)0) + return NULL; + + if (new_class_size <= old_class_size) + { + LFH_block_initialize(block, flags, old_size, new_size, old_class_size); + return old_ptr; + } + + if (flags & HEAP_REALLOC_IN_PLACE_ONLY) + return NULL; + + if (!(new_ptr = LFH_allocate(flags, new_size))) + return NULL; + + memcpy(new_ptr, old_ptr, old_size); + + if (LFH_free(flags, old_ptr)) + return new_ptr; + + LFH_free(flags, new_ptr); + return NULL; +} + +static size_t LFH_get_allocated_size(ULONG flags, const LFH_ptr *ptr) +{ + const LFH_block *block 
= LFH_block_from_ptr(ptr); + return LFH_block_get_alloc_size(block, flags); +} + +static BOOLEAN LFH_validate(ULONG flags, const LFH_ptr *ptr) +{ + const LFH_block *block = LFH_block_from_ptr(ptr); + const LFH_heap *heap; + + /* clear HEAP_VALIDATE so we only validate block */ + if (ptr) return LFH_validate_used_block(flags & ~HEAP_VALIDATE, block); + + if (!(heap = LFH_thread_heap(FALSE))) + return TRUE; + + return LFH_validate_heap(flags, heap); +} + +static BOOLEAN LFH_try_validate_all(ULONG flags) +{ + if (!(flags & HEAP_VALIDATE_ALL)) + return TRUE; + + if (LFH_validate(flags, NULL)) + return TRUE; + + LFH_dump_heap(LFH_thread_heap(FALSE)); + return FALSE; +}
void *HEAP_lfh_allocate(struct tagHEAP *std_heap, ULONG flags, SIZE_T size) { + void *ptr; + + TRACE("%p %08x %lx\n", std_heap, flags, size); + + if (!LFH_try_validate_all(flags)) + goto error; + + if ((ptr = LFH_allocate(flags, size))) + return ptr; + + if (flags & HEAP_GENERATE_EXCEPTIONS) RtlRaiseStatus(STATUS_NO_MEMORY); + RtlSetLastWin32ErrorAndNtStatusFromNtStatus(STATUS_NO_MEMORY); + return NULL; + +error: + RtlSetLastWin32ErrorAndNtStatusFromNtStatus(STATUS_INVALID_PARAMETER); return NULL; }
BOOLEAN HEAP_lfh_free(struct tagHEAP *std_heap, ULONG flags, void *ptr) { + TRACE("%p %08x %p\n", std_heap, flags, ptr); + + if (!LFH_try_validate_all(flags)) + goto error; + + if (!LFH_validate(flags, ptr)) + goto error; + + return LFH_free(flags, ptr); + +error: + RtlSetLastWin32ErrorAndNtStatusFromNtStatus(STATUS_INVALID_PARAMETER); return FALSE; }
 void *HEAP_lfh_reallocate(struct tagHEAP *std_heap, ULONG flags, void *ptr, SIZE_T size)
 {
+    TRACE("%p %08x %p %lx\n", std_heap, flags, ptr, size);
+
+    if (!LFH_try_validate_all(flags))
+        goto error;
+
+    if (!LFH_validate(flags, ptr))
+        goto error;
+
+    if ((ptr = LFH_reallocate(flags, ptr, size)))
+        return ptr;
+
+    if (flags & HEAP_GENERATE_EXCEPTIONS) RtlRaiseStatus(STATUS_NO_MEMORY);
+    RtlSetLastWin32ErrorAndNtStatusFromNtStatus(STATUS_NO_MEMORY);
+    return NULL;
+
+error:
+    RtlSetLastWin32ErrorAndNtStatusFromNtStatus(STATUS_INVALID_PARAMETER);
     return NULL;
 }
 SIZE_T HEAP_lfh_get_allocated_size(struct tagHEAP *std_heap, ULONG flags, const void *ptr)
 {
+    TRACE("%p %08x %p\n", std_heap, flags, ptr);
+
+    if (!LFH_try_validate_all(flags))
+        goto error;
+
+    if (!LFH_validate(flags, ptr))
+        goto error;
+
+    return LFH_get_allocated_size(flags, ptr);
+
+error:
+    RtlSetLastWin32ErrorAndNtStatusFromNtStatus(STATUS_INVALID_PARAMETER);
     return ~(SIZE_T)0;
 }
 BOOLEAN HEAP_lfh_validate(struct tagHEAP *std_heap, ULONG flags, const void *ptr)
 {
-    if (ptr) return FALSE;
-    else return TRUE;
+    TRACE("%p %08x %p\n", std_heap, flags, ptr);
+
+    if (!LFH_try_validate_all(flags))
+        return FALSE;
+
+    return LFH_validate(flags, ptr);
 }
 void HEAP_lfh_notify_thread_destroy(BOOLEAN last)
 {
+    SLIST_HEADER *list_orphan = LFH_orphan_list();
+    SLIST_ENTRY *entry_orphan = NULL;
+    LFH_heap *heap;
+
+    if (last)
+    {
+        while ((entry_orphan || (entry_orphan = RtlInterlockedFlushSList(list_orphan))))
+        {
+            LFH_heap *orphan = LIST_ENTRY(entry_orphan, LFH_heap, entry_orphan);
+            entry_orphan = entry_orphan->Next;
+            LFH_heap_deallocate(orphan);
+        }
+    }
+    else if ((heap = LFH_thread_heap(FALSE)) && LFH_validate_heap(0, heap))
+        RtlInterlockedPushEntrySList(list_orphan, &heap->entry_orphan);
 }
 void HEAP_lfh_set_debug_flags(ULONG flags)
 {
+    LFH_heap *heap = LFH_thread_heap(FALSE);
+    if (!heap) return;
+
+    LFH_deallocate_deferred_blocks(heap);
 }
diff --git a/dlls/ntdll/ntdll_misc.h b/dlls/ntdll/ntdll_misc.h
index 4dd642cd7842..3391950d5b42 100644
--- a/dlls/ntdll/ntdll_misc.h
+++ b/dlls/ntdll/ntdll_misc.h
@@ -267,6 +267,7 @@ struct ntdll_thread_data
     int        wait_fd[2];   /* fd for sleeping server requests */
     BOOL       wow64_redir;  /* Wow64 filesystem redirection flag */
     pthread_t  pthread_id;   /* pthread thread id */
+    void      *heap;         /* thread local heap data */
 };
C_ASSERT( sizeof(struct ntdll_thread_data) <= sizeof(((TEB *)0)->GdiTebBatch) );
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=71294
Your paranoid android.
=== debiant (32 bit report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit Chinese:China report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (64 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
---
 dlls/ntdll/loader.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/dlls/ntdll/loader.c b/dlls/ntdll/loader.c
index a5b86f289f47..d9ede19e4ea2 100644
--- a/dlls/ntdll/loader.c
+++ b/dlls/ntdll/loader.c
@@ -4485,6 +4485,7 @@ void __wine_process_init(void)
     INITIAL_TEB stack;
     BOOL suspend;
     SIZE_T info_size;
+    ULONG hci = HEAP_LFH;
     TEB *teb = thread_init();
     PEB *peb = teb->Peb;
@@ -4494,6 +4495,7 @@ void __wine_process_init(void)
     peb->ProcessHeap = RtlCreateHeap( HEAP_GROWABLE, NULL, 0, 0, NULL, NULL );
     peb->LoaderLock = &loader_section;
+    RtlSetHeapInformation( peb->ProcessHeap, HeapCompatibilityInformation, &hci, sizeof(hci) );
     init_unix_codepage();
     init_directories();
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=71295
Your paranoid android.
=== debiant (32 bit report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit Chinese:China report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (64 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
---
 dlls/msvcrt/heap.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/dlls/msvcrt/heap.c b/dlls/msvcrt/heap.c
index 31153fa5b183..ffe34726851d 100644
--- a/dlls/msvcrt/heap.c
+++ b/dlls/msvcrt/heap.c
@@ -530,7 +530,9 @@ int CDECL _set_sbh_threshold(MSVCRT_size_t threshold)
     if(!sb_heap) {
+        ULONG hci = 2;
         sb_heap = HeapCreate(0, 0, 0);
+        HeapSetInformation(sb_heap, HeapCompatibilityInformation, &hci, sizeof(hci));
         if(!sb_heap)
             return 0;
     }
@@ -867,7 +869,9 @@ int CDECL MSVCRT_strncpy_s(char *dest, MSVCRT_size_t numberOfElements,
 BOOL msvcrt_init_heap(void)
 {
+    ULONG hci = 2;
     heap = HeapCreate(0, 0, 0);
+    HeapSetInformation(heap, HeapCompatibilityInformation, &hci, sizeof(hci));
     return heap != NULL;
 }
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=71296
Your paranoid android.
=== debiant (32 bit report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit Chinese:China report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (32 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
=== debiant (64 bit WoW report) ===
kernel32:
heap.c:659: Test failed: HeapSetInformation succeeded
heap.c:661: Test failed: expected ERROR_GEN_FAILURE, got 3735928559
Rémi Bernon <rbernon@codeweavers.com> wrote:
> This is a heap implementation based on thread-local structures that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios and see if it could help performance in some games.
> The good news is that this implementation is performing well, according to third-party heap micro benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I could still see improvements for loading times, and less stalling as well.
Have you looked at Sebastian's heap improvement patches in the staging tree? According to Sebastian's and Michael's testing, "The new heap allocator uses (inspired by the way how it works on Windows) various fixed-size free lists, and a tree data structure for large elements. With this implementation, I get up to 60% improvement for apps with the "bad allocation pattern", and up to 15% improvement in the "good case"."
On 5/6/20 2:38 PM, Dmitry Timoshkov wrote:
> Rémi Bernon <rbernon@codeweavers.com> wrote:
>> This is a heap implementation based on thread-local structures that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios and see if it could help performance in some games.
>> The good news is that this implementation is performing well, according to third-party heap micro benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I could still see improvements for loading times, and less stalling as well.
> Have you looked at Sebastian's heap improvement patches in the staging tree? According to Sebastian's and Michael's testing, "The new heap allocator uses (inspired by the way how it works on Windows) various fixed-size free lists, and a tree data structure for large elements. With this implementation, I get up to 60% improvement for apps with the "bad allocation pattern", and up to 15% improvement in the "good case"."
I believe these patches are also shipped in Proton, and although they perform better than the upstream heap, there's still a lot of contention when multiple threads try to (de)allocate at the same time.
For reference I used https://github.com/mjansson/rpmalloc-benchmark for raw performance measurements. It starts a given number of threads, with each thread performing a fixed number of iterations. On every iteration the thread allocates and frees a certain amount of memory, optionally performing cross-thread allocations every other iteration, then performs a given number of computations using the allocated buffers as storage. Finally it measures the time it took to do all these operations.
For instance, with these benchmark parameters as indicated on their sample result page[1]:
<num threads> 0 0 2 20000 50000 5000 16 1000
I have the following results with the various implementations and using two concurrent threads (the higher the number of threads, the worse it gets, especially for the default Wine heap):
* linux crt: 5675754 memory ops/CPU second, 53% overhead
* wine rpmalloc: 19700003 memory ops/CPU second, 131% overhead
* wine upstream: 248333 memory ops/CPU second, 62% overhead
* wine staging: 914004 memory ops/CPU second, 61% overhead
* wine lfh: 10651300 memory ops/CPU second, 114% overhead
(linux crt is the benchmark running natively, wine rpmalloc is their rpmalloc benchmark cross-compiled and executed with Wine, and wine lfh is this patch series)
So the staging patches perform much better than the default heap, but the contention cost is still pretty dramatic when multiple threads are involved.
My opinion is that, with the heap critical section being held on every (de)allocation, it is going to be very hard to see big improvements, regardless of the inner allocation algorithm.
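To illustrate the contention point, here is a minimal standalone sketch (not Wine code; the names and the single-size-class cache are made up for the illustration) contrasting the two designs: a heap guarded by one process-wide lock, where every thread serializes on the same mutex, versus a per-thread free-list cache where the common alloc/free path never touches the lock at all.

```c
/* Illustration only: a globally locked allocator vs. a thread-local
 * cache layered on top of it. Single size class, fixed-capacity cache. */
#include <pthread.h>
#include <stdlib.h>

#define CACHE_MAX  64
#define BLOCK_SIZE 64 /* the cache only serves this one size class */

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* "default heap" path: every single call takes the same global lock */
static void *locked_alloc(void)
{
    void *ptr;
    pthread_mutex_lock(&heap_lock);
    ptr = malloc(BLOCK_SIZE);
    pthread_mutex_unlock(&heap_lock);
    return ptr;
}

/* "thread-local heap" path: freed blocks go to a per-thread cache,
 * so the common case allocates and frees without locking at all */
static __thread void *cache[CACHE_MAX];
static __thread int cache_len;

static void *tls_alloc(void)
{
    if (cache_len) return cache[--cache_len]; /* lock-free fast path */
    return locked_alloc();                    /* slow path: refill */
}

static void tls_free(void *ptr)
{
    if (cache_len < CACHE_MAX)
    {
        cache[cache_len++] = ptr; /* keep the block for this thread */
        return;
    }
    pthread_mutex_lock(&heap_lock); /* cache full: fall back */
    free(ptr);
    pthread_mutex_unlock(&heap_lock);
}
```

The real series is of course far more involved (size classes, arenas, deferred cross-thread frees), but this is the basic reason a per-thread structure sidesteps the critical section.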
[1] https://github.com/mjansson/rpmalloc-benchmark/blob/master/BENCHMARKS.md#ran...
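For reference, the per-thread loop of the benchmark described above boils down to roughly the following sketch (the counts and size classes here are made up; the real harness additionally performs the cross-thread frees and converts the counters to memory ops per CPU second):

```c
/* Rough single-thread sketch of one rpmalloc-benchmark worker loop:
 * allocate a batch, touch the buffers as "computation", free the batch. */
#include <stdint.h>
#include <stdlib.h>

#define ITERATIONS      1000 /* loop count per thread */
#define BLOCKS_PER_ITER 50   /* blocks (de)allocated per iteration */
#define OPS_PER_ITER    100  /* computation steps using the buffers */

static size_t run_thread_loop(void)
{
    void *blocks[BLOCKS_PER_ITER];
    size_t memory_ops = 0;

    for (int iter = 0; iter < ITERATIONS; iter++)
    {
        /* allocation phase: a batch of blocks in varying size classes */
        for (int i = 0; i < BLOCKS_PER_ITER; i++)
        {
            blocks[i] = malloc((size_t)16 << (i % 8));
            memory_ops++;
        }
        /* computation phase: use the allocated buffers as storage */
        for (int i = 0; i < OPS_PER_ITER; i++)
            ((volatile uint8_t *)blocks[i % BLOCKS_PER_ITER])[0] = (uint8_t)i;
        /* free phase; real runs hand some blocks to another thread */
        for (int i = 0; i < BLOCKS_PER_ITER; i++)
        {
            free(blocks[i]);
            memory_ops++;
        }
    }
    return memory_ops; /* the harness divides this by CPU time */
}
```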
2 0 0 2 20000 50000 5000 16 1000
Rémi Bernon <rbernon@codeweavers.com> wrote:
>>> This is a heap implementation based on thread-local structures that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios and see if it could help performance in some games.
>>> The good news is that this implementation is performing well, according to third-party heap micro benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I could still see improvements for loading times, and less stalling as well.
>> Have you looked at Sebastian's heap improvement patches in the staging tree? According to Sebastian's and Michael's testing, "The new heap allocator uses (inspired by the way how it works on Windows) various fixed-size free lists, and a tree data structure for large elements. With this implementation, I get up to 60% improvement for apps with the "bad allocation pattern", and up to 15% improvement in the "good case"."
> I believe these patches are also shipped in Proton, and although they perform better than the upstream heap, there's still a lot of contention when multiple threads try to (de)allocate at the same time.
> For reference I used https://github.com/mjansson/rpmalloc-benchmark for raw performance measurements. It starts a given number of threads, with each thread performing a fixed number of iterations. On every iteration the thread allocates and frees a certain amount of memory, optionally performing cross-thread allocations every other iteration, then performs a given number of computations using the allocated buffers as storage. Finally it measures the time it took to do all these operations.
> For instance, with these benchmark parameters as indicated on their sample result page[1]:
> <num threads> 0 0 2 20000 50000 5000 16 1000
> I have the following results with the various implementations and using two concurrent threads (the higher the number of threads, the worse it gets, especially for the default Wine heap):
> - linux crt: 5675754 memory ops/CPU second, 53% overhead
> - wine rpmalloc: 19700003 memory ops/CPU second, 131% overhead
> - wine upstream: 248333 memory ops/CPU second, 62% overhead
> - wine staging: 914004 memory ops/CPU second, 61% overhead
> - wine lfh: 10651300 memory ops/CPU second, 114% overhead
Do you have the numbers for various Windows flavours on the same hardware?
On 5/6/20 5:32 PM, Dmitry Timoshkov wrote:
> Rémi Bernon <rbernon@codeweavers.com> wrote:
>>>> This is a heap implementation based on thread-local structures that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios and see if it could help performance in some games.
>>>> The good news is that this implementation is performing well, according to third-party heap micro benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I could still see improvements for loading times, and less stalling as well.
>>> Have you looked at Sebastian's heap improvement patches in the staging tree? According to Sebastian's and Michael's testing, "The new heap allocator uses (inspired by the way how it works on Windows) various fixed-size free lists, and a tree data structure for large elements. With this implementation, I get up to 60% improvement for apps with the "bad allocation pattern", and up to 15% improvement in the "good case"."
>> I believe these patches are also shipped in Proton, and although they perform better than the upstream heap, there's still a lot of contention when multiple threads try to (de)allocate at the same time.
>> For reference I used https://github.com/mjansson/rpmalloc-benchmark for raw performance measurements. It starts a given number of threads, with each thread performing a fixed number of iterations. On every iteration the thread allocates and frees a certain amount of memory, optionally performing cross-thread allocations every other iteration, then performs a given number of computations using the allocated buffers as storage. Finally it measures the time it took to do all these operations.
>> For instance, with these benchmark parameters as indicated on their sample result page[1]:
>> <num threads> 0 0 2 20000 50000 5000 16 1000
>> I have the following results with the various implementations and using two concurrent threads (the higher the number of threads, the worse it gets, especially for the default Wine heap):
>> - linux crt: 5675754 memory ops/CPU second, 53% overhead
>> - wine rpmalloc: 19700003 memory ops/CPU second, 131% overhead
>> - wine upstream: 248333 memory ops/CPU second, 62% overhead
>> - wine staging: 914004 memory ops/CPU second, 61% overhead
>> - wine lfh: 10651300 memory ops/CPU second, 114% overhead
> Do you have the numbers for various Windows flavours on the same hardware?
I only have Windows 10 physically installed. The results there for the same set of parameters are roughly equivalent to what I get with these patches:
11977625 memory ops/CPU second, 106% overhead