Hi all!
This is a heap implementation based on thread-local structures that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios and to see whether that could help performance in some games.
The good news is that this implementation performs well according to third-party heap micro-benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I could still see improved loading times, and less stalling as well.
As I've been tweaking it lately, and now that it passes the heap tests, I believe it could still be useful to others, or interesting for anyone eager to try it. It could also be used for comparison, to determine whether the heap is the bottleneck for some use case.
The implementation details are in patch #9 for anyone interested. The thread-local heaps are enabled by HeapSetInformation calls that request the LFH, but I also added a WINELFH environment variable to guard it, so that it can also be disabled globally.
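For reference, this is roughly what the application side of such a request looks like. It is only a sketch using the documented Win32 call: the helper name request_lfh is made up for illustration, and the non-Windows stub exists only so the snippet compiles elsewhere. It says nothing about how the patches handle the request internally, nor about the values WINELFH accepts.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: ask the given heap to switch to the Low
 * Fragmentation Heap. Returns nonzero if the request was accepted. */
static int request_lfh(void *heap)
{
    ULONG mode = 2; /* 2 is the documented HeapCompatibilityInformation value for LFH */
    return HeapSetInformation((HANDLE)heap, HeapCompatibilityInformation,
                              &mode, sizeof(mode)) != 0;
}
#else
/* Stub for non-Windows builds: the API does not exist there, so the
 * request is always reported as refused. */
static int request_lfh(void *heap)
{
    (void)heap;
    return 0;
}
#endif
```

An application would typically call something like request_lfh(GetProcessHeap()) early at startup, or pass a handle returned by HeapCreate.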
There are multiple issues that I am aware of, but for the applications that I tried they didn't seem to cause much trouble:
* Block headers are different from the default ones, but then FWIW the default block headers don't seem to match recent Windows either, where they also vary depending on the Windows version.
* HeapWalk will not show the blocks allocated from the thread-local heaps. It could be possible to also walk the current thread-local heap, but then, because of an implementation detail, some blocks would still not show up (fully used arenas are detached from their heap until a block is freed).
* HeapDestroy doesn't automatically free the blocks that were allocated from the thread-local heaps and they can possibly be lost forever for the same reason as above. Although it could be possible to keep track of fully used arenas, the same thread-local heaps are shared between heaps, so there's currently no way to tell which block should be freed on destroy.
* The freed blocks are deferred to the thread that allocated them, but in general they are quickly reused. There's some validation being done, but it could still cause some issues in case of use after free. It should be easily fixed by adding a buffer, but as I didn't see any game needing it, it didn't seem worth the trouble for now.
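To illustrate what "deferred to the thread that allocated them" can mean in practice, here is a minimal sketch of one common way to do it with C11 atomics: a block freed by a foreign thread is pushed onto a lock-free list owned by the allocating thread, which drains it the next time it needs a block. All structure and function names here are hypothetical, and this is not the code from the patches, only a sketch of the general technique under those assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct thread_heap;

/* Hypothetical block header, not the layout used by the patches. */
struct block {
    struct block *next;        /* link in whichever free list holds the block */
    struct thread_heap *owner; /* heap of the thread that allocated the block */
};

struct thread_heap {
    struct block *local_free;            /* only touched by the owning thread */
    _Atomic(struct block *) remote_free; /* pushed to by any other thread */
};

/* Free a block; `self` is the heap of the calling thread. */
static void heap_free_block(struct thread_heap *self, struct block *b)
{
    struct thread_heap *owner = b->owner;
    struct block *head;

    if (owner == self) {
        /* Same thread: plain push, no synchronization needed. */
        b->next = self->local_free;
        self->local_free = b;
        return;
    }

    /* Foreign thread: lock-free push onto the owner's deferred list. */
    head = atomic_load_explicit(&owner->remote_free, memory_order_relaxed);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak_explicit(&owner->remote_free, &head, b,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Owner thread: take the whole deferred list and move it to the local list. */
static void heap_drain_remote(struct thread_heap *self)
{
    struct block *b = atomic_exchange_explicit(&self->remote_free, NULL,
                                               memory_order_acquire);
    while (b) {
        struct block *next = b->next;
        b->next = self->local_free;
        self->local_free = b;
        b = next;
    }
}

/* Owner thread: pop a free block, draining deferred frees if the local
 * list is empty. Returns NULL when nothing is available. */
static struct block *heap_alloc_block(struct thread_heap *self)
{
    struct block *b;

    if (!self->local_free) heap_drain_remote(self);
    b = self->local_free;
    if (b) self->local_free = b->next;
    return b;
}
```

In a scheme like this a freed block becomes available again as soon as the owner drains its list, which is why a use after free can hit memory that has already been handed out again.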
Cheers,
Rémi Bernon (11):
  kernel32: Catch page faults in GlobalSize.
  kernel32/tests: Add HeapSetInformation and LFH tests.
  ntdll: Split standard heap functions.
  ntdll: Add thread destroy notification function.
  ntdll: Add extended heap type and LFH stubs.
  ntdll: Implement RtlSetHeapInformation for LFH.
  ntdll: Move undocumented flags to ntdll_misc.h.
  HACK: ntdll: Conditionally enable LFH.
  ntdll: Implement Low Fragmentation Heap.
  ntdll: Enable LFH for process heap.
  msvcrt: Enable LFH for internal heaps.

 dlls/kernel32/heap.c       |   78 +--
 dlls/kernel32/tests/heap.c |   66 +++
 dlls/msvcrt/heap.c         |    4 +
 dlls/ntdll/Makefile.in     |    1 +
 dlls/ntdll/heap.c          |  220 ++++++--
 dlls/ntdll/heap_lfh.c      | 1057 ++++++++++++++++++++++++++++++++++++
 dlls/ntdll/loader.c        |    3 +
 dlls/ntdll/ntdll_misc.h    |   28 +
 dlls/ntdll/thread.c        |    1 +
 9 files changed, 1379 insertions(+), 79 deletions(-)
 create mode 100644 dlls/ntdll/heap_lfh.c