https://bugs.winehq.org/show_bug.cgi?id=43224
Bug ID: 43224
Summary: Freelist scan can result in O(n) time when allocating
Product: Wine
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: ntdll
Assignee: wine-bugs@winehq.org
Reporter: ranma42@gmail.com
Distribution: ---
Created attachment 58517
  --> https://bugs.winehq.org/attachment.cgi?id=58517
Micro-benchmark that shows the degenerate behaviour
The heap implementation in ntdll can be very inefficient for some allocation patterns. Specifically, a freelist can contain thousands of entries of a given size, and the whole list is scanned while looking for a bigger entry that will not fit in any of them.
This is demonstrated by the attached program. It accepts a size as an argument, allocates 1000000 elements of that size, deallocates half of them (the "even" ones), then allocates 1000 elements that are just a bit bigger.
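For reference, the allocation pattern looks roughly like this (a simplified sketch of the idea, not the attached benchmark itself):

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT 1000000

int main(int argc, char **argv)
{
    SIZE_T size = argc > 1 ? (SIZE_T)atoi(argv[1]) : 16;
    HANDLE heap = GetProcessHeap();
    void **blocks = malloc(COUNT * sizeof(*blocks));
    DWORD start;
    int i;

    /* allocate 1000000 blocks of the requested size */
    for (i = 0; i < COUNT; i++)
        blocks[i] = HeapAlloc(heap, 0, size);

    /* free every other block; none of them are adjacent, so they do not
     * coalesce and the freelist for this size fills up with ~500000 entries */
    for (i = 0; i < COUNT; i += 2)
        HeapFree(heap, 0, blocks[i]);

    /* allocate 1000 slightly bigger blocks; each allocation may scan the
     * huge freelist of smaller blocks without finding a fitting entry */
    start = GetTickCount();
    for (i = 0; i < 1000; i++)
        HeapAlloc(heap, 0, size + 1);
    printf("1000 allocations of %lu bytes took %lu ms\n",
           (unsigned long)(size + 1), (unsigned long)(GetTickCount() - start));
    return 0;
}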
For most sizes the program terminates in milliseconds, but in my environment it takes several seconds for sizes 16, 32, 48, 64, 80, 88, 96, 112, 120, 128.
A possible workaround (which I have implemented in the past for short-lived applications) is to allocate only from a freelist whose entries are all at least as big as the block being allocated, but that can cause inefficient memory usage.
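In other words, the search would start at the first freelist whose minimum block size already satisfies the request, so every entry found is guaranteed to fit and no list is scanned in vain. A minimal sketch of that selection (hypothetical names, not Wine's actual heap code):

/* Hypothetical segregated freelists: bucket i holds free blocks whose size
 * is in [bucket_min[i], bucket_min[i+1]).  Starting at the first bucket whose
 * minimum size covers the request avoids scanning lists of too-small blocks,
 * at the cost of possibly skipping a fitting block in the request's own
 * bucket (hence the wasted memory). */
size_t first_safe_bucket(const size_t *bucket_min, size_t nbuckets, size_t size)
{
    size_t i;
    for (i = 0; i < nbuckets; i++)
        if (bucket_min[i] >= size) return i;
    return nbuckets; /* no bucket qualifies: carve from fresh memory instead */
}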
I noticed that Wine includes an implementation of a red-black tree. Would it make sense to organise the free entries as lists in such a tree? That should ensure O(log N) lookup in the worst case. If performance is a concern, it might be possible to use "direct" freelists for small sizes (ensuring that all of the alignment-compatible sizes are covered) and a tree for bigger sizes.
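For illustration, the tree-of-lists idea could look roughly like this, assuming the current wine/rbtree.h interface (the struct and field names are made up for this sketch and are not taken from the heap code):

#include <stddef.h>
#include "wine/rbtree.h"

/* Hypothetical layout: one tree node per distinct free-block size,
 * holding a list of all free blocks of that size.  The tree would be
 * initialised with wine_rb_init(&tree, free_size_compare). */
struct free_size_node
{
    struct wine_rb_entry entry;   /* keyed by block size */
    size_t size;
    struct free_block *blocks;    /* same-sized free blocks */
};

struct free_block
{
    struct free_block *next;
};

static int free_size_compare(const void *key, const struct wine_rb_entry *entry)
{
    const struct free_size_node *node = WINE_RB_ENTRY_VALUE(entry, struct free_size_node, entry);
    size_t size = *(const size_t *)key;
    if (size < node->size) return -1;
    if (size > node->size) return 1;
    return 0;
}

/* Find the node with the smallest size >= the requested size: walk down the
 * tree, remembering the last node that was big enough.  This replaces the
 * entry-by-entry freelist scan with an O(log N) descent. */
static struct free_size_node *find_best_fit(struct wine_rb_tree *tree, size_t size)
{
    struct wine_rb_entry *iter = tree->root;
    struct free_size_node *best = NULL;

    while (iter)
    {
        struct free_size_node *node = WINE_RB_ENTRY_VALUE(iter, struct free_size_node, entry);
        if (node->size >= size)
        {
            best = node;
            iter = iter->left;   /* a smaller node might still fit */
        }
        else iter = iter->right;
    }
    return best;
}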
I am willing to work on this, but I need some direction on the best path forward (namely, whether we should prefer the memory-inefficient but simpler approach of just allocating from "bigger" freelists, or whether the tree approach is the desired one).
The bug https://bugs.winehq.org/show_bug.cgi?id=37773 might be related.