Since I was also asking for some suggestions about what the preferred fix would be, Sebastian Lackner pointed to this mailing list.
The bug report contains the source code for a program that has an allocation pattern that is very bad (for the current wine allocator).
This appears to be a known problem, but I was unable to find any test/benchmark that shows such degenerate behavior.
So far none of the attempts at fixing it seem to have landed. Some possible approaches are:
1. ensure that the freelist sizes cover all (alignment rounded) sizes up to a certain value;
2. when looking up for a free block, start from the freelist whose smallest element is at least as big as the desired block;
3. implement more complex free block management.
#1 would fix the issue for some allocations sizes, but for bigger allocations the problem would stay the same.
#2 would fix the performance problem at the cost of increased memory usage: for some sizes it would be possible that after a block is allocated and freed, the following allocation for that size does not re-use that memory (if the free block is stored in a freelist together with smaller blocks).
#3 would increase the complexity of the code; to keep it under control, it might be possible to re-use the red-black tree implementation already available in wine. To avoid regressing performance, the smallest blocks might still be handled with simple freelists (fixed as in #1), while larger blocks are managed in a tree.
Some of the current/previous attempts at tackling this issue are:
-
https://github.com/wine-compholio/wine-staging/blob/master/patches/ntdll-Heap_FreeLists/0001-ntdll-Improve-heap-allocation-performance-by-using-m.patch (implements #1)
-
https://github.com/laino/wine-patches (IIUC, a combination of #1 and #3, in which the code tries to keep the freelist length under control, but AFAICT there is no guaranteed bound)
I would like to try and work on fixing this, but there are some decision to make that might depend on constraints I am not aware of.
Would the solution I propose (#3, rbtree based, with some tuning for small allocation sizes) be acceptable?
Are there any benchmarks that a new implementation should strive to improve or at least avoid regressing?
Suggestions and alternative solutions are welcome :)
Sorry for the long message, thank you for your patience
Andrea