On 5/6/20 5:32 PM, Dmitry Timoshkov wrote:
Rémi Bernon rbernon@codeweavers.com wrote:
This is a heap implementation based on thread-local structures, that I have been keeping locally for quite some time. The goal was to improve Wine's heap performance in multithreaded scenarios and see if it could help performance in some games.
The good news is that this implementation performs well, according to third-party heap micro benchmarks. The bad news is that it doesn't change performance much in general, as allocations are usually scarce during gameplay. I did still see improved loading times, and less stalling as well.
Have you looked at Sebastian's heap improvement patches in the staging tree? According to Sebastian's and Michael's testing: "The new heap allocator uses (inspired by the way how it works on Windows) various fixed-size free lists, and a tree data structure for large elements. With this implementation, I get up to 60% improvement for apps with the "bad allocation pattern", and up to 15% improvement in the "good case"."
I believe these patches are also shipped in Proton, and although they perform better than the upstream heap, there's still a lot of contention when multiple threads try to (de)allocate at the same time.
For reference I used https://github.com/mjansson/rpmalloc-benchmark as a raw performance measurement. The benchmark starts a given number of threads, with each thread doing a fixed number of iterations. In every iteration the thread allocates and frees a certain amount of memory, possibly with cross-thread allocation every other iteration, then does a given amount of computation using the allocated buffers as storage. It then measures the time it took to do all these operations.
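To make the shape of such a measurement concrete, here is a minimal sketch in C. It is not the rpmalloc-benchmark code: the names bench_params, bench_worker and run_bench are hypothetical, it uses the system malloc/free through pthreads, and it leaves out the cross-thread and computation phases. It only shows worker threads allocating and freeing blocks in a loop, with the total operation count divided by the CPU time consumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical parameters; these are not rpmalloc-benchmark's options. */
struct bench_params {
    size_t iterations;  /* loops per thread */
    size_t blocks;      /* allocations per loop */
    size_t block_size;  /* bytes per allocation, must be > 0 */
};

struct bench_thread {
    struct bench_params params;
    uint64_t ops;       /* successful malloc + free calls */
    uint64_t checksum;  /* read back so the writes are not optimized out */
};

static void *bench_worker(void *arg)
{
    struct bench_thread *t = arg;
    void **ptrs = calloc(t->params.blocks, sizeof(*ptrs));
    if (!ptrs) return NULL;

    for (size_t i = 0; i < t->params.iterations; i++) {
        /* Allocation phase: grab every block and touch its memory. */
        for (size_t j = 0; j < t->params.blocks; j++) {
            ptrs[j] = malloc(t->params.block_size);
            if (!ptrs[j]) continue;
            memset(ptrs[j], (int)(i + j), t->params.block_size);
            t->ops++;
        }
        /* Free phase: release everything allocated in this iteration. */
        for (size_t j = 0; j < t->params.blocks; j++) {
            if (!ptrs[j]) continue;
            t->checksum += *(unsigned char *)ptrs[j];
            free(ptrs[j]);
            t->ops++;
        }
    }
    free(ptrs);
    return NULL;
}

/* Runs the workers and returns memory ops per CPU second, or 0.0 if the
 * run failed or was too short for clock() to measure. */
static double run_bench(unsigned nthreads, struct bench_params params,
                        uint64_t *total_ops)
{
    pthread_t *ids;
    struct bench_thread *threads;
    unsigned started = 0;
    clock_t start, end;
    double cpu_seconds;

    assert(nthreads > 0 && params.block_size > 0);
    *total_ops = 0;
    ids = calloc(nthreads, sizeof(*ids));
    threads = calloc(nthreads, sizeof(*threads));
    if (!ids || !threads) {
        free(ids);
        free(threads);
        return 0.0;
    }

    /* clock() reports process CPU time on POSIX systems, summed over threads. */
    start = clock();
    for (unsigned i = 0; i < nthreads; i++) {
        threads[i].params = params;
        if (pthread_create(&ids[i], NULL, bench_worker, &threads[i]) != 0)
            break;
        started++;
    }
    for (unsigned i = 0; i < started; i++) {
        pthread_join(ids[i], NULL);
        *total_ops += threads[i].ops;
    }
    end = clock();

    free(ids);
    free(threads);
    cpu_seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return cpu_seconds > 0.0 ? (double)*total_ops / cpu_seconds : 0.0;
}
```

Calling run_bench(2, params, &ops) with the same params against different allocators gives numbers that are comparable with each other, though not with the figures quoted in this thread, since the workload is much simpler.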
For instance, with these benchmark parameters as indicated on their sample result page[1]:
<num threads> 0 0 2 20000 50000 5000 16 1000
I have the following results with the various implementations and using two concurrent threads (the higher the number of threads, the worse it gets, especially for the default Wine heap):
- linux crt: 5675754 memory ops/CPU second, 53% overhead
- wine rpmalloc: 19700003 memory ops/CPU second, 131% overhead
- wine upstream: 248333 memory ops/CPU second, 62% overhead
- wine staging: 914004 memory ops/CPU second, 61% overhead
- wine lfh: 10651300 memory ops/CPU second, 114% overhead
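For a quick reading of the numbers above, each throughput can be divided by the upstream Wine figure. The small C sketch below does that with the values copied from the list; the result struct and the speedup helper are hypothetical names used only for illustration. Relative to upstream, staging comes out at roughly 3.7x and lfh at roughly 43x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct result {
    const char *name;
    double ops_per_cpu_sec;  /* "memory ops/CPU second" from the list above */
};

/* Figures copied from the two-thread results above. */
static const struct result results[] = {
    { "linux crt",     5675754.0 },
    { "wine rpmalloc", 19700003.0 },
    { "wine upstream", 248333.0 },
    { "wine staging",  914004.0 },
    { "wine lfh",      10651300.0 },
};

/* Throughput ratio of one implementation against a baseline. */
static double speedup(double ops, double baseline)
{
    assert(baseline > 0.0);
    return ops / baseline;
}

/* Prints every implementation's throughput relative to upstream Wine. */
static void print_speedups(void)
{
    const double upstream = results[2].ops_per_cpu_sec;

    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-14s %6.2fx\n", results[i].name,
               speedup(results[i].ops_per_cpu_sec, upstream));
}
```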
Do you have the numbers for various Windows flavours on the same hardware?
I only have Windows 10 physically installed. The results for the same set of parameters are roughly equivalent to these patches:
11977625 memory ops/CPU second, 106% overhead