On 8/6/20 18:42, Rémi Bernon wrote:
> I can understand what this is doing (extend the free ranges tracking over the whole address space, and merge all the code paths together), but it's a big change all at once.
Yes, that's the case. In a tiny bit more detail, the logic is:
1. iterate over the free areas (skipping those that are too small right away);
2. within a free area, enumerate the reserved areas and try to allocate during that enumeration, or right after it if there is space at the edges.
In the majority of cases step 2 succeeds on the first attempt. A rough sketch of this lookup order follows below.
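Roughly, in illustrative C (none of these names are the actual virtual.c internals, the lists are assumed to hold sorted, non-overlapping [base, end) ranges, and the exact probing order inside step 2 is simplified):

#include <stddef.h>
#include <stdint.h>

#define GRANULARITY 0x10000  /* the 64k view alignment discussed below */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

struct range { uintptr_t base, end; const struct range *next; };

static uintptr_t align_up( uintptr_t addr )
{
    return (addr + GRANULARITY - 1) & ~(uintptr_t)(GRANULARITY - 1);
}

/* Try to place a view of the given size inside [base, end). */
static uintptr_t place_view( uintptr_t base, uintptr_t end, size_t size )
{
    uintptr_t start = align_up( base );
    if (start < base || start > end || end - start < size) return 0;
    return start;
}

static uintptr_t find_space( const struct range *free_list,
                             const struct range *reserved, size_t size )
{
    const struct range *f, *r;
    uintptr_t addr, gap;

    for (f = free_list; f; f = f->next)
    {
        if (f->end - f->base < size) continue;  /* step 1: too small, skip */
        gap = f->base;
        for (r = reserved; r; r = r->next)      /* step 2 */
        {
            if (r->end <= f->base || r->base >= f->end) continue;
            /* inside the reserved piece overlapping this free area */
            if ((addr = place_view( MAX( r->base, f->base ),
                                    MIN( r->end, f->end ), size ))) return addr;
            /* the gap at the edge, before this reserved piece */
            if ((addr = place_view( gap, MIN( r->base, f->end ), size ))) return addr;
            gap = MIN( r->end, f->end );
        }
        /* trailing space after the last reserved piece */
        if ((addr = place_view( gap, f->end, size ))) return addr;
    }
    return 0;
}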
I thought of doing that in parts (by leaving the reserved areas as they are), but it was getting more complicated and ugly than what is in this version. We would need to maintain a separate free area list for the free areas outside of reserved areas, which was getting a bit tricky for some corner cases and looked very weird overall. If we kept a single list, we would need to prevent the free list logic from joining free areas across the boundary between reserved and "normal" space. That would just move the complications there and result in longer code overall, while probably not making the free list management any nicer, and it is not needed long term.
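To illustrate the complication with a single shared list: the coalescing would need a guard along these lines (hypothetical names, nothing like this exists in the current code):

struct free_range
{
    char *base, *end;
    int   in_reserved;  /* does the range lie inside a reserved area? */
};

/* Adjacent free ranges may only be merged when both sit on the same
 * side of a reserved area boundary. */
static int can_coalesce( const struct free_range *a, const struct free_range *b )
{
    return a->end == b->base && a->in_reserved == b->in_reserved;
}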
> The free ranges were only tracked within the reserved areas mostly because it was only useful there, but also because the mmap smaller alignment was causing a lot of fragmentation in an initial implementation. Now that we align all views and ranges to 64k I don't think it would fragment that much anymore, so it could probably be done separately. And I think it would still be interesting to gather some statistics to see how many ranges and holes we usually have, just to check that it's not going crazy.
I did gather such statistics over some games, and I think I still have a log recorded which I used as the data source for the test case I made to reproduce some real-life allocation patterns (preserving which allocations came from separate threads) when testing the performance. I will need some time to gather that once again and come up with verified figures, but from what I can tell offhand:
- The number of views varies greatly between games, from a few thousand up to hitting the default Linux mmap limit, with values roughly in the ~10000-20000 range seen often;
- The number of free ranges is not large; I doubt I ever saw more than a hundred. With the forced 64k alignment, even a lot of allocations do not produce many free ranges. To drive this number up, the application would have to use a really weird allocation pattern: doing explicit VM allocs of a small size, freeing a lot of them in between, and then allocating bigger chunks so the existing free blocks do not fit (sketched below).
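For illustration, the pattern I mean is roughly the following (a minimal Win32 sketch of my own, not taken from any particular game):

#include <windows.h>

#define COUNT 64

int main(void)
{
    void *blocks[COUNT];
    int i;

    /* many small explicit VM allocations, one 64k granule each */
    for (i = 0; i < COUNT; i++)
        blocks[i] = VirtualAlloc( NULL, 0x10000, MEM_RESERVE | MEM_COMMIT,
                                  PAGE_READWRITE );

    /* free every other one: typically leaves isolated 64k holes */
    for (i = 0; i < COUNT; i += 2)
        VirtualFree( blocks[i], 0, MEM_RELEASE );

    /* bigger allocations cannot fit into the 64k holes, so the holes
     * stay on the free list and the range count keeps growing */
    for (i = 0; i < COUNT; i += 2)
        blocks[i] = VirtualAlloc( NULL, 0x100000, MEM_RESERVE | MEM_COMMIT,
                                  PAGE_READWRITE );
    return 0;
}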
> About the rest, wouldn't it be possible to just reserve some more pages with reserve_area where we expect some free memory to be, and, if we are out of reserved space, then use the existing code to allocate the views within the newly reserved space? Of course it would possibly cause more memory to be reserved for Wine and less to be available to the system; I'm not sure if we can mitigate that somehow.
As a game-specific hack, sure, this can work, but it looks a bit problematic to me as a general solution. First of all, I am unsure how to sensibly choose this parameter in a universal way: we would need to reserve as much memory as the application is ever going to allocate. E.g., when I was testing this with 64-bit AION, it was OK with addresses within ~16GB (apparently it was never planning to use more RAM), but returning higher pointers was crashing it. I guess this applies to every affected application: it just expects memory pointers to be in a certain range. The range may differ greatly (e.g., MK11 is fine with the Windows 7 64-bit address space limit), but to guarantee that range with reserved areas we would have to always reserve as much RAM as the application is using.

Besides, there are pointers handed to the application which are obtained from native libraries, like OpenGL / Vulkan mappings, audio buffers etc. On Windows those pointers also fall into the expected ranges. If we reserve low memory for Wine allocations, native pointers will always fall outside of it. I don't know whether that breaks any existing application, but it could.
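For a made-up illustration of the kind of range assumption I mean (this is not AION's actual code, just the common pointer-packing pattern):

#include <stdint.h>

/* The application assumes allocations never go above 16GB and reuses
 * the high pointer bits for its own tags; hypothetical code. */
#define APP_ADDRESS_LIMIT (UINT64_C(1) << 34)  /* ~16GB */
#define TAG_SHIFT 34

static uint64_t pack( void *ptr, uint64_t tag )
{
    return (uint64_t)(uintptr_t)ptr | (tag << TAG_SHIFT);
}

static void *unpack( uint64_t packed )
{
    /* drops the high bits: fine while the allocator stays below
     * APP_ADDRESS_LIMIT, garbage as soon as it does not */
    return (void *)(uintptr_t)(packed & (APP_ADDRESS_LIMIT - 1));
}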
Overall, do you think that maintaining that allocation "duality", within reserved areas and without, makes anything more straightforward, given that we can have a solution which avoids it and is acceptable performance-wise? I had a preliminary solution like that before your free lists for reserved areas were upstreamed; IMO (apart from my free lists having a very skeletal implementation) it looked much more cumbersome and introduced a lot of code which should supposedly go away long term.
BTW, do we need those reserved areas at all for anything besides ensuring that the core system libraries get to the right place and that we get some low memory for things like zero_mask allocations (and for the latter case this space can be taken away)? I suspect not. Maybe once we switch to ordered allocations we can just remove the reserved areas once those DLLs are loaded, and thus simplify the allocations?