On 4/24/22 21:18, Derek Lesho wrote:
Hi All,
In the wake of the new WOW64 implementation (recent explanation [1]), there has been discussion in informal channels about how we are going to handle pointers to mapped graphics resource memory which we receive from the graphics API, as the possibility exists that they will fall outside of the 32-bit address space.
Over time, a few creative solutions have been proposed and discussed, with a common theme being that we need changes in either the kernel or the graphics drivers to do this properly. As we already know the requirements for a solution to this problem, I think it would be responsible to hash this out now and then work with the relevant project maintainers early, so as to avoid blocking work on the Wine side for too long and to possibly allow more users to test the new path earlier.
Thank you for starting this conversation! I agree with all of these points. WoW64 emulation is still a long way off, if it'll even happen by default on platforms other than Mac, but nevertheless this is something we should look into supporting sooner rather than later.
It would probably be good to start a dri-devel/mesa-dev thread to discuss this as well.
The solutions which I've seen laid out so far:
- Use the mremap(2) interface, allowing us to duplicate the mapping we
receive into the 32-bit address space. This solution would match what is already done for Crossover Mac's 32on64 support using Mac's mach_vm_remap functionality [2]. However, right now it is not possible to use the MREMAP_DONTUNMAP flag with mappings that aren't private and anonymous, which rules out its use on mapped FDs from libdrm. Due to this, a kernel change would be necessary.
Pro: A uniform solution across all APIs, which could help in the future with any unforeseen need to access host-allocated memory in 32-bit Windows code.
Cons: Requires a kernel change, which of all the options may take the longest to get up-streamed and in the hands of users.
Frankly, I think it may be worth looking into this even if we do try to implement another solution for GPU mappings specifically. As you say, it may potentially come in useful in other places.
In fact, in general I think looking into multiple solutions, and being able to fall back from one to another, is not necessarily a bad idea.
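For reference, here is a minimal sketch of the kind of duplication mremap(2) already permits today. It relies on the documented behaviour that an old_size of zero on a shareable (MAP_SHARED) mapping creates a second mapping of the same pages. The helper name alias_below_4g and the 0x20000000 hint are made up for illustration, and whether a given driver's mapping qualifies is exactly the open question, so treat this as an experiment rather than a solution.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: create a second mapping of an existing MAP_SHARED
 * region at an address below 4 GiB. Returns MAP_FAILED on error. */
static void *alias_below_4g(void *src, size_t len)
{
    /* Reserve a destination range. The hint is not binding, so the result
     * is checked afterwards. The hint address is an arbitrary choice. */
    void *hint = (void *)(uintptr_t)0x20000000;
    void *dst = mmap(hint, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (dst == MAP_FAILED)
        return MAP_FAILED;
    if ((uint64_t)(uintptr_t)dst + len > UINT64_C(0x100000000)) {
        munmap(dst, len);
        return MAP_FAILED;
    }

    /* old_size == 0 asks for a new mapping of the same pages; this is only
     * accepted for shareable mappings. MREMAP_FIXED replaces the
     * reservation at dst. */
    void *res = mremap(src, 0, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    if (res == MAP_FAILED)
        munmap(dst, len);
    return res;
}
```

Writes through either pointer are visible through the other, since both refer to the same pages. The source mapping stays in place.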
Also: it may be worth looking into kernel extensions other than mremap(2). We already have to deal with the problem of reserving the low 2 GB for Win32 memory, and our current solutions to that can cause problems (I was recently bitten by this, in bug 52840 [1]).
A personality switch or pair of switches like "map everything under 2/4 GB" and "prefer mapping above 2/4 GB" would be helpful, so that we can force mapping under 2 GB in NtAllocateVirtualMemory() and GPU mappings and above 2 GB otherwise. Unlike extending mremap(2), these would be useful for normal allocations as well, i.e. they'd allow us to do a better job of placing system libraries where we want them.
See also below s.v. ADDR_LIMIT_32BIT.
[1] https://bugs.winehq.org/show_bug.cgi?id=52840
- Work with Khronos to introduce extensions into the relevant APIs
enabling us to tell drivers where in the address space we want resources mapped.
Pro: Wouldn't require going behind the driver's back, resulting in a more hardened solution. (Out there, but what if a creative driver returns a mapping without read or write permission and handles accesses through a page fault handler?)
Cons: The extension would have to be implemented by each individual vendor for every relevant API. This would implicitly drop support for those with cards whose graphics drivers are no longer being updated.
- Hook the driver's mmap call when we invoke memory mapping functions,
overriding the address to something in the 32-bit address space.
Pro: Unlike the other solutions, this wouldn't require any changes to other projects, and shares the advantage of the first solution.
Cons: Susceptible to breakage if the driver uses its own mapping mechanism separate from mmap (a custom ioctl, or a CPU driver returning something from the heap).
Here are a few other ideas / considerations I think are worth mentioning:
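To make the hooking option concrete, here is a rough sketch of what the replacement call might do once the driver's mmap has been intercepted. The interception itself (symbol interposition or otherwise) is not shown. The name mmap_below_4g, the starting address, and the 16 MiB stride are assumptions for illustration, and MAP_FIXED_NOREPLACE needs Linux 4.17 or later to behave as intended.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000  /* value on x86 and most architectures */
#endif

/* Hypothetical stand-in for the driver's mmap(): ignore the requested
 * address and look for a free range below 4 GiB instead. */
static void *mmap_below_4g(size_t len, int prot, int flags, int fd, off_t off)
{
    const uint64_t limit = UINT64_C(0x100000000);
    const uint64_t step = UINT64_C(0x01000000);  /* 16 MiB stride, arbitrary */

    flags &= ~MAP_FIXED;  /* never clobber an existing mapping */
    for (uint64_t addr = UINT64_C(0x10000000); addr + len <= limit; addr += step) {
        void *p = mmap((void *)(uintptr_t)addr, len, prot,
                       flags | MAP_FIXED_NOREPLACE, fd, off);
        if (p == MAP_FAILED) {
            if (errno == EEXIST)
                continue;       /* range already in use, try the next one */
            return MAP_FAILED;  /* some other error; retrying will not help */
        }
        if ((uint64_t)(uintptr_t)p == addr)
            return p;
        /* Kernels before 4.17 treat the address as a hint only. */
        munmap(p, len);
    }
    errno = ENOMEM;
    return MAP_FAILED;
}
```

A linear probe like this is crude. A real implementation would presumably ask Wine's own virtual memory bookkeeping for a free range rather than guessing.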
- Reserve the entire address space above 2G (or 3G with the appropriate image flags). This is essentially what we already do for 32-bit programs. I'm not sure if reserving 2**48 bytes of memory will run into problems, though? Has this been tried?
- Linux has a personality(2) switch ADDR_LIMIT_32BIT. The documentation is terse, so I'm not fully sure what this does, but it might be sufficient to ensure that new mappings are placed under 2 GB, while not breaking old mappings? And presumably it's also toggleable. It's not exactly ideal (we'd like to be able to set a 3 GB or 4 GB limit instead if the binary allows), but it's potentially already usable.
- We can emulate mappings for everything except coherent memory by manually implementing mapping functions with a separate sysmem location. We can implement persistent mappings this way, too, by copying on a flush, but unfortunately we can't expose GL_ARB_buffer_storage without coherent mappings.
[Fortunately d3d doesn't require coherent memory or ARB_buffer_storage, and the Vulkan backend doesn't require coherent memory for map acceleration. The GL backend currently does, but could be made not to. We'd have to add a private extension to use ARB_buffer_storage while not actually marking any maps as coherent. Of course, d3d isn't the only user of GL or Vulkan, and unfortunately ARB_buffer_storage is core in GL 4.4, so I'm sure there are GL applications out there that rely on it...]
I think we can actually emulate coherent memory as well, by tracking resource bindings and manually flushing on draws. That's a little painful, though.
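As a rough illustration of the non-coherent case, the emulation boils down to handing the client a shadow copy and copying ranges back on flush and unmap. Everything named here (struct shadow_map and the shadow_map_* functions) is hypothetical. The sketch uses malloc for the shadow, whereas a real version would have to allocate it below 4 GiB, and it always copies in on map and out on unmap, which a real version would skip for write-only or read-only maps.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping for one emulated mapping. */
struct shadow_map {
    unsigned char *host;    /* pointer from the real map call, possibly above 4 GiB */
    unsigned char *shadow;  /* copy that the 32-bit client can address */
    size_t size;
};

/* Begin an emulated mapping: allocate the shadow and fill it from the host. */
static void *shadow_map_begin(struct shadow_map *m, void *host, size_t size)
{
    m->host = host;
    m->size = size;
    m->shadow = malloc(size);  /* assumption: stands in for a low-address allocator */
    if (m->shadow)
        memcpy(m->shadow, host, size);
    return m->shadow;
}

/* Copy a client-written range back to the host mapping. Returns -1 if the
 * range lies outside the mapping. */
static int shadow_map_flush(struct shadow_map *m, size_t offset, size_t length)
{
    if (offset > m->size || length > m->size - offset)
        return -1;
    memcpy(m->host + offset, m->shadow + offset, length);
    return 0;
}

/* End the mapping: write everything back and release the shadow. */
static void shadow_map_end(struct shadow_map *m)
{
    shadow_map_flush(m, 0, m->size);
    free(m->shadow);
    m->shadow = NULL;
}
```

Emulating coherent memory would mean calling something like shadow_map_flush on every draw that references the resource, which is where the resource-binding tracking comes in.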
- Crazy idea: On Linux, parse /proc/self/maps to allow remapping non-anonymous pages. Combined with mremap(2) or manual emulation, this allows mapping everything except for shared anonymous pages [and I can't imagine that a GPU driver would use those, especially given that the only way to make use of the SHARED flag is fork(2)].
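A minimal sketch of the lookup step, assuming the usual "start-end perms offset dev inode path" line format of /proc/self/maps. The struct map_info and find_mapping names are made up, and lines longer than the buffer (very long paths) are not handled. Whether re-opening the reported path and mapping the same offset actually yields the same memory for a device node is an assumption that would need checking per driver.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of /proc/self/maps. */
struct map_info {
    uint64_t start, end, offset;
    char perms[5];   /* e.g. "rw-s"; 4th character is 's' (shared) or 'p' (private) */
    char path[256];  /* empty for anonymous mappings */
};

/* Find the mapping containing addr. Returns 0 on success, -1 if not found. */
static int find_mapping(const void *addr, struct map_info *out)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[1024];
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long long start, end, offset;
        char perms[5];
        int pos = 0;

        /* dev and inode are skipped; %n records where the path begins. */
        if (sscanf(line, "%llx-%llx %4s %llx %*s %*s %n",
                   &start, &end, perms, &offset, &pos) < 4)
            continue;
        if ((uintptr_t)addr < start || (uintptr_t)addr >= end)
            continue;

        out->start = start;
        out->end = end;
        out->offset = offset;
        memcpy(out->perms, perms, sizeof out->perms);
        snprintf(out->path, sizeof out->path, "%s", pos ? line + pos : "");
        out->path[strcspn(out->path, "\n")] = '\0';
        ret = 0;
        break;
    }
    fclose(f);
    return ret;
}
```

A caller would check perms[3] to tell shared from private mappings, and use path and offset to attempt a second mmap at a low address.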
ἔρρωσθε, Zeb