Hi everyone,
I'm creating this thread to start a discussion about how we want to complete GL support on Wow64, in the hopes that, if we want a Mesa extension for this, we can get it to them sooner rather than later.
The core of the problem is the same one we faced for Vulkan: namely that host-visible mappings from graphics APIs may be located above 32-bit memory, in which case we can't pass them to the application as-is. In the current implementation that Rémi wrote a few years ago, we use a temporary buffer for glMapBuffer, and have to manually copy the data from it back to the real mapping on glUnmapBuffer.
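As a loose sketch of that bounce-buffer scheme (hypothetical names, not the actual opengl32/Wow64 code):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the Wow64 bounce-buffer path described above.
 * The real driver mapping may live above 4GB, so the 32-bit app gets a
 * low temporary copy, and its writes are copied back on unmap. */
struct wow64_buffer
{
    void *host_ptr;   /* real driver mapping, possibly above 4GB */
    void *bounce;     /* temporary 32-bit-reachable copy */
    size_t size;
};

static void *wow64_map_buffer(struct wow64_buffer *buf)
{
    buf->bounce = malloc(buf->size);        /* assumed to land below 4GB */
    memcpy(buf->bounce, buf->host_ptr, buf->size);
    return buf->bounce;
}

static void wow64_unmap_buffer(struct wow64_buffer *buf)
{
    memcpy(buf->host_ptr, buf->bounce, buf->size); /* write-back on unmap */
    free(buf->bounce);
    buf->bounce = NULL;
}
```

The write-back on unmap is exactly what makes GL_MAP_PERSISTENT_BIT impossible here: the application never unmaps, so there is no point at which the copy could happen.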
This solution works for most applications, but is slower than direct writes, and doesn't allow use of the GL_MAP_PERSISTENT_BIT mapping flag, with which the application doesn't have to unmap the memory before the API uses it. Because of this problem, we can't support GL versions >= 4.4 in this path, which admittedly isn't a huge problem considering wined3d now supports Vulkan, but it does rule out some native GL apps/games from running on the new path. I'll go over a few potential solutions:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
- GL extension with a placed memory allocation callback: In this case, Wine provides map and unmap callbacks for the GL implementation to use when creating the pages it needs for GPU mappings. Compared to the PE Zink solution, we can continue to use system libraries and maintain fast buffer I/O, as long as the glMapBuffer implementation returns the mapped pointer directly. The main downside of this solution is of course the introduction of a new extension to the mostly dormant GL API, but here it would be possible to just use Zink leveraging VK_EXT_map_memory_placed like in the first solution, only this time on the Unix side.
For this solution I've created drafts: a Wine MR [2] and a Mesa branch with a Zink implementation [3].
- glMapBuffer extension to which we send our placed allocation: The idea here is that a slightly extended glMapBuffer could be sent a flag telling it to use a Wine-provided mapping, avoiding callbacks and targeting the problem exactly where it manifests for Wine (when we get an address outside 32-bit space). This was briefly discussed on the LGD Discord server, but it won't work well, because a GL buffer is usually only a suballocation of a memory mapping, and has often already been assigned to another pool of memory by the time glMapBuffer is called. To implement this, Mesa would have to add a considerably sized implementation setting up custom-allocator pools, and be able to move buffers between them.
- glBufferStorage extension to ease the Mesa implementation: The next logical conclusion, based on the problems of the last solution, is that the Mesa implementation should get the information about any custom allocation at buffer creation time. The closest equivalent to this is the glBufferStorage entry point, where we could create a new type of memory that we would ask the implementation to put the buffer in. For this solution we'd either have to implement a custom memory allocator or introduce more API entries to allow the driver to convey to the application what size mappings it prefers. This would be very unwieldy, and wouldn't speed up the slow buffer copies for legacy buffers which don't use ARB_buffer_storage.
- Finally, going off the last possibility: if we're willing to create our own buffer allocator (we probably shouldn't, as the driver probably knows best), we could use the already existing EXT_external_objects extensions to import Vulkan memory objects, from which our allocator would allocate slices for the buffers. In glMapBuffer we would then return the Vulkan placed memory mapping.
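For a sense of what the allocator in this last option would involve, here is a toy bump suballocator slicing buffers out of one big mapped block, as one might over an imported Vulkan memory object (all names hypothetical; a real allocator would also need freeing, recycling, and the driver's alignment rules):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy bump suballocator over one big client-owned memory block, as the
 * EXT_external_objects option would need. Purely illustrative. */
struct suballocator
{
    uint8_t *base;    /* mapped base of the imported memory */
    size_t size;      /* total size of the block */
    size_t offset;    /* next free offset */
};

/* Hand out an aligned slice, or NULL if the block is exhausted. */
static void *suballoc(struct suballocator *a, size_t size, size_t align)
{
    size_t off = (a->offset + align - 1) & ~(align - 1);
    if (off + size > a->size) return NULL;
    a->offset = off + size;
    return a->base + off;
}
```

As noted in the thread, this is the option where the driver's own knowledge of memory types and alignment is lost, which is a large part of why it is probably not the right approach.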
If I got anything wrong or overlooked anything, please let me know. If we decide on a solution that involves mesa, I would gladly then include the mesa-dev list to see if they have opinions for/against any of the proposals.
Thanks, Derek
1: https://gitlab.winehq.org/rbernon/wine/-/tree/wip/just-use-zink
2: https://gitlab.winehq.org/wine/wine/-/merge_requests/6663
3: https://gitlab.freedesktop.org/Guy1524/mesa/-/commits/placed_allocation
On Sun, 2024-10-13 at 17:16 +0200, Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
Do we have to include Mesa in Wine? If dropping in a PE build of Zink works then we could also consider treating it as a Wine add-on.
Am 14.10.2024 um 13:32 schrieb Hans Leidekker hans@codeweavers.com:
On Sun, 2024-10-13 at 17:16 +0200, Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
Do we have to include Mesa in Wine? If dropping in a PE build of Zink works then we could also consider treating it as a Wine add-on.
I looked into Zink for Mac OS use in the past and I am not a big fan of it. It didn’t work well (even on Linux) and when it worked it was slow. We shouldn’t go down this path, comfortable as it may be. The host GL knows the hardware better, can do things like thunk out of an emulator if need be and will work on systems where Vulkan is not available.
Fwiw as far as wined3d-gl is concerned, it can play nice with slow bounce buffers too. It should do the right thing if GL_ARB_buffer_storage is not available. d3d isn't as badly affected by the performance penalty, although there are games that profit from persistent maps.
The dosemu2 dev pointed out a way to achieve something similar to macos' mach_vm_remap on Linux. I have to find my email in the archive and will forward his suggestion. It does sound somewhat hacky to me, I am not sure if we want to use it.
On Sunday, 13 October 2024 10:16:47 CDT Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
I don't think Zink is a viable option.
First and foremost, the range of GPU hardware out there that should be reasonably supported is not all Vulkan-capable at this point. (Even Mesa still supports GPU hardware well below the Vulkan feature requirements).
Also, as Stefan mentioned, its stability and performance are well below what they should be in order to avoid functional regressions. While these may be solvable in the long term (although I'm a bit concerned about stability), I do think it means we can't rely on it. Distributions and corporate consumers alike are chomping at the bit to delete 32-bit support, and that means that we need to provide a smooth transition without any regressions.
- GL extension with a placed memory allocation callback: In this case, Wine provides map and unmap callbacks for the GL implementation to use when creating the pages it needs for GPU mappings. Compared to the PE Zink solution, we can continue to use system libraries and maintain fast buffer I/O, as long as the glMapBuffer implementation returns the mapped pointer directly. The main downside of this solution is of course the introduction of a new extension to the mostly dormant GL API, but here it would be possible to just use Zink leveraging VK_EXT_map_memory_placed like in the first solution, only this time on the Unix side.
For this solution I've created drafts: a Wine MR [2] and a Mesa branch with a Zink implementation [3].
If this is going to require explicit use of Zink on the Unix side, I don't think it's feasible either, unfortunately, for the same reasons.
- glMapBuffer extension to which we send our placed allocation: The idea here is that a slightly extended glMapBuffer could be sent a flag telling it to use a Wine-provided mapping, avoiding callbacks and targeting the problem exactly where it manifests for Wine (when we get an address outside 32-bit space). This was briefly discussed on the LGD Discord server, but it won't work well, because a GL buffer is usually only a suballocation of a memory mapping, and has often already been assigned to another pool of memory by the time glMapBuffer is called. To implement this, Mesa would have to add a considerably sized implementation setting up custom-allocator pools, and be able to move buffers between them.
- glBufferStorage extension to ease the Mesa implementation: The next logical conclusion, based on the problems of the last solution, is that the Mesa implementation should get the information about any custom allocation at buffer creation time. The closest equivalent to this is the glBufferStorage entry point, where we could create a new type of memory that we would ask the implementation to put the buffer in. For this solution we'd either have to implement a custom memory allocator or introduce more API entries to allow the driver to convey to the application what size mappings it prefers. This would be very unwieldy, and wouldn't speed up the slow buffer copies for legacy buffers which don't use ARB_buffer_storage.
I don't think the "legacy buffers" part is a problem. We can explicitly call glBufferStorage() if the application doesn't. (IIRC it's legal to call it multiple times, so we can call it once on creation with the relevant flag, and then append the flag if it's called again later.)
As for suballocation, my best proposal is that instead of specifying an exact address, Wine would simply specify that the address needs to be below the 4GB boundary. I don't remember if there was a reason to allow a specific address on the Vulkan side, but we don't need it in Wine, and I don't know that a new GL extension needs to be quite as forward-looking at this point.
- finally, going off the last possibility, if we're willing to create
our own buffer allocator (we probably shouldn't, as the driver probably knows best), we could use the already existing EXT_external_objects extensions to import Vulkan memory objects, from which our allocator would allocate slices for the buffers. In glMapBuffer we would then return the Vulkan placed memory mapping.
This is infeasible due to the Vulkan requirement, and would be quite a lot of work. Moreover, though, there is at least one important case where suballocated buffers are not as versatile as normal ones: you cannot use an index buffer with an offset in an indirect draw. [Incidentally, this is probably the biggest thing hampering D3D10+ performance in the GL renderer.]
Am 14.10.24 um 12:32 schrieb Hans Leidekker:
On Sun, 2024-10-13 at 17:16 +0200, Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
Do we have to include Mesa in Wine? If dropping in a PE build of Zink works then we could also consider treating it as a Wine add-on.
Yeah, I don't see any reason this wouldn't work, and from everything I know it would probably be the better solution if we go the Zink route. It just leaves open the question of how we'd distribute it; maybe something like wine-mono? Also, maybe Rémi knows something I don't that caused him to write the integrated version.
Am 14.10.24 um 13:21 schrieb Stefan Dösinger:
I looked into Zink for Mac OS use in the past and I am not a big fan of it. It didn’t work well (even on Linux) and when it worked it was slow. We shouldn’t go down this path, comfortable as it may be. The host GL knows the hardware better, can do things like thunk out of an emulator if need be and will work on systems where Vulkan is not available.
Fwiw as far as wined3d-gl is concerned, it can play nice with slow bounce buffers too. It should do the right thing if GL_ARB_buffer_storage is not available. d3d isn't as badly affected by the performance penalty, although there are games that profit from persistent maps.
I think Aida mentioned this path was prohibitively slow for WineD3D, not sure which games they were referring to.
The dosemu2 dev pointed out a way to achieve something similar to macos' mach_vm_remap on Linux. I have to find my email in the archive and will forward his suggestion. It does sound somewhat hacky to me, I am not sure if we want to use it.
Oh interesting, yeah, that'd be cool to see. Regardless, even if we have that path, we would probably want to also have a proper long-term solution, like how we have the host external memory backup hack in winevulkan.
Am 14.10.24 um 20:46 schrieb Elizabeth Figura:
On Sunday, 13 October 2024 10:16:47 CDT Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
I don't think Zink is a viable option.
First and foremost, the range of GPU hardware out there that should be reasonably supported is not all Vulkan-capable at this point. (Even Mesa still supports GPU hardware well below the Vulkan feature requirements).
Right, although if the alternative is a new GL extension, I do wonder how long it would take Mesa to implement it for the drivers of said old hardware.
Also, as Stefan mentioned, its stability and performance are well below what they should be in order to avoid functional regressions. While these may be solvable in the long term (although I'm a bit concerned about stability), I do think it means we can't rely on it. Distributions and corporate consumers alike are chomping at the bit to delete 32-bit support, and that means that we need to provide a smooth transition without any regressions.
- GL extension with a placed memory allocation callback: In this case, Wine provides map and unmap callbacks for the GL implementation to use when creating the pages it needs for GPU mappings. Compared to the PE Zink solution, we can continue to use system libraries and maintain fast buffer I/O, as long as the glMapBuffer implementation returns the mapped pointer directly. The main downside of this solution is of course the introduction of a new extension to the mostly dormant GL API, but here it would be possible to just use Zink leveraging VK_EXT_map_memory_placed like in the first solution, only this time on the Unix side.
For this solution I've created drafts: a Wine MR [2] and a Mesa branch with a Zink implementation [3].
If this is going to require explicit use of Zink on the Unix side, I don't think it's feasible either, unfortunately, for the same reasons.
FWIW, there's nothing about my draft that inherently restricts it to Zink; it's just one entry point that allows Wine to allocate pages for allocations. I just implemented it in Zink first as a proof of concept.
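For illustration, the general shape of such a single allocation entry point might look like this toy sketch (names entirely hypothetical; see the actual draft in [2]/[3]):

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy illustration of a placed-allocation callback entry point; all
 * names here are hypothetical, not the actual draft interface. */
typedef void *(*placed_alloc_cb)(size_t size, void *user_data);

struct toy_driver
{
    placed_alloc_cb alloc_pages;  /* set by the client via the extension */
    void *user_data;
};

/* Instead of mmap()ing GPU-visible pages itself, the driver asks the
 * client for them, so Wine can satisfy the request from 32-bit space. */
static void *toy_driver_map(struct toy_driver *drv, size_t size)
{
    if (drv->alloc_pages)
        return drv->alloc_pages(size, drv->user_data);
    return NULL;
}

/* Stand-in for Wine's side; a real implementation would allocate below
 * 4GB, e.g. via NtAllocateVirtualMemory with an address constraint. */
static void *wine_alloc_pages(size_t size, void *user_data)
{
    (void)user_data;
    return malloc(size);
}
```

The point of the shape is that nothing in it is Zink-specific: any driver that funnels its GPU-visible CPU mappings through one allocation path could call out to the client there.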
- glMapBuffer extension to which we send our placed allocation: The idea here is that a slightly extended glMapBuffer could be sent a flag telling it to use a Wine-provided mapping, avoiding callbacks and targeting the problem exactly where it manifests for Wine (when we get an address outside 32-bit space). This was briefly discussed on the LGD Discord server, but it won't work well, because a GL buffer is usually only a suballocation of a memory mapping, and has often already been assigned to another pool of memory by the time glMapBuffer is called. To implement this, Mesa would have to add a considerably sized implementation setting up custom-allocator pools, and be able to move buffers between them.
- glBufferStorage extension to ease the Mesa implementation: The next logical conclusion, based on the problems of the last solution, is that the Mesa implementation should get the information about any custom allocation at buffer creation time. The closest equivalent to this is the glBufferStorage entry point, where we could create a new type of memory that we would ask the implementation to put the buffer in. For this solution we'd either have to implement a custom memory allocator or introduce more API entries to allow the driver to convey to the application what size mappings it prefers. This would be very unwieldy, and wouldn't speed up the slow buffer copies for legacy buffers which don't use ARB_buffer_storage.
I don't think the "legacy buffers" part is a problem. We can explicitly call glBufferStorage() if the application doesn't. (IIRC it's legal to call it multiple times, so we can call it once on creation with the relevant flag, and then append the flag if it's called again later.)
As far as I understand it, it's not legal to call glBufferData after glBufferStorage, but yeah, we could maybe make an exception in the extension. FWIW I actually have an incomplete test branch implementing a path like this (just without the legacy buffer part), and my impression was that it's definitely feasible, but a lot more driver code than a basic "driver, please use my allocator" solution.
As for suballocation, my best proposal is that instead of specifying an exact address, Wine would simply specify that the address needs to be below the 4GB boundary. I don't remember if there was a reason to allow a specific address on the Vulkan side, but we don't need it in Wine, and I don't know that a new GL extension needs to be quite as forward-looking at this point.
The problem is that fulfilling this requirement on Linux is a bit awkward: MAP_32BIT will only ever allocate in the first 2GB of address space, and there's no way to change this for the LAA (large-address-aware) case.
- finally, going off the last possibility, if we're willing to create
our own buffer allocator (we probably shouldn't, as the driver probably knows best), we could use the already existing EXT_external_objects extensions to import Vulkan memory objects, from which our allocator would allocate slices for the buffers. In glMapBuffer we would then return the Vulkan placed memory mapping.
This is infeasible due to the Vulkan requirement, and would be quite a lot of work. Moreover, though, there is at least one important case where suballocated buffers are not as versatile as normal ones: you cannot use an index buffer with an offset in an indirect draw. [Incidentally, this is probably the biggest thing hampering D3D10+ performance in the GL renderer.]
Ah, good to know, and yeah I agree this is probably not practical.
On 10/14/24 21:38, Derek Lesho wrote:
Am 14.10.24 um 12:32 schrieb Hans Leidekker:
On Sun, 2024-10-13 at 17:16 +0200, Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
Do we have to include Mesa in Wine? If dropping in a PE build of Zink works then we could also consider treating it as a Wine add-on.
Yeah, I don't see any reason this wouldn't work, and from everything I know it would probably be the better solution if we go the Zink route. It just leaves open the question of how we'd distribute it; maybe something like wine-mono? Also, maybe Rémi knows something I don't that caused him to write the integrated version.
Mesa's WGL implementation is far from complete, and it lacks various things like the wgl*Font functions that Wine has already implemented.
Then it's not clear whether it's possible to implement the missing bits with only the public gdi32/user32 exports, whereas embedding it into Wine would give us much more control over the WSI implementation.
Some other WGL details are also simply not possible to implement correctly on top of the host GL implementation [1] and would require access to the GL driver internals. Embedding would also solve this, as we could then plug into or tweak its WSI code to fit our needs.
[1] e.g. wglShareLists, for which we have various heuristics and workarounds to handle most but not all cases, and wglCopyContext, which has a GLX equivalent but cannot be implemented with EGL for Wayland; there are probably others I'm missing.
On Monday, 14 October 2024 14:38:51 CDT Derek Lesho wrote:
Am 14.10.24 um 13:21 schrieb Stefan Dösinger:
I looked into Zink for Mac OS use in the past and I am not a big fan of it. It didn’t work well (even on Linux) and when it worked it was slow. We shouldn’t go down this path, comfortable as it may be. The host GL knows the hardware better, can do things like thunk out of an emulator if need be and will work on systems where Vulkan is not available.
Fwiw as far as wined3d-gl is concerned, it can play nice with slow bounce buffers too. It should do the right thing if GL_ARB_buffer_storage is not available. d3d isn't as badly affected by the performance penalty, although there are games that profit from persistent maps.
I think Aida mentioned this path was prohibitively slow for WineD3D, not sure which games they were referring to.
We can function without ARB_buffer_storage, and we won't use glMapBuffer() for uploads in that case, but we still use it for maps, which is often important.
Am 14.10.24 um 20:46 schrieb Elizabeth Figura:
On Sunday, 13 October 2024 10:16:47 CDT Derek Lesho wrote:
- "Just Use Zink": this idea has been floated for a while, and would be to use a PE build of Zink, the Mesa Gallium driver on top of Vulkan, which would then automatically make use of our VK_EXT_map_memory_placed integration in winevulkan and bypass the problem. Rémi has a branch with a draft solution for this [1]. The advantage of this approach is that it doesn't require any new extensions to any APIs, but the disadvantage is that Wine would have to worry about keeping a separate copy of Mesa up to date and add support for building the required C++ components of Mesa to its build system, as can be seen in the commits.
I don't think Zink is a viable option.
First and foremost, the range of GPU hardware out there that should be reasonably supported is not all Vulkan-capable at this point. (Even Mesa still supports GPU hardware well below the Vulkan feature requirements).
Right, although if the alternative is a new GL extension, I do wonder how long it would take Mesa to implement it for the drivers of said old hardware.
I think we would need to take the initiative on that one, at least in the drivers where we have the ability.
Also, as Stefan mentioned, its stability and performance are well below what they should be in order to avoid functional regressions. While these may be solvable in the long term (although I'm a bit concerned about stability), I do think it means we can't rely on it. Distributions and corporate consumers alike are chomping at the bit to delete 32-bit support, and that means that we need to provide a smooth transition without any regressions.
- GL extension with a placed memory allocation callback: In this case, Wine provides map and unmap callbacks for the GL implementation to use when creating the pages it needs for GPU mappings. Compared to the PE Zink solution, we can continue to use system libraries and maintain fast buffer I/O, as long as the glMapBuffer implementation returns the mapped pointer directly. The main downside of this solution is of course the introduction of a new extension to the mostly dormant GL API, but here it would be possible to just use Zink leveraging VK_EXT_map_memory_placed like in the first solution, only this time on the Unix side.
For this solution I've created drafts: a Wine MR [2] and a Mesa branch with a Zink implementation [3].
If this is going to require explicit use of Zink on the Unix side, I don't think it's feasible either, unfortunately, for the same reasons.
FWIW, there's nothing about my draft that inherently restricts it to Zink; it's just one entry point that allows Wine to allocate pages for allocations. I just implemented it in Zink first as a proof of concept.
Sorry about that, I misunderstood what you said—I thought you meant that we would be effectively writing a Wine-internal extension and then using Zink on the Unix side to implement it (which, granted, would be probably a better option than PE Zink, all else aside). I see you rather meant that Zink would be using VK_EXT_map_memory_placed to implement the GL extension.
Actually looking more closely at your implementation, it seems like a reasonable proposal, I don't think I foresee any fundamental problems with it.
[It would be especially nice if we can avoid the marshalling. If I'm not mistaken, we're not actually doing anything in the relevant parts of NtAllocateVirtualMemory() that requires the TEB, and, as controversial as it may be, I'd propose that we could simply stay on the thread we're called on. Of course, this is a bit of premature optimization...]
- glMapBuffer extension to which we send our placed allocation: The idea here is that a slightly extended glMapBuffer could be sent a flag telling it to use a Wine-provided mapping, avoiding callbacks and targeting the problem exactly where it manifests for Wine (when we get an address outside 32-bit space). This was briefly discussed on the LGD Discord server, but it won't work well, because a GL buffer is usually only a suballocation of a memory mapping, and has often already been assigned to another pool of memory by the time glMapBuffer is called. To implement this, Mesa would have to add a considerably sized implementation setting up custom-allocator pools, and be able to move buffers between them.
- glBufferStorage extension to ease the Mesa implementation: The next logical conclusion, based on the problems of the last solution, is that the Mesa implementation should get the information about any custom allocation at buffer creation time. The closest equivalent to this is the glBufferStorage entry point, where we could create a new type of memory that we would ask the implementation to put the buffer in. For this solution we'd either have to implement a custom memory allocator or introduce more API entries to allow the driver to convey to the application what size mappings it prefers. This would be very unwieldy, and wouldn't speed up the slow buffer copies for legacy buffers which don't use ARB_buffer_storage.
I don't think the "legacy buffers" part is a problem. We can explicitly call glBufferStorage() if the application doesn't. (IIRC it's legal to call it multiple times, so we can call it once on creation with the relevant flag, and then append the flag if it's called again later.)
As far as I understand it, it's not legal to call glBufferData after glBufferStorage, but yeah, we could maybe make an exception in the extension. FWIW I actually have an incomplete test branch implementing a path like this (just without the legacy buffer part), and my impression was that it's definitely feasible, but a lot more driver code than a basic "driver, please use my allocator" solution.
Ah right, glBufferStorage() makes the buffer immutable...
Of course, we could simply use a separate vector for this. But as you mention this approach has other problems.
As for suballocation, my best proposal is that instead of specifying an exact address, Wine would simply specify that the address needs to be below the 4GB boundary. I don't remember if there was a reason to allow a specific address on the Vulkan side, but we don't need it in Wine, and I don't know that a new GL extension needs to be quite as forward-looking at this point.
The problem is that fulfilling this requirement on Linux is a bit awkward: MAP_32BIT will only ever allocate in the first 2GB of address space, and there's no way to change this for the LAA (large-address-aware) case.
It could be solved with more plumbing, but yeah, no reason to go down this route if we have a simpler option available.
Am 14.10.2024 um 22:38 schrieb Derek Lesho dlesho@codeweavers.com:
The dosemu2 dev pointed out a way to achieve something similar to macos' mach_vm_remap on Linux. I have to find my email in the archive and will forward his suggestion. It does sound somewhat hacky to me, I am not sure if we want to use it.
Oh interesting, yeah, that'd be cool to see. Regardless, even if we have that path, we would probably want to also have a proper long-term solution, like how we have the host external memory backup hack in winevulkan.
I found the old emails (or part of them). I doubt the idea is workable, but here we go:
The trick is essentially to replace the high address mapping behind the external library's back. In pseudo code
void *wine_glMapBuffer(int bo)
{
    struct buffer *buf = some_lookup(bo);
    int fd = shmmem_alloc(buf->size);

    void *high_addr = host_glMapBuffer(buf->host_bo);
    void *low_addr = wine_find_some_4g_space(buf->size);

    /* alias both views onto the same shared memory */
    mmap(high_addr, buf->size, PROT_RW, MAP_FIXED, fd, 0);
    mmap(low_addr, buf->size, PROT_RW, MAP_FIXED, fd, 0);

    return low_addr;
}
This has a non-zero chance of working if the GL library just reads/writes its own allocation via the CPU. Replacing the mapping will probably destroy any magic properties (like memory-mapped video memory) the driver put in place. And it will break if the returned pointer isn't page-aligned, or comes from a larger allocation, and so on. So I doubt it is going to fly.
In the dosemu2 code there's some sweet looking code here: https://github.com/dosemu2/dosemu2/blob/92705ab35e136bfef2e0206af7c05518ae27... . It looks like it maps void *source to void *target. I tried to follow it through the rest of the code a bit to see if the callers expect *source to remain valid, but couldn't easily figure out how this is all used.
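The aliasing mechanism itself is easy to demonstrate in isolation with a memfd (Linux-specific; this only shows two views of the same pages, not the MAP_FIXED replacement of a live driver mapping that the trick above would require):

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the same memfd pages at two addresses and check that a write
 * through one view is visible through the other. Returns 1 on success. */
static int alias_demo(void)
{
    size_t size = 4096;
    int fd = memfd_create("alias-demo", 0);
    if (fd < 0) return 0;
    if (ftruncate(fd, size) < 0) { close(fd); return 0; }

    /* two independent views of the same physical pages */
    char *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED) { close(fd); return 0; }

    strcpy(a, "hello");
    int ok = strcmp(b, "hello") == 0;  /* write via a, read via b */

    munmap(a, size);
    munmap(b, size);
    close(fd);
    return ok;
}
```

The hard part, as noted above, is not the aliasing but safely substituting such shared pages for an allocation the driver already owns.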
---
Fwiw as far as wined3d-gl is concerned, it can play nice with slow bounce buffers too. It should do the right thing if GL_ARB_buffer_storage is not available. d3d isn't as badly affected by the performance penalty, although there are games that profit from persistent maps.
I think Aida mentioned this path was prohibitively slow for WineD3D, not sure which games they were referring to.
There are certainly games that are either slow (d3d10/11) or outright broken (d3d8/9) without coherent buffers. d3d8/9 games can break without coherent maps if they pass incorrect map lengths. Coherent maps allow us to ignore the length and let the driver figure out what was actually written.
I did come across the reverse side of this though [0]: a game that experienced a big regression on ARM hardware *with* coherent buffers. The background was that x86 and that particular ARM chip had entirely different ideas about how slow reading from write-combined memory is.