On 10/19/24 00:47, Faith Ekstrand wrote:
The timing here isn't great, unfortunately. I'd love to contribute more to the discussion but I'm going on leave starting next week until mid-February, so I won't be able to participate much until then. I'll try to leave a few thoughts, though.
Thanks for the comments! I'll actually also be quite busy until then (Uni), but maybe by February there'll be a consensus on the path forward 😀.
On Fri, Oct 18, 2024 at 5:10 PM Derek Lesho dlesho@codeweavers.com wrote:
Hey everyone 👋, I'm Derek from the Wine project, and I wanted to start a discussion with y'all about potentially extending the Mesa OpenGL drivers to help us with a functionality gap we're facing.

Problem space: In the last few years Wine's support for running 32-bit Windows apps in a 64-bit host environment (wow64) has almost reached feature completion, but there remains a pain point with OpenGL applications: namely, Wine can't return a 64-bit GL implementation's buffer mappings to a 32-bit application when the address is outside of the 32-bit range. Currently, we have a workaround that copies any changes to the mapping back to the host upon glUnmapBuffer, but this is of course slow when the implementation returns directly mapped memory, and it doesn't work at all for GL_MAP_PERSISTENT_BIT, where directly mapped memory is required. A few years ago we also faced this problem with Vulkan, which was solved through the VK_EXT_map_memory_placed extension Faith drafted, allowing us to use our Wine-internal allocator to provide the pages the driver maps to. I'm now wondering if a GL equivalent would also be seen as feasible amongst the devs here.

Proposed solution: As the GL backend handles host mapping in its own code, only giving suballocations from its mappings back to the app, the problem is a little less straightforward than with our Vulkan solution: if we just allowed the application to set its own placed mapping when calling glMapBuffer, the driver might then have to handle moving buffers out of already mapped ranges, and it would lose control over its own memory management schemes. Therefore, I propose a GL extension that allows the GL client to provide a mapping and an unmapping callback to the implementation, to be used whenever the driver needs to perform such operations. This way the driver remains in full control of its memory management affairs, and the amount of work for an implementation, as well as the potential for bugs, is kept minimal. I've written a draft implementation in Zink using map_memory_placed [1] and a corresponding Wine MR utilizing it [2], and would be curious to hear your thoughts. I don't have experience in the Mesa codebase, so I apologize if the branch is a tad messy.
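For concreteness, a rough sketch of what such a callback interface could look like is below; every name is hypothetical and only meant to illustrate the shape of the idea (the actual draft lives in the Zink branch linked as [1]):

```c
/* Hypothetical sketch only: none of these names come from a real or drafted
 * extension, they just illustrate the proposal that the client supplies the
 * pages the driver maps client-visible buffer memory into. */
#include <stddef.h>

typedef void *(*GLmapMemoryCallbackEXT)(size_t size, size_t alignment,
                                        void *user_data);
typedef void (*GLunmapMemoryCallbackEXT)(void *address, size_t size,
                                         void *user_data);

/* The driver would invoke the map callback whenever it creates a CPU mapping
 * that may later be handed back from glMapBuffer/glMapBufferRange, and the
 * unmap callback when it releases such a mapping.  Wine would back the map
 * callback with its internal allocator, which only returns 32-bit addresses. */
void glSetMemoryCallbacksEXT(GLmapMemoryCallbackEXT map_callback,
                             GLunmapMemoryCallbackEXT unmap_callback,
                             void *user_data);
```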
It's an interesting approach, to be sure. I don't mean that as a good or bad thing, as I haven't given this enough thought with GL in mind to have a better, more fully thought-out plan.
The most obvious issue that jumps out to me is that we really want that callback to be set before anyone ever maps a buffer that might possibly get exposed to the client and we want it to never change. If this were Vulkan, we'd have you provide it at vkCreateDevice() time. But this is GL where everybody loves a big mutable state object. If we do go with callbacks (and it's still not 100% clear to me what the right choice is), we'd want them to be somehow set-once and set before any buffers are created. I'm not 100% sure how you'd spec that or how we'd enforce it. There may be some precedent for this somewhere in GL (no_error, maybe?) but I'm not sure.
Right, in the case of Zink I was just lucky it doesn't happen to map anything upon context creation. If I understand your suggestion of a no_error-like approach correctly, that should definitely work; we would then just, somewhat awkwardly, want to pass the callback address through as two context attributes, one for the lower and one for the upper half of the address.
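A sketch of that idea, assuming hypothetical EGL attribute tokens (nothing below is a real enum; it just shows how a 64-bit callback address could be split across two 32-bit context attributes):

```c
/* Illustration only: the two attribute tokens are made up, and splitting the
 * pointer assumes a 64-bit host process (the Wine wow64 case). */
#include <stdint.h>
#include <EGL/egl.h>

#define EGL_MAP_CALLBACK_ADDRESS_LO_EXT 0x7F01 /* placeholder token */
#define EGL_MAP_CALLBACK_ADDRESS_HI_EXT 0x7F02 /* placeholder token */

static EGLContext create_context_with_map_callback(EGLDisplay dpy,
                                                   EGLConfig config,
                                                   void *map_callback)
{
    uint64_t addr = (uint64_t)(uintptr_t)map_callback;
    const EGLint attribs[] = {
        EGL_MAP_CALLBACK_ADDRESS_LO_EXT, (EGLint)(addr & 0xffffffffu),
        EGL_MAP_CALLBACK_ADDRESS_HI_EXT, (EGLint)(addr >> 32),
        EGL_NONE,
    };
    return eglCreateContext(dpy, config, EGL_NO_CONTEXT, attribs);
}
```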
However, even if we find a way to make the mapping callbacks global at the context level, the driver would still have to relegate mappings in these contexts to dedicated memory pools. That might actually be desirable, in order to allow other GL clients in the process to continue using the full 64-bit address space, although in practice I don't think Wine uses any libraries that create GL contexts (other than potentially GStreamer, which we are moving away from).
If we just want to keep it simple and work around the GL context model, maybe Wine could instead export its allocator from ntdll.so in a way that lets Mesa call it directly when present. That way we could avoid the need for a GL extension that would, at the end of the day, probably only be used by Wine, and also avoid the awkwardness of not wanting to map anything in the driver until we've received the mapping callback.
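A minimal sketch of what that could look like on the Mesa side, assuming Wine exported a pair of allocator entry points (the symbol names are invented here; there is no such ABI today):

```c
/* Invented symbol names; this only illustrates the "no extension" idea of
 * Mesa probing for a Wine-exported allocator at runtime and routing
 * client-visible mappings through it when it exists. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

typedef void *(*wine_map_placed_fn)(size_t size, size_t alignment);
typedef void (*wine_unmap_placed_fn)(void *address, size_t size);

static wine_map_placed_fn   wine_map_placed;
static wine_unmap_placed_fn wine_unmap_placed;

static void probe_wine_allocator(void)
{
    /* RTLD_DEFAULT searches objects already loaded into the process, so the
     * lookup only succeeds inside a wow64 Wine process that exports these
     * symbols; elsewhere the pointers stay NULL and the driver behaves as it
     * does today. */
    wine_map_placed   = (wine_map_placed_fn)dlsym(RTLD_DEFAULT,
                                                  "__wine_map_placed_memory");
    wine_unmap_placed = (wine_unmap_placed_fn)dlsym(RTLD_DEFAULT,
                                                    "__wine_unmap_placed_memory");
}
```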
The other question that comes to mind is when exactly we'd be expected to use these things. Obviously, we need to do so for any map that may be exposed to the client. However, it's not always easy to do that because you don't know at buffer create time whether or not it will be persistently mapped. A driver is likely to have all sorts of internal mappings for things and, while those can come from one of those ranges, it'll burn more of that precious 32-bit address space than needed. This gets worse when you take sub-allocation into account. If we're okay with all buffer mappings going down the client-request path then it's probably okay. The driver just might need an extra bit in its buffer cache key.
Yeah, it would definitely be a nice bonus of the new path if drivers kept their internal mappings outside the 32-bit address space, but as far as I can see this doesn't affect our choice of interface, as Mesa should be able to handle client-visible buffers differently from driver-internal ones without the help of the client.
I'm also sitting here trying to come up with some plan that would let us do this more like Vulkan and I'm having trouble coming up with one that works. GL has no concept of "create time". We could theoretically do something where we flush everything, copy the data to a new placed-mappable buffer and then continue on but that's gonna suck.
Yeah, and then you still have to define how you create this new placed-mappable buffer, and there isn't really a small, neat solution for that. You could define a GL_OUT_OF_PLACED_MEMORY error for glMapBuffer and then add another entry point by which the app feeds the driver pages, but then you run into the problem of unmapping, which as far as I can see is often performed asynchronously.
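To make that awkwardness concrete, the client-side flow being discussed would look roughly like this; GL_OUT_OF_PLACED_MEMORY, glProvidePlacedMemoryEXT and client_alloc_low_pages are all invented for the sake of the example:

```c
/* Invented token and entry points, shown only to illustrate the
 * error-and-retry flow; the unresolved part is how and when the driver could
 * hand the pages back, given that unmaps often happen asynchronously. */
void *map_buffer_with_retry(GLenum target, GLsizeiptr size)
{
    void *ptr = glMapBuffer(target, GL_READ_WRITE);
    if (!ptr && glGetError() == GL_OUT_OF_PLACED_MEMORY) {
        /* Feed the driver pages from the client's own 32-bit allocator,
         * then retry the map. */
        void *pages = client_alloc_low_pages(size);        /* hypothetical */
        glProvidePlacedMemoryEXT(target, pages, size);      /* hypothetical */
        ptr = glMapBuffer(target, GL_READ_WRITE);
    }
    return ptr;
}
```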
I think that's all that comes to mind immediately. As I said at the top, I'm happy to talk more in a few months. Best of luck until then!
~Faith
In theory, the only requirement the extension would place on drivers is that glMapBuffer always returns a pointer from within a page allocated through the provided callbacks, so that it is guaranteed to sit within the required address space. Wine would keep using its existing copy workaround for other types of buffers, but since Mesa seems to often return directly mapped buffers in those cases too, Wine could avoid the slowdown that comes with copying there as well.

Why not use Zink?: There's also a proposal to use a 32-bit PE build of Zink in Wine, bypassing the need for an extension; I brought this up in a wine-devel thread last week [3], which has some arguments against that approach.
For cases where Zink is being used on the host (this is the current plan for Nouveau going forward), doing Zink in Windows may not be a bad idea. However, I agree that it may not be the best idea to rely on that plan.
If any of you have thoughts, concerns, or questions about this potential approach, please let me know, thanks!

1: https://gitlab.freedesktop.org/Guy1524/mesa/-/commits/placed_allocation
2: https://gitlab.winehq.org/wine/wine/-/merge_requests/6663
3: https://marc.info/?t=172883260300002&r=1&w=2