Ben allowed me to forward this email.
Kudos to him for all the knowledge his brainbox contains!
-------- Original Message --------
Subject: Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Date: Sat, 16 Feb 2013 11:20:35 -0500
From: Ben Supnik bsupnik@xsquawkbox.net
To: Stanislaw Halik sthalik@misaki.pl
Hi Guys,
I'm afraid I don't know enough about the _specific_ situation you guys are seeing. I can tell you guys a few things from my GL work:
1. The ATI OpenGL Linux team is pretty accessible; do you guys have anyone in the fglrx beta program?
2. What we found was that for stream-draw buffers that need to be orphaned, mapped, unmapped and drawn, there was a fixed overhead in the ATI drivers compared to NV; this 'performance gap' is cross-platform - both NV and ATI use the same GL stack (more or less) for Windows and Linux, and we saw the slow-down on both.
3. We originally were using map buffer (not map buffer range) with a NULL glBufferData to "orphan" the buffer (the equivalent of d3d map-discard). I think I tried MBR and it didn't fix it - both were expensive because the fundamental memory mapping operation was slow.
4. The slowness was in milliseconds, e.g. "this hits our fps by 20% or 30%" - but it wasn't "this is 3x slower because it stalled the GPU." So if you're seeing truly face-meltingly bad performance, like a total pipeline stall, you have a different bug.
5. As a general statement, the original glMapBuffer is subject to a lot of heuristic behavior in the drivers; app developers are very fast and loose with how they use it, so the driver vendors tend to try to make it do the fastest, most useful, least crash-y thing because the apps use it like monkeys on typewriters. By comparison, MBR came out later and has much more specific semantics for particular optimizations; as a result, the MBR implementation will often do exactly what you say, _even_ if it's slower. Getting even one flag wrong in MBR can cause it to hit a face-meltingly slow path.
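[Editor's sketch, not part of Ben's message.] To make points 3 and 5 concrete, here is a minimal C sketch of the access flags involved in a map-discard style mapping with glMapBufferRange. The MBR_* names and both helper functions are hypothetical; the bit values mirror the GL_MAP_* constants from the OpenGL headers so the sketch compiles without a GL context. It only encodes the spec-level rules for combinations that raise GL_INVALID_OPERATION; it cannot tell you which valid combinations are slow on a particular driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Access bits for glMapBufferRange. The values match GL_MAP_READ_BIT etc.
 * in the OpenGL headers; the MBR_ prefix is only used here so that the
 * sketch builds without GL. */
enum {
    MBR_READ_BIT              = 0x0001,
    MBR_WRITE_BIT             = 0x0002,
    MBR_INVALIDATE_RANGE_BIT  = 0x0004,
    MBR_INVALIDATE_BUFFER_BIT = 0x0008,
    MBR_FLUSH_EXPLICIT_BIT    = 0x0010,
    MBR_UNSYNCHRONIZED_BIT    = 0x0020,
    MBR_ALL_BITS              = 0x003F
};

/* Hypothetical helper: true if the combination is accepted by the
 * ARB_map_buffer_range rules. Bits outside the six above are rejected
 * because this sketch does not know about later additions. */
static bool mbr_flags_valid(unsigned access)
{
    if (access & ~(unsigned)MBR_ALL_BITS)
        return false;
    /* At least one of READ or WRITE must be set. */
    if (!(access & (MBR_READ_BIT | MBR_WRITE_BIT)))
        return false;
    /* READ cannot be combined with the invalidate or unsynchronized bits. */
    if ((access & MBR_READ_BIT) &&
        (access & (MBR_INVALIDATE_RANGE_BIT | MBR_INVALIDATE_BUFFER_BIT |
                   MBR_UNSYNCHRONIZED_BIT)))
        return false;
    /* FLUSH_EXPLICIT only makes sense for write mappings. */
    if ((access & MBR_FLUSH_EXPLICIT_BIT) && !(access & MBR_WRITE_BIT))
        return false;
    return true;
}

/* Hypothetical helper: flags for orphaning the whole buffer on map,
 * roughly the MBR counterpart of
 *   glBufferData(target, size, NULL, GL_STREAM_DRAW);
 *   glMapBuffer(target, GL_WRITE_ONLY);
 * With a real context the call would be
 *   glMapBufferRange(target, 0, size, flags); */
static unsigned mbr_discard_flags(void)
{
    unsigned flags = MBR_WRITE_BIT | MBR_INVALIDATE_BUFFER_BIT;
    assert(mbr_flags_valid(flags));
    return flags;
}
```

Adding MBR_READ_BIT to the discard flags, for example, is rejected by the validity check; a driver is also free to take a slower path for combinations that are valid but unusual.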
We worked around the perf cost of mapping a buffer on ATI hw by using pinned memory (but we do have a Linux-only bug where we get corrupt geometry with pinned memory - it works on Windows); I have some todo items to investigate the problem more thoroughly now.
Cheers,
Ben
On 2/16/13 5:47 AM, Stanislaw Halik wrote:
On 2013-02-16 09:04, Stefan Dösinger wrote:
What you really want to do is figure out why GL_ARB_map_buffer_range is slow on fglrx, and make sure that the problem is really fglrx specific. I fixed a number of dynamic buffer performance problems in the past months, but there are still problems if we're falling back to draw_strided_slow for some reason, like fixed function material tracking.
Thanks for reviewing this.
Going to ask Ben Supnik from Laminar Research (X-Plane developer) and BCC him, since he has apparently run into the same issue. There's a lot of info on fglrx woes (not really Linux specific, either) at http://developer.x-plane.com/
He has said publicly that he is in contact with AMD, he has been friendly to OSS by releasing an X-Plane Linux version, and he is an overall cool fellow.
Ben, Please help!
Other than being wrong conceptually, you're disabling dynamic buffers the wrong way: The "proper" way would be to add a quirk to the quirk_table in directx.c that removes ARB_map_buffer_range from the list of supported extensions if the driver vendor is AMD.
Like this? Patch attached.
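[Editor's sketch, not the attached patch.] The patch itself is not reproduced in this forward. As a rough illustration of the quirk-table idea Stefan describes, here is a self-contained C sketch. Every name in it (gl_info, quirk, match_amd, apply_quirks, the enum values) is a hypothetical stand-in; Wine's real structures and quirk_table in directx.c look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver description. */
enum gl_vendor { VENDOR_OTHER, VENDOR_AMD, VENDOR_NVIDIA };
enum gl_ext { EXT_MAP_BUFFER_RANGE, EXT_VERTEX_BUFFER_OBJECT, EXT_COUNT };

struct gl_info {
    enum gl_vendor vendor;
    bool supported[EXT_COUNT];
};

/* One table entry: a predicate that decides whether the quirk applies,
 * and a fixup that edits the extension list. */
struct quirk {
    bool (*match)(const struct gl_info *info);
    void (*apply)(struct gl_info *info);
    const char *description;
};

static bool match_amd(const struct gl_info *info)
{
    return info->vendor == VENDOR_AMD;
}

static void quirk_no_map_buffer_range(struct gl_info *info)
{
    info->supported[EXT_MAP_BUFFER_RANGE] = false;
}

static const struct quirk quirk_table[] = {
    { match_amd, quirk_no_map_buffer_range,
      "Remove ARB_map_buffer_range when the driver vendor is AMD" },
};

/* Runs every matching quirk and returns how many were applied. */
static size_t apply_quirks(struct gl_info *info)
{
    size_t applied = 0;
    assert(info != NULL);
    for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); ++i) {
        if (!quirk_table[i].match(info))
            continue;
        quirk_table[i].apply(info);
        ++applied;
    }
    return applied;
}
```

The point of the table is that the rest of the renderer only ever checks the supported-extension list, so the workaround stays in one place instead of being scattered through the buffer code.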
I've run into hard GPU hangs with fglrx 13.2, no VT switch either. This helps:
[Software\Wine\Direct3D]
"DirectDrawRenderer"="gdi"
"Multisampling"="disabled"
"OffscreenRenderingMode"="fbo"
"UseGLSL"="enabled"
Lack of GLSL disables HDR apparently.
Without GDI, there's some nasty display corruption on FBOs.
Also, Catalyst likes to hang the display when switching from 3D to 2D; a VT switch helps there.
But with all this busywork, performance is near-native. Catalyst at least supports indirect addressing (whatever that means) and doesn't choke on > 128 temps... FYI Mesa bug submitted:
https://bugs.freedesktop.org/show_bug.cgi?id=55420
-sh