On 24 November 2013 00:18, Stefan Dösinger stefan@codeweavers.com wrote:
The implementation is strange and fairly ineffective in doing what PBOs are supposed to do - avoiding pipeline stalls and redundant copying. Removing it will make restructuring surfaces easier and avoid temporary regressions. It will be re-added later.
Is there a good reason this can't just be fixed first?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Am 2013-11-24 11:30, schrieb Henri Verbeet:
On 24 November 2013 00:18, Stefan Dösinger stefan@codeweavers.com wrote:
The implementation is strange and fairly ineffective in doing what PBOs are supposed to do - avoiding pipeline stalls and redundant copying. Removing it will make restructuring surfaces easier and avoid temporary regressions. It will be re-added later.
Is there a good reason this can't just be fixed first?
It's certainly possible, but it comes with a greater risk of regressions and I estimate that it will require a week of duplicated work. The current surface location management will go away and I prefer to migrate to the volume-style location management at the lowest possible complexity instead of spending time fixing the old code first.
On 24 November 2013 23:45, Stefan Dösinger stefandoesinger@gmail.com wrote:
Am 2013-11-24 11:30, schrieb Henri Verbeet:
Is there a good reason this can't just be fixed first?
It's certainly possible, but it comes with a greater risk of regressions and I estimate that it will require a week of duplicated
That shouldn't be true, I think.
work. The current surface location management will go away and I prefer to migrate to the volume-style location management at the lowest possible complexity instead of spending time fixing the old code first.
I'm not quite sure what you mean here, the volume and surface resource location management functions should be pretty much the same. The exceptions to that are of course things like PBOs and perhaps DCs, but fixing that is pretty much what this is about. I think it's also worth pointing out that the way the volume code uses PBOs isn't a whole lot better than what the surface code does.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Am 2013-11-25 12:02, schrieb Henri Verbeet:
On 24 November 2013 23:45, Stefan Dösinger stefandoesinger@gmail.com wrote:
Am 2013-11-24 11:30, schrieb Henri Verbeet:
Is there a good reason this can't just be fixed first?
It's certainly possible, but it comes with a greater risk of regressions and I estimate that it will require a week of duplicated
That shouldn't be true, I think.
I'll give it another try and see where it leads me.
I've attached the current state of the surface patches for reference. Patches 2 and 26 are simpler without PBOs, and I'll have to duplicate patches 27-29, 33, 41 and 50 in surface.c and then delete or move the code in separate patches.
I'd appreciate it if you could take a look at the entire patchset and see if there's anything else you don't like conceptually or if there's another major reordering you want.
I think it's also worth pointing out that the way the volume code uses PBOs isn't a whole lot better than what the surface code does.
How would you prefer PBOs to work?
My patchset replaces the xFLAG_PBO flags with a map_binding field and adds resource_get_memory similar to buffer_get_memory. This will also take care of user memory and dib sections.
Eventually we want to use PBOs for sysmem surfaces and volumes so the driver doesn't have to create an extra data copy in UpdateTexture / UpdateSurface. This requires ARB_buffer_storage, and I don't think it makes sense to implement this before buffer, surface and volume location and GL BO handling is merged.
On 25 November 2013 17:01, Stefan Dösinger stefandoesinger@gmail.com wrote:
I've attached the current state of the surface patches for reference. Patches 2 and 26 are simpler without PBOs, and I'll have to duplicate patches 27-29, 33, 41 and 50 in surface.c and then delete or move the code in separate patches.
I'd appreciate it if you could take a look at the entire patchset and see if there's anything else you don't like conceptually or if there's another major reordering you want.
Ok, but it may take a while.
I think it's also worth pointing out that the way the volume code uses PBOs isn't a whole lot better than what the surface code does.
How would you prefer PBOs to work?
I haven't quite decided yet. One consideration is that if we want the usage hint to glBufferDataARB() to make sense, we can't necessarily use the same PBO both for downloading and uploading data. Another is essentially the various variants of d3d10's UpdateSubResource(), and d3d9's GetRenderTargetData() / GetFrontBufferData(). We'd obviously like to avoid stalling on those, but it's not clear to me if that's always possible. For example, always creating PBO's for sysmem surfaces and then uploading from there probably isn't going to work very well if the application just creates a single sysmem surface and uses that for updating different default pool surfaces with UpdateSurface(), unless perhaps it also uses DISCARD on the sysmem surface. Perhaps that's ok, but I don't think we really know at this point. At some point I considered a scheme where the device has a pool of PBOs and the resource update function grab one from there when needed. That probably has its own problems, but I haven't thought it all the way through either.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Am 2013-11-25 18:21, schrieb Henri Verbeet:
I haven't quite decided yet. One consideration is that if we want the usage hint to glBufferDataARB() to make sense, we can't necessarily use the same PBO both for downloading and uploading data.
The usage hint is gone from GL_ARB_buffer_storage. Do you really think it matters? Do we have any evidence that suggests that using one PBO to download and upload data causes problems? Even if it does cause problems we should check if it causes more problems than using one surface to download and upload in d3d9.
You know the AMD GPU and r600g driver pretty well. Is there anything in the hw or driver that suggests that the usage really matters and/or that using a PBO for PACK_BUFFER and UNPACK_BUFFER causes problems?
Another is essentially the various variants of d3d10's UpdateSubResource(), and d3d9's GetRenderTargetData() / GetFrontBufferData(). We'd obviously like to avoid stalling on those, but it's not clear to me if that's always possible. For example, always creating PBO's for sysmem surfaces and then uploading from there probably isn't going to work very well if the application just creates a single sysmem surface and uses that for updating different default pool surfaces with UpdateSurface(), unless perhaps it also uses DISCARD on the sysmem surface. Perhaps that's ok, but I don't think we really know at this point.
Most applications that use UpdateSurface / UpdateTexture that I've worked with in the command Stream development use DISCARD on the sysmem surface. I'd expect the application to stall on native d3d as well if it doesn't use DISCARD, but it is difficult to test.
The exception here are intro videos. Apps do all sorts of stupid things, including just using a D3DPOOL_MANAGED texture and calling it a day. It doesn't really matter because pure video playback doesn't queue up lots of commands anyway.
The one thing that matters performance-wise is not stalling the pipeline on map. I have not seen a single application that uses D3DPOOL_DEFAULT D3DUSAGE_DYNAMIC textures with DISCARD, except for one case of intro video playback. Not sure which game this was, I'd have to re-check them all. Even the applications that use UpdateSurface / UpdateTexture in real rendering don't stream a lot of data compared to the use of dynamic buffers, usually it's a texture update every other frame.
According to my testing, the only applications that profit from PBOs(in their current state, and in my cs tree) are Warcraft 3 and UT2004. They abuse backbuffer->map(READONLY) as a glFinish replacement, and loading the data into a PBO instead of sysmem gives them a ~10% performance boost.
At some point I considered a scheme where the device has a pool of PBOs and the resource update function grab one from there when needed. That probably has its own problems, but I haven't thought it all the way through either.
I plan to add a BO pool to the command stream patches to handle DISCARD, but I think that's only partially related to what you've considered.
On 25 November 2013 19:23, Stefan Dösinger stefandoesinger@gmail.com wrote:
You know the AMD GPU and r600g driver pretty well. Is there anything in the hw or driver that suggests that the usage really matters and/or that using a PBO for PACK_BUFFER and UNPACK_BUFFER causes problems?
The Mesa Gallium drivers just care about the STATIC/DYNAMIC/DRAW part of the usage hint, and not so much about READ/DRAW/COPY. I can't say if that will necessarily always stay that way though. I know the proprietary NVIDIA driver does at least complain through ARB_debug_output if the detected usage doesn't match the usage hint.
On 25 November 2013 17:01, Stefan Dösinger stefandoesinger@gmail.com wrote:
I've attached the current state of the surface patches for reference. Patches 2 and 26 are simpler without PBOs, and I'll have to duplicate patches 27-29, 33, 41 and 50 in surface.c and then delete or move the code in separate patches.
I'd appreciate it if you could take a look at the entire patchset and see if there's anything else you don't like conceptually or if there's another major reordering you want.
I reviewed about half of this, and only very roughly: - In patch 05/72 you remove DIB maps. Similar to this patch, I'd prefer fixing things in an incremental way, instead of essentially rewriting the entire thing. - In patch 14/72 you introduce wined3d_resource_invalidate_location(). It's not clear to me that this makes sense for container resources like textures in that form. The way I was going to handle that was to create a distinction between resources (textures, buffers) and subresources (surfaces, volumes, buffers). - I don't like patch 15/72 at all. I think this should pretty much just be a matter of doing "wined3d_resource_invalidate_location(subresource->container);" in the subresource. - The context restoration in patch 18-20/72 is pretty nasty. We should probably introduce a variant of context_release() that takes the wined3d context to restore as a second parameter instead.
As a general theme, I think the better approach is to first unify how things work between e.g. surfaces and volumes, and then merge them on the resource/subresource level, as opposed to first adding duplicate functionality to resources that only works for specific resource types.