Hi Masanori,
Thank you for your patches! I have a few comments:
Regarding the big picture I think it would be great if we had a scheme that also worked for partially mapping resources. It's also necessary to make the D3DLOCK_NO_DIRTY_UPDATE test on managed textures work. However, doing this is a lot more work than your patchset and complicates location management a lot.
If Henri tells you something different follow his advice. He is the authority for decisions in d3d :-) .
Patch 1: Am 2017-04-17 um 15:47 schrieb Masanori Kakura:
+struct wined3d_dirty_regions +{
- UINT left;
- UINT top;
- UINT right;
- UINT bottom;
- UINT front;
- UINT back;
- struct wined3d_dirty_regions *next;
+};
A standard wine list would be better IMHO.
Patch 2:
diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h index 785105d7b9..711f98a0a6 100644 --- a/dlls/wined3d/wined3d_private.h +++ b/dlls/wined3d/wined3d_private.h @@ -2765,6 +2765,27 @@ struct wined3d_dirty_regions struct wined3d_dirty_regions *next; };
+static inline BOOL wined3d_dirty_regions_add(struct wined3d_dirty_regions **regions,
UINT left, UINT top, UINT right, UINT bottom, UINT front, UINT back)
I don't think we want this in the header file, I think it'd be better placed in resource.c. Also note that buffers already have a related mechanism (buffer->modified_areas).
You can predeclare the functions in the header if you need to call them from multiple files.
+{
- struct wined3d_dirty_regions *new_region;
- if (!(new_region = HeapAlloc(GetProcessHeap(), 0, sizeof(struct wined3d_dirty_regions))))
return FALSE;
HeapAlloc is expensive. I think it is OK for the start, but in the long run a way to keep some old dirty region elements around and reuse them instead of always calling Alloc / Free.
Patch 5:
- if (!wined3d_dirty_regions_add(&resource->dirty_regions, 0, 0, width, height, 0, depth))
Why not use wined3d_resource_maximize_dirty_region here?
wined3d_resource_acquire(resource);
- wined3d_dirty_regions_clear(&resource->dirty_regions); wined3d_cs_destroy_object(resource->device->cs, wined3d_resource_destroy_object, resource);
Regions need to be cleared from inside the cs thread in case there are pending update_texture commands.
Similarly patch 6 needs to put the region modifying into wined3d_cs_exec_add_dirty_texture_region.
Patch 7:
if (box->left < box->right && box->top < box->bottom && box->front < box->back)
{
You don't need to check the box, texture_resource_sub_resource_map already does it before it reaches this place of the code. See wined3d_texture_check_box_dimensions.
Patch 8: I am not sure if this is necessary at all. In d3d9 it doesn't matter because a UpdateSurface destination can never be an UpdateTexture source.
Patch 9/10: These should be merged, and I don't think patch 10 looks right. I think dirty regions should be honored on sublevels as well, just adjusted (i.e., divide by 2 for every level).
- if (src_texture->resource.width <= dst_texture->resource.width
&& src_texture->resource.height <= dst_texture->resource.height
&& src_texture->resource.depth <= dst_texture->resource.depth)
Why are you adding this check? Is it because the src_skip_levels broke the if (i == 0) root level detection? If you apply the dirty region to all levels this problem should go away.
FWIW I broadly agree with Stefan's comments and I have another small one.
2017-04-17 18:14 GMT+02:00 Stefan Dösinger stefandoesinger@gmail.com:
Patch 1: Am 2017-04-17 um 15:47 schrieb Masanori Kakura:
+struct wined3d_dirty_regions +{
- UINT left;
- UINT top;
- UINT right;
- UINT bottom;
- UINT front;
- UINT back;
- struct wined3d_dirty_regions *next;
+};
A standard wine list would be better IMHO.
Yeah. Have a look at include/wine/list.h, also you have many examples of its use in wined3d already.
In addition, embedding a struct wined3d_box instead of the individual fields would probably be nicer. Similarly for passing the extents around e.g. as function arguments in patch 2.
I haven't really looked at the patches in detail but the general approach seems fine. Thank you for your work!
On 17 April 2017 at 22:44, Matteo Bruni matteo.mystral@gmail.com wrote:
2017-04-17 18:14 GMT+02:00 Stefan Dösinger stefandoesinger@gmail.com:
A standard wine list would be better IMHO.
Yeah. Have a look at include/wine/list.h, also you have many examples of its use in wined3d already.
Why not an array? Stefan already mentioned wined3d_buffer.maps, in a fashion.
I haven't really looked at the patches in detail but the general approach seems fine. Thank you for your work!
There are certainly some details to work out, like e.g. the first four patches adding dead code, but in terms of general approach, these are the things I think need resolving first: - How does this interact with the location management? (I.e., wined3d_texture_invalidate_location(), etc.) For example, in 132746, ~resource->map_binding is invalidated and dirty regions are added. How should wined3d_texture_load_location() interpret that when loading for example WINED3D_LOCATION_DRAWABLE? - Does it really make sense to track regions on the level of struct wined3d_resource instead of e.g. strut wined3d_texture? If it does, that would make wined3d_buffer.maps redundant.
Why not an array? Stefan already mentioned wined3d_buffer.maps, in a fashion.
Right, an array is probably just better.
- Does it really make sense to track regions on the level of struct
wined3d_resource instead of e.g. strut wined3d_texture? If it does, that would make wined3d_buffer.maps redundant.
A unified structure to track both buffer and texture updates would be nice, if that works out okay in practice.
Either way, I just realized that something like this can be potentially used to accelerate (partial) texture updates with CSMT, similarly to e.g. buffers updates in Stefan's patchset. In general it makes texture uploads resemble buffers, which seems like a good thing.
I think that's no surprise to you. Any plan to explore this idea through?
On 18 April 2017 at 18:25, Matteo Bruni matteo.mystral@gmail.com wrote:
Why not an array? Stefan already mentioned wined3d_buffer.maps, in a fashion.
Right, an array is probably just better.
- Does it really make sense to track regions on the level of struct
wined3d_resource instead of e.g. strut wined3d_texture? If it does, that would make wined3d_buffer.maps redundant.
A unified structure to track both buffer and texture updates would be nice, if that works out okay in practice.
Sure. The wined3d_box structure is a little larger than needed for buffers though, and in case of buffers it's used largely internally, so I think an argument could be made for keeping the texture interface separate (but consistent) in this case. But regardless of what decision is made there, I think it would be undesirable for buffers to have dirty regions in both struct wined3d_resource and struct wined3d_buffer.
Either way, I just realized that something like this can be potentially used to accelerate (partial) texture updates with CSMT, similarly to e.g. buffers updates in Stefan's patchset. In general it makes texture uploads resemble buffers, which seems like a good thing.
I think that's no surprise to you. Any plan to explore this idea through?
No concrete plan.
Am 2017-04-18 um 19:44 schrieb Henri Verbeet:
On 18 April 2017 at 18:25, Matteo Bruni matteo.mystral@gmail.com wrote:
Either way, I just realized that something like this can be potentially used to accelerate (partial) texture updates with CSMT, similarly to e.g. buffers updates in Stefan's patchset. In general it makes texture uploads resemble buffers, which seems like a good thing.
I think that's no surprise to you. Any plan to explore this idea through?
No concrete plan.
One thing that made matters easier in buffers is that we only had two locations to deal with. The sysmem location would always have the entire content and the buffer location may need a partial or complete update. With textures you have multiple locations. You could e.g. render to the RGB texture, map a partial rectangle, unmap it, and then need it all loaded into the sRGB texture. And you can make it even more complicated if you want to.
I was thinking about handling it with a WINED3D_LOCATION_FRAGMENTED location that stores a "base" location (in my example LOCATION_TEXTURE_RGB) and a "diff" location (LOCATION_BUFFER in my example). The dirty box indicates which area of diff needs to be transferred to base. In a partial map we'd set base = old location, diff = new location and if necessary transfer the mapped area from base to diff. Then we'd set location=LOCATION_FRAGMENTED and invalidate all other locations. If we're later loading the base location again we only need to copy the dirty rect from diff to base. Afterwards we invalidate LOCATION_FRAGMENTED [*]. If we load a different location (even the diff location[*]) we'd first de-fragment it in the same way and then load the requested location from the now-valid base location.
[*]: These two things could be made a bit more flexible, but I am not sure if it is worth the trouble. In most cases we'll only have to worry about LOCATION_TEXTURE_RGB and one of LOCATION_BUFFER/LOCATION_SYSMEM/LOCATION_USERPTR. I don't think there are many applications that map two different subboxes and expect it to be fast.
I think at some point we'll want to use GL_ARB_buffer_storage for both textures and buffers to speed up CSMT maps and a few other goodies. Merging GL BO handling of textures and buffers would make this a lot easier.
This got sidetracked quite a bit already and I'm going to make it worse. Sorry...
2017-04-18 21:31 GMT+02:00 Stefan Dösinger stefandoesinger@gmail.com:
Am 2017-04-18 um 19:44 schrieb Henri Verbeet:
On 18 April 2017 at 18:25, Matteo Bruni matteo.mystral@gmail.com wrote:
Either way, I just realized that something like this can be potentially used to accelerate (partial) texture updates with CSMT, similarly to e.g. buffers updates in Stefan's patchset. In general it makes texture uploads resemble buffers, which seems like a good thing.
I think that's no surprise to you. Any plan to explore this idea through?
No concrete plan.
One thing that made matters easier in buffers is that we only had two locations to deal with. The sysmem location would always have the entire content and the buffer location may need a partial or complete update. With textures you have multiple locations. You could e.g. render to the RGB texture, map a partial rectangle, unmap it, and then need it all loaded into the sRGB texture. And you can make it even more complicated if you want to.
Although now with UAV the SYSMEM location isn't necessarily the most up to date for buffers either and if EXT_texture_sRGB_decode is supported we only use one TEXTURE location. Anyway, I agree that with textures it can generally get more complicated.
I was thinking about handling it with a WINED3D_LOCATION_FRAGMENTED location that stores a "base" location (in my example LOCATION_TEXTURE_RGB) and a "diff" location (LOCATION_BUFFER in my example). The dirty box indicates which area of diff needs to be transferred to base. In a partial map we'd set base = old location, diff = new location and if necessary transfer the mapped area from base to diff. Then we'd set location=LOCATION_FRAGMENTED and invalidate all other locations. If we're later loading the base location again we only need to copy the dirty rect from diff to base. Afterwards we invalidate LOCATION_FRAGMENTED [*]. If we load a different location (even the diff location[*]) we'd first de-fragment it in the same way and then load the requested location from the now-valid base location.
[*]: These two things could be made a bit more flexible, but I am not sure if it is worth the trouble. In most cases we'll only have to worry about LOCATION_TEXTURE_RGB and one of LOCATION_BUFFER/LOCATION_SYSMEM/LOCATION_USERPTR. I don't think there are many applications that map two different subboxes and expect it to be fast.
Maybe that's already too flexible. I'd start by only "cloning" the SYSMEM -> BUFFER uploading mechanism from buffers (but for SYSMEM -> TEXTURE) and see where we stand at that point. Yes, it's not particularly generic / orthogonal / nice but it probably is the most important case we care about fast partial texture updates.
I think at some point we'll want to use GL_ARB_buffer_storage for both textures and buffers to speed up CSMT maps and a few other goodies. Merging GL BO handling of textures and buffers would make this a lot easier.
I assume you specifically mean persistent maps. Those would be interesting to try out for sure, although it's probably quite some work and it should wait for the "old" buffer mapping optimizations to come in CSMT first.
Am 2017-04-19 um 00:33 schrieb Matteo Bruni:
This got sidetracked quite a bit already and I'm going to make it worse. Sorry...
Yeah, apologies to Masanori :-\ . He stung into a hive's nest without knowing.
Although now with UAV the SYSMEM location isn't necessarily the most up to date for buffers either
That's why I used the past tense: "we only had two locations to deal with" :-) .
and if EXT_texture_sRGB_decode is supported we only use one TEXTURE location. Anyway, I agree that with textures it can generally get more complicated.
Yup. Except that you still have to find a way not to break rendering without render-to-fbo (When drawable != texture) or no sRGB_decode (when texture_srgb != texture_rgb). You can do it by just disabling partial maps.
Maybe that's already too flexible. I'd start by only "cloning" the SYSMEM -> BUFFER uploading mechanism from buffers (but for SYSMEM -> TEXTURE) and see where we stand at that point. Yes, it's not particularly generic / orthogonal / nice but it probably is the most important case we care about fast partial texture updates.
Well, I think sysmem <-> texture and buffer <-> texture are about equally important (managed textures, dynamic default textures, to think in d3d9 terms). We don't need to do both at the same time on the same resource and we don't need to worry about sysmem <-> buffer.
I assume you specifically mean persistent maps. Those would be interesting to try out for sure, although it's probably quite some work and it should wait for the "old" buffer mapping optimizations to come in CSMT first.
Yeah, it would not only allow us to do maps that don't need data transfer from the main thread, it'd also give us advantages of buffers with constant map addresses. Though I am not sure if we can use it for e.g. managed buffers, as a persistent map may cause a system memory or GART fallback.
I was also once hoping that CLIENT_STORAGE_BIT would allow us to create a GL BO around HeapAlloc'ed memory for sysmem textures (to improve UpdateTexture by telling GL that the source pointer will remain valid after glSubTexImage returns), but it turns out it's just one of those "silly hint things" that don't actually guarantee any behavior. Because we're talking about sysmem textures we still need to keep the map pointer constant across device::reset calls, so making GL own the data is a no-go. Having a double copy and transfering sysmem -> buffer -> texture is pointless.
On 19 April 2017 at 00:33, Matteo Bruni matteo.mystral@gmail.com wrote:
This got sidetracked quite a bit already and I'm going to make it worse. Sorry...
For the most part I think it was fairly on topic, although I'm hoping for some input from Masanori as well.
I will say that a generic WINED3D_LOCATION_FRAGMENTED doesn't sound that appealing to me, in part because the issue should really only apply to maps. We did have something similar between the texture and the drawable of depth/stencil surfaces without "AlwaysOffscreen", but thankfully that's gone.