2009/12/15 Stefan Dösinger stefan@codeweavers.com:
@@ -85,6 +85,21 @@ static void buffer_create_buffer_object(struct wined3d_buffer *This)
...
This->maps = HeapAlloc(GetProcessHeap(), 0, sizeof(*This->maps));
This is the wrong place for that. Either do it in buffer_init(), or in buffer_Map(). Keeping track of maps should also be in a separate patch, and you should integrate "lock_count".
+struct map_range
A "wined3d_" prefix wouldn't hurt.
- LONG maps_size;
Does a signed variable really make sense here?
- /* TODO: GL_ARB_map_buffer_range */
- return gl_info->supported[APPLE_FLUSH_BUFFER_RANGE];
As a general rule, I think it makes sense to add support for ARB extensions (when available) before vendor specific extensions.
Am 16.12.2009 um 11:33 schrieb Henri Verbeet:
2009/12/15 Stefan Dösinger stefan@codeweavers.com:
@@ -85,6 +85,21 @@ static void buffer_create_buffer_object(struct wined3d_buffer *This)
...
This->maps = HeapAlloc(GetProcessHeap(), 0, sizeof(*This->maps));
This is the wrong place for that. Either do it in buffer_init(), or in buffer_Map(). Keeping track of maps should also be in a separate patch, and you should integrate "lock_count".
No, I think its the right place because tracking the maps only makes sense with a VBO, and if we have a dynamic VBO we have to keep track of the maps(even with ARB). So I think its just right to allocate and free this together with the VBO.
- /* TODO: GL_ARB_map_buffer_range */
- return gl_info->supported[APPLE_FLUSH_BUFFER_RANGE];
As a general rule, I think it makes sense to add support for ARB extensions (when available) before vendor specific extensions.
Well, I still don't have a working Linux install on any of my machines that can run Left 4 Dead at decent speed(the only game I know that really profits from the dynamic buffers), and OSX doesn't support the ARB extension
So this comes from a personal computer setup issue. Doesn't make the APPLE extension code any different though.
2009/12/16 Stefan Dösinger stefan@codeweavers.com:
Am 16.12.2009 um 11:33 schrieb Henri Verbeet:
2009/12/15 Stefan Dösinger stefan@codeweavers.com:
@@ -85,6 +85,21 @@ static void buffer_create_buffer_object(struct wined3d_buffer *This)
...
- This->maps = HeapAlloc(GetProcessHeap(), 0, sizeof(*This->maps));
This is the wrong place for that. Either do it in buffer_init(), or in buffer_Map(). Keeping track of maps should also be in a separate patch, and you should integrate "lock_count".
No, I think its the right place because tracking the maps only makes sense with a VBO, and if we have a dynamic VBO we have to keep track of the maps(even with ARB). So I think its just right to allocate and free this together with the VBO.
What you're doing (or should be doing) is extending the "lock_count/dirty_start/dirty_end" scheme to something more detailed. I don't think using two different schemes depending on whether the buffer is dynamic or not is a good idea.
Am 16.12.2009 um 12:50 schrieb Henri Verbeet:
What you're doing (or should be doing) is extending the "lock_count/dirty_start/dirty_end" scheme to something more detailed. I don't think using two different schemes depending on whether the buffer is dynamic or not is a good idea.
Sounds great on paper, but there's an issue. The dirty_start, dirty_end and the maps array serve two similar, yet different purposes.
The new maps array works like a stack. I needed this because I have to tell GL to flush the buffer after the app is done writing to it, but only Map() knows the bytes modified. After Unmap() I don't bother about what I just flushed, its GLs business now.
dirty_start and dirty_end are needed in PreLoad. Using the above stack doesn't work as-is. Consider a buffer with 10 bytes:
Map(buffer, start=1, len=1);
Map(buffer, start=5, len=1); Unmap() Map(Buffer, start=8, len=1); Unmap()
Unmap()
By the time I'm done unmaping, the information that byte 5 was mapped and possibly modified is lost in the stack, but PreLoad has to load at least bytes 1, 5 and 8. The current dirty_start and dirty_end loads bytes 1, 2, 3, 4, 5, 6, 7, 8. This is not as efficient as it could be, but I don't think its a problem because those are static buffers anyway.
The best way I see to merge those two tracking mechanisms is to organize them as a list, and upload all modified ranges in PreLoad(), or if the buffer is single buffered and dynamic, flush all the ranges after the last unmap, and clear the list in both PreLoad and unmap. That way we can share the data structure, although it doesn't perfectly fit either purpose. We can also share some of the code. The downside is that GL doesn't start flushing until the final unmap on dynamic buffers.
2009/12/16 Stefan Dösinger stefan@codeweavers.com:
The best way I see to merge those two tracking mechanisms is to organize them as a list, and upload all modified ranges in PreLoad(), or if the buffer is single buffered and dynamic, flush all the ranges after the last unmap, and clear the list in both PreLoad and unmap. That way we can share the data structure, although it doesn't perfectly fit either purpose. We can also share some of the code. The downside is that GL doesn't start flushing until the final unmap on dynamic buffers.
A list should work, but it doesn't have to be a single list. You can keep a list for the map stack (you can also store it as an array of pointers to the ranges, not sure if that gains you anything though), an on unmap you either free the range (or put it in a free list or device range pool, whatever) or transfer it to the list you keep for PreLoad(). You can also potentially merge and/or sort the second list. As a first step you could always merge the second list into a single entry, and end up with something that behaves pretty much like the current code.
Am 16.12.2009 um 13:44 schrieb Henri Verbeet:
2009/12/16 Stefan Dösinger stefan@codeweavers.com:
The best way I see to merge those two tracking mechanisms is to organize them as a list, and upload all modified ranges in PreLoad(), or if the buffer is single buffered and dynamic, flush all the ranges after the last unmap, and clear the list in both PreLoad and unmap. That way we can share the data structure, although it doesn't perfectly fit either purpose. We can also share some of the code. The downside is that GL doesn't start flushing until the final unmap on dynamic buffers.
A list should work, but it doesn't have to be a single list.
Actually, since we're not going to Unmap-flush and PreLoad() the buffer(its either-or), I can use the same data structure for the stack and list(or well, dynamic array). I just have to watch out not to remove elements in any place in unlock that affects DOUBLEBUFFER buffers.
Here is what I have so far. Completely untested, and the flushing is not yet integrated.
2009/12/16 Stefan Dösinger stefan@codeweavers.com:
Actually, since we're not going to Unmap-flush and PreLoad() the buffer(its either-or), I can use the same data structure for the stack and list(or well, dynamic array). I just have to watch out not to remove elements in any place in unlock that affects DOUBLEBUFFER buffers.
Sounds fragile, but I'm willing to wait and see what the final patch looks like.
Here is what I have so far. Completely untested, and the flushing is not yet integrated.
Yeah, it has some issues :-)
For what it's worth, I think I prefer offset/size over start/len, but I guess that's completely subjective.
Hi,
Here's a new patch for review. I won't send it until Monday because I want to do some performance testing(see below), and because the ddraw patch is queued below that, but it needs more testing.
The changes are essentially some bugfixing(one NULL pointer exception), and I dropped the optimization for full buffer maps for now. Oh, and some helper functions, but beyond that it's the same.
Am 16.12.2009 um 14:39 schrieb Henri Verbeet:
2009/12/16 Stefan Dösinger stefan@codeweavers.com:
Actually, since we're not going to Unmap-flush and PreLoad() the buffer(its either-or), I can use the same data structure for the stack and list(or well, dynamic array). I just have to watch out not to remove elements in any place in unlock that affects DOUBLEBUFFER buffers.
Sounds fragile, but I'm willing to wait and see what the final patch looks like.
I'm not doing that for now, I want to test if there's any performance difference. If I am using that, it might collide with range merges, or catching locks of the whole buffer(and then not recording any more partial locks). I'll try to find a game where this matters and benchmark it, but I think the difference will be too small to notice it.