On 19 October 2015 at 19:07, Riccardo Bortolato rikyz619@gmail.com wrote:
@@ -3774,12 +3764,25 @@ float CDECL wined3d_device_get_npatch_mode(const struct wined3d_device *device) /* FIXME: Callers should probably use wined3d_device_update_sub_resource()
- instead. */
HRESULT CDECL wined3d_device_update_surface(struct wined3d_device *device,
struct wined3d_surface *src_surface, const RECT *src_rect,
struct wined3d_surface *dst_surface, const POINT *dst_point)
struct wined3d_texture *src_texture, unsigned int src_idx, const RECT *src_rect,
struct wined3d_texture *dst_texture, unsigned int dst_idx, const POINT *dst_point)
You should probably use wined3d_device_update_sub_resource() instead.
As I said to Stefan on IRC I wanted to look into "wined3d_device_update_sub_resource" but due to its design I would have had to modify it in order to avoid multiple castings texture-resource etc. Also its usage feels somehow different? (at least to my uneducated wined3d eye)
Ciao, Riccardo
2015-10-20 16:37 GMT+02:00 Henri Verbeet hverbeet@gmail.com:
On 19 October 2015 at 19:07, Riccardo Bortolato rikyz619@gmail.com wrote:
@@ -3774,12 +3764,25 @@ float CDECL wined3d_device_get_npatch_mode(const struct wined3d_device *device) /* FIXME: Callers should probably use wined3d_device_update_sub_resource()
- instead. */
HRESULT CDECL wined3d_device_update_surface(struct wined3d_device *device,
struct wined3d_surface *src_surface, const RECT *src_rect,
struct wined3d_surface *dst_surface, const POINT *dst_point)
struct wined3d_texture *src_texture, unsigned int src_idx, const RECT *src_rect,
struct wined3d_texture *dst_texture, unsigned int dst_idx, const POINT *dst_point)
You should probably use wined3d_device_update_sub_resource() instead.
On 20 October 2015 at 16:44, Riccardo Bortolato rikyz619@gmail.com wrote:
As I said to Stefan on IRC I wanted to look into "wined3d_device_update_sub_resource" but due to its design I would have had to modify it in order to avoid multiple castings texture-resource etc. Also its usage feels somehow different? (at least to my uneducated wined3d eye)
It's not necessarily trivial, no. The easiest way to start may be to first work on implementing wined3d_device_update_surface() on top of wined3d_device_update_sub_resource(), and only then work on getting rid of wined3d_device_update_surface() itself.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Am 2015-10-20 um 17:36 schrieb Henri Verbeet:
It's not necessarily trivial, no. The easiest way to start may be to first work on implementing wined3d_device_update_surface() on top of wined3d_device_update_sub_resource(), and only then work on getting rid of wined3d_device_update_surface() itself.
The inconvenient thing about wined3d_device_update_sub_resource is that the source data is in a pointer and not a system memory resource. You can of course just map the source resource in the client library and pass the map pointer to wined3d_device_update_sub_resource, but that'll require copying the data or waiting with the multithreaded command stream.
On 20 October 2015 at 18:32, Stefan Dösinger stefandoesinger@gmail.com wrote:
Am 2015-10-20 um 17:36 schrieb Henri Verbeet:
It's not necessarily trivial, no. The easiest way to start may be to first work on implementing wined3d_device_update_surface() on top of wined3d_device_update_sub_resource(), and only then work on getting rid of wined3d_device_update_surface() itself.
The inconvenient thing about wined3d_device_update_sub_resource is that the source data is in a pointer and not a system memory resource. You can of course just map the source resource in the client library and pass the map pointer to wined3d_device_update_sub_resource, but that'll require copying the data or waiting with the multithreaded command stream.
The source surface for wined3d_device_update_surface() is supposed to be in WINED3D_POOL_SYSTEM_MEM. Mapping those shouldn't require a copy. (And while in ddraw everything is a bit weird, in d3d9 system memory surfaces are really supposed to be in system memory.)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Am 2015-10-20 um 19:26 schrieb Henri Verbeet:
The source surface for wined3d_device_update_surface() is supposed to be in WINED3D_POOL_SYSTEM_MEM. Mapping those shouldn't require a copy. (And while in ddraw everything is a bit weird, in d3d9 system memory surfaces are really supposed to be in system memory.)
- From wined3d_device_update_sub_resource's point of view there's no guarantee that the memory data points to won't be gone after its return. wined3d_device_update_surface knows that the source surface will be there when the queued operation is executed.
On 21 October 2015 at 10:58, Stefan Dösinger stefandoesinger@gmail.com wrote:
Am 2015-10-20 um 19:26 schrieb Henri Verbeet:
The source surface for wined3d_device_update_surface() is supposed to be in WINED3D_POOL_SYSTEM_MEM. Mapping those shouldn't require a copy. (And while in ddraw everything is a bit weird, in d3d9 system memory surfaces are really supposed to be in system memory.)
- From wined3d_device_update_sub_resource's point of view there's no
guarantee that the memory data points to won't be gone after its return. wined3d_device_update_surface knows that the source surface will be there when the queued operation is executed.
Oh right, you meant csmt for both cases. I guess the "or" was a bit ambiguous. It's something csmt will have to deal with either way though, since d3d10+ uses that. Of course you can't necessarily know a system memory surface isn't going to be modified before the upload finishes in such a case either, but I suppose you could block on maps in that case. Note that I don't think it's a given that resource updates should always happen on the rendering thread though.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Am 2015-10-21 um 11:38 schrieb Henri Verbeet:
Oh right, you meant csmt for both cases. I guess the "or" was a bit ambiguous. It's something csmt will have to deal with either way though, since d3d10+ uses that.
The current solution in my patches is to wait until the upload is done. That makes the tests happy, but obviously isn't an efficient solution.
My suggestion would be to map the d3d10+ call to the d3d9-style call with a sysmem staging resource managed by d3d11, similarly to how we handle UP draws in <= d3d9. But I haven't looked into the details at all.
(On a semi-related topic, OpenGL doesn't know that that the sysmem pointer isn't going away and needs to make a copy even if we have a sysmem d3d resource. The idea here is to use a buffer with client storage (ARB_buffer_storage) to tell GL that it doesn't have to bother copying but still having a stable map address).
Of course you can't necessarily know a system memory surface isn't going to be modified before the upload finishes in such a case either, but I suppose you could block on maps in that case.
Yeah, right now I handle that in the same way as I handle buffers, textures and render targets in regular draws. I know two applications that ran into performance problems with update_surface or update_texture. One of them is World of Warcraft, and it passes D3DLOCK_DISCARD when it maps the sysmem surface. Implementing discards for sysmem resources fixed the issue.
The other one is Call of Duty Modern Warfare, and it doesn't use D3DLOCK_DISCARD. It is updating a pretty small volume. I suspect it is using it as a way to throttle the render-ahead behaviour, but I have no proof of that.
Note that I don't think it's a given that resource updates should always happen on the rendering thread though.
Yeah, but that doesn't help here. E.g. the destination resource could be busy, in which case an update in the main thread can't be done right away either. And no, we can't discard single sub resources.
On 21 October 2015 at 11:58, Stefan Dösinger stefandoesinger@gmail.com wrote:
Am 2015-10-21 um 11:38 schrieb Henri Verbeet:
Oh right, you meant csmt for both cases. I guess the "or" was a bit ambiguous. It's something csmt will have to deal with either way though, since d3d10+ uses that.
The current solution in my patches is to wait until the upload is done. That makes the tests happy, but obviously isn't an efficient solution.
My suggestion would be to map the d3d10+ call to the d3d9-style call with a sysmem staging resource managed by d3d11, similarly to how we handle UP draws in <= d3d9. But I haven't looked into the details at all.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates.
Note that I don't think it's a given that resource updates should always happen on the rendering thread though.
Yeah, but that doesn't help here. E.g. the destination resource could be busy, in which case an update in the main thread can't be done right away either.
Well yes, but the destination resource being busy generally implies the GPU being somewhat busy too. And in that case it isn't necessarily bad to spend some time on the CPU to copy the data.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Am 2015-10-21 um 12:13 schrieb Henri Verbeet:
I completely missed CopySubresourceRegion. This seems like a much better method to map UpdateSurface to than UpdateSubresource.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates.
Ya, the design of d3d10's update_sub_resource kinda implies a copy. You can of course try to call GL right away with the application-provided pointer, but then the driver is likely to make a copy internally unless it is really clever too. It seems better to me to do the copy ourselves, write into a client storage buffer object and then blame the driver if it still insists on making an internal copy.
Well yes, but the destination resource being busy generally implies the GPU being somewhat busy too. And in that case it isn't necessarily bad to spend some time on the CPU to copy the data.
Probably. Could also be because the worker thread is busy because the driver needs more CPU time than the game logic.
Either way, these things are fairly hypothetical at this point.
On 21 October 2015 at 12:57, Stefan Dösinger stefandoesinger@gmail.com wrote:
I completely missed CopySubresourceRegion. This seems like a much better method to map UpdateSurface to than UpdateSubresource.
At first sight, yes. But it's actually not that obvious. Note that in d3d10+ you don't have pure system memory resources like in earlier versions, in d3d9 terms resources are always in the default or managed pool. The way it's described in the documentation CopySubresourceRegion() always copies between GPU buffers, while UpdateSubresource() is for uploads. I had hoped to avoid thinking too much about how to handle system memory resources for a while. That said, perhaps wined3d_device_copy_sub_resource_region() could be a useful intermediate step, even though that means we may end up changing it to UpdateSubresource() anyway in the future.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates.
Ya, the design of d3d10's update_sub_resource kinda implies a copy. You can of course try to call GL right away with the application-provided pointer, but then the driver is likely to make a copy internally unless it is really clever too. It seems better to me to do the copy ourselves, write into a client storage buffer object and then blame the driver if it still insists on making an internal copy.
Yeah, we disagree there. I prefer to do the right thing and then fix the driver if it doesn't know how to handle that.
I tried taking a different road here, please check out: https://source.winehq.org/patches/data/115596
Ciao, Riccardo
2015-10-22 10:47 GMT+02:00 Henri Verbeet hverbeet@gmail.com:
On 21 October 2015 at 12:57, Stefan Dösinger stefandoesinger@gmail.com wrote:
I completely missed CopySubresourceRegion. This seems like a much better method to map UpdateSurface to than UpdateSubresource.
At first sight, yes. But it's actually not that obvious. Note that in d3d10+ you don't have pure system memory resources like in earlier versions, in d3d9 terms resources are always in the default or managed pool. The way it's described in the documentation CopySubresourceRegion() always copies between GPU buffers, while UpdateSubresource() is for uploads. I had hoped to avoid thinking too much about how to handle system memory resources for a while. That said, perhaps wined3d_device_copy_sub_resource_region() could be a useful intermediate step, even though that means we may end up changing it to UpdateSubresource() anyway in the future.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates.
Ya, the design of d3d10's update_sub_resource kinda implies a copy. You can of course try to call GL right away with the application-provided pointer, but then the driver is likely to make a copy internally unless it is really clever too. It seems better to me to do the copy ourselves, write into a client storage buffer object and then blame the driver if it still insists on making an internal copy.
Yeah, we disagree there. I prefer to do the right thing and then fix the driver if it doesn't know how to handle that.