Re: [PATCH 08/10] wined3d: make wined3d_device_update_surface operate with wined3d_texture and sub_resource_idx
On 19 October 2015 at 19:07, Riccardo Bortolato <rikyz619(a)gmail.com> wrote:
@@ -3774,12 +3764,25 @@ float CDECL wined3d_device_get_npatch_mode(const struct wined3d_device *device) /* FIXME: Callers should probably use wined3d_device_update_sub_resource() * instead. */ HRESULT CDECL wined3d_device_update_surface(struct wined3d_device *device, - struct wined3d_surface *src_surface, const RECT *src_rect, - struct wined3d_surface *dst_surface, const POINT *dst_point) + struct wined3d_texture *src_texture, unsigned int src_idx, const RECT *src_rect, + struct wined3d_texture *dst_texture, unsigned int dst_idx, const POINT *dst_point) You should probably use wined3d_device_update_sub_resource() instead.
As I said to Stefan on IRC I wanted to look into "wined3d_device_update_sub_resource" but due to its design I would have had to modify it in order to avoid multiple castings texture-resource etc. Also its usage feels somehow different? (at least to my uneducated wined3d eye) Ciao, Riccardo 2015-10-20 16:37 GMT+02:00 Henri Verbeet <hverbeet(a)gmail.com>:
On 19 October 2015 at 19:07, Riccardo Bortolato <rikyz619(a)gmail.com> wrote:
@@ -3774,12 +3764,25 @@ float CDECL wined3d_device_get_npatch_mode(const struct wined3d_device *device) /* FIXME: Callers should probably use wined3d_device_update_sub_resource() * instead. */ HRESULT CDECL wined3d_device_update_surface(struct wined3d_device *device, - struct wined3d_surface *src_surface, const RECT *src_rect, - struct wined3d_surface *dst_surface, const POINT *dst_point) + struct wined3d_texture *src_texture, unsigned int src_idx, const RECT *src_rect, + struct wined3d_texture *dst_texture, unsigned int dst_idx, const POINT *dst_point) You should probably use wined3d_device_update_sub_resource() instead.
On 20 October 2015 at 16:44, Riccardo Bortolato <rikyz619(a)gmail.com> wrote:
As I said to Stefan on IRC I wanted to look into "wined3d_device_update_sub_resource" but due to its design I would have had to modify it in order to avoid multiple castings texture-resource etc. Also its usage feels somehow different? (at least to my uneducated wined3d eye)
It's not necessarily trivial, no. The easiest way to start may be to first work on implementing wined3d_device_update_surface() on top of wined3d_device_update_sub_resource(), and only then work on getting rid of wined3d_device_update_surface() itself.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Am 2015-10-20 um 17:36 schrieb Henri Verbeet:
It's not necessarily trivial, no. The easiest way to start may be to first work on implementing wined3d_device_update_surface() on top of wined3d_device_update_sub_resource(), and only then work on getting rid of wined3d_device_update_surface() itself. The inconvenient thing about wined3d_device_update_sub_resource is that the source data is in a pointer and not a system memory resource. You can of course just map the source resource in the client library and pass the map pointer to wined3d_device_update_sub_resource, but that'll require copying the data or waiting with the multithreaded command stream.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJWJmy0AAoJEN0/YqbEcdMwxvEP/A1UImuOQ98s71kXhtuqynVj GPmhgjWWjAJHcgsC/7GIHKAqL6GGbHpmJNAfGBQgNIp4YxUl9CgkGol7GWcOCdoZ yoM6lCf3suYRV1225eeCOpNEDXtj6dYowqQIXBlSP8wT7MDzX2/6p0szbnZIIONf i0rlo/2EEffwwkML9VV7v0C/fTo1j4DrACUtNdMnUXY6gVIGxLlYZujbAKiqNct2 IUBsKYmqKxuI5nyxO6qaqHP4bFf7MCvXSIHKDa7rz2VEDyhG+s1OhCx3mINdXkpl 1rlfE65lcB11x0a2PFULkEtgFB9cXjICaKnNX3LkMSAc7NCXQj3po3WLmU3gF00b q1eCuRM6QwzosbzGJVoUykjMuHEApI1uvD0eTVbI1K/z+mEexyFR5lj8DFHhvwou qfsmeQxxvci+hd+eQEhDK2JM/l89LRqiZObsMrljUNSBa5yDWFtryH5arWtizFxy c5exVJpVJ1o4bzHhXvKtgzlKLsBDNm8bcQqOVog9MX6LJBnNyBpfBXYK5iHzHCas TfIA35KoSHCeC/HqIiWNLdfahYqFs5Z8eGfufk2gcVpqRFqQvEih70TGzubo8Dk0 Y7HvJZ2JV+k6WEMp0N7PgGUWiCdS2GUGhN5ppk/u40Zowcus4BykRusWqSrB4OKa K5KoEAenluePen+sN6he =cvx3 -----END PGP SIGNATURE-----
On 20 October 2015 at 18:32, Stefan Dösinger <stefandoesinger(a)gmail.com> wrote:
Am 2015-10-20 um 17:36 schrieb Henri Verbeet:
It's not necessarily trivial, no. The easiest way to start may be to first work on implementing wined3d_device_update_surface() on top of wined3d_device_update_sub_resource(), and only then work on getting rid of wined3d_device_update_surface() itself. The inconvenient thing about wined3d_device_update_sub_resource is that the source data is in a pointer and not a system memory resource. You can of course just map the source resource in the client library and pass the map pointer to wined3d_device_update_sub_resource, but that'll require copying the data or waiting with the multithreaded command stream.
The source surface for wined3d_device_update_surface() is supposed to be in WINED3D_POOL_SYSTEM_MEM. Mapping those shouldn't require a copy. (And while in ddraw everything is a bit weird, in d3d9 system memory surfaces are really supposed to be in system memory.)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Am 2015-10-20 um 19:26 schrieb Henri Verbeet: > The source surface for wined3d_device_update_surface() is supposed > to be in WINED3D_POOL_SYSTEM_MEM. Mapping those shouldn't require a > copy. (And while in ddraw everything is a bit weird, in d3d9 system > memory surfaces are really supposed to be in system memory.) - From wined3d_device_update_sub_resource's point of view there's no guarantee that the memory data points to won't be gone after its return. wined3d_device_update_surface knows that the source surface will be there when the queued operation is executed. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJWJ1OuAAoJEN0/YqbEcdMwwS4P/3pAXKzdgO7cKneT+ZNAc/xv eCi9MU5diUnp3AuZJ+DUWqg6hPfdHJfJZ1QjRCM2Ii3MkxvP40rlN17D1rjgOiXe UNPKSBuhldn+JLhMhPk7PsMMq6JZs/uf431rdeAsSch4LeNRVDYwGXu+S9QPPVsT gp+VpCm5q23Oao/PxW0e8S9JdyDLhCPJKELiXDLxPSwVAjVpHYswoQ8MtnkiVpmX Va2hQY0tw9XEvvClBnOxcWl62hiii7U400hkZIa7vS0m3oQ24blqe783wCZ8ZY3y CKLzTgt4SnUPKkxcGV7Mrzzjs/nmDPjPz5xOoXF+I0t4r5eaDd83rCswhzgC27Sh Xiihu82W2DNtzVZ33Io9hPBTltl3cDkreM2Y4w5GjNkVV+kEO4hhMpFwR/kfYMMs lWdGoDnTp0CADHLmo7X1A2vShhiOrm8/dfAKo07/X5pzc6N2m5WkPQ5YAjZsSTYr vpgmgLjyF1sGGh1CeDKHGQoODKooCujVTbCWIExNI2+nciEUUdWtllPz8fwNKdiZ 3iPnYkep0u44fGboPWehAE7bVoj7ufXV8jEenAax1uexTPKeYN6+v9FSHWzksHhp fVYAVV37eG19Qq+cPsO+wDt14//7VsTcTv6yNghcXzqVIB2UdBCSpp2BqpDFHg7M 2u/9NSCQSw309m7MvYSo =xoOl -----END PGP SIGNATURE-----
On 21 October 2015 at 10:58, Stefan Dösinger <stefandoesinger(a)gmail.com> wrote: > Am 2015-10-20 um 19:26 schrieb Henri Verbeet: >> The source surface for wined3d_device_update_surface() is supposed >> to be in WINED3D_POOL_SYSTEM_MEM. Mapping those shouldn't require a >> copy. (And while in ddraw everything is a bit weird, in d3d9 system >> memory surfaces are really supposed to be in system memory.) > - From wined3d_device_update_sub_resource's point of view there's no > guarantee that the memory data points to won't be gone after its > return. wined3d_device_update_surface knows that the source surface > will be there when the queued operation is executed. > Oh right, you meant csmt for both cases. I guess the "or" was a bit ambiguous. It's something csmt will have to deal with either way though, since d3d10+ uses that. Of course you can't necessarily know a system memory surface isn't going to be modified before the upload finishes in such a case either, but I suppose you could block on maps in that case. Note that I don't think it's a given that resource updates should always happen on the rendering thread though.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Am 2015-10-21 um 11:38 schrieb Henri Verbeet:
Oh right, you meant csmt for both cases. I guess the "or" was a bit ambiguous. It's something csmt will have to deal with either way though, since d3d10+ uses that. The current solution in my patches is to wait until the upload is done. That makes the tests happy, but obviously isn't an efficient solution.
My suggestion would be to map the d3d10+ call to the d3d9-style call with a sysmem staging resource managed by d3d11, similarly to how we handle UP draws in <= d3d9. But I haven't looked into the details at all. (On a semi-related topic, OpenGL doesn't know that that the sysmem pointer isn't going away and needs to make a copy even if we have a sysmem d3d resource. The idea here is to use a buffer with client storage (ARB_buffer_storage) to tell GL that it doesn't have to bother copying but still having a stable map address).
Of course you can't necessarily know a system memory surface isn't going to be modified before the upload finishes in such a case either, but I suppose you could block on maps in that case. Yeah, right now I handle that in the same way as I handle buffers, textures and render targets in regular draws. I know two applications that ran into performance problems with update_surface or update_texture. One of them is World of Warcraft, and it passes D3DLOCK_DISCARD when it maps the sysmem surface. Implementing discards for sysmem resources fixed the issue.
The other one is Call of Duty Modern Warfare, and it doesn't use D3DLOCK_DISCARD. It is updating a pretty small volume. I suspect it is using it as a way to throttle the render-ahead behaviour, but I have no proof of that.
Note that I don't think it's a given that resource updates should always happen on the rendering thread though. Yeah, but that doesn't help here. E.g. the destination resource could be busy, in which case an update in the main thread can't be done right away either. And no, we can't discard single sub resources.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJWJ2GsAAoJEN0/YqbEcdMw6jMP/iwu8dhSlxAsKHWFQx6KoTfJ 10KENScmDTJ75xfhn8LXd6+Uxoq/9sHa4x1JoEpuK/vlSt/qseAXCFQSYNmj63EE GIenCmhI/y4CdK7JKHv7+o3W1tcKGuK41VqBxe905rbzj7TFQP3d8xxT65qcbMlT TiatvFoLgWa8+wjXcgKkRa0kzWozTieuri2fYlpXhScmoXqQxFjq+ao315ecS3FC j/dDZHQSMh6IEJ9/TrC1XbhTd3Y1xykB377BORUbD6VBcjtqE60QOQzv5kdtDp0h UhSzFkiL5d4Z0pJ3arW9wfFhflJaTCZbzS0w9f63kYq/KgO+SHCQDp4f8wRYNKZM CCi3zhInUTOZ+FlrKejJwx8xp0lpjO2XRwDq+qJKOKuEADxxhlcvHGXpnQ0G4Mi+ +YhtTcMbnzllTAx1C4/PjKtQa5aQ5g20dJH0M2i51kKoY+08p48Ly3wFctQYGnUD kkTH4EMCd7Lp4aaJxk74owIYXCidmtQkFZjVCiuldhZ+J+hDZNGhJeT8Vyi1V/CT jEwefYkV0XZtK1KMb1faVEtp5OwPjymOyHFg4HR9xqLybH6JFuBPltJmKXA6eOvy OKlD/R7gXcurMuzE86BB0ukTmkq0CSuC5Zo97Wf3Ep4FoA6bbKeXaMbyxUmgdRVH i6CLZmj8TPP5/m5L+lbk =+gvV -----END PGP SIGNATURE-----
On 21 October 2015 at 11:58, Stefan Dösinger <stefandoesinger(a)gmail.com> wrote:
Am 2015-10-21 um 11:38 schrieb Henri Verbeet:
Oh right, you meant csmt for both cases. I guess the "or" was a bit ambiguous. It's something csmt will have to deal with either way though, since d3d10+ uses that. The current solution in my patches is to wait until the upload is done. That makes the tests happy, but obviously isn't an efficient solution.
My suggestion would be to map the d3d10+ call to the d3d9-style call with a sysmem staging resource managed by d3d11, similarly to how we handle UP draws in <= d3d9. But I haven't looked into the details at all.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates.
Note that I don't think it's a given that resource updates should always happen on the rendering thread though. Yeah, but that doesn't help here. E.g. the destination resource could be busy, in which case an update in the main thread can't be done right away either. Well yes, but the destination resource being busy generally implies the GPU being somewhat busy too. And in that case it isn't necessarily bad to spend some time on the CPU to copy the data.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Am 2015-10-21 um 12:13 schrieb Henri Verbeet: I completely missed CopySubresourceRegion. This seems like a much better method to map UpdateSurface to than UpdateSubresource.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates. Ya, the design of d3d10's update_sub_resource kinda implies a copy. You can of course try to call GL right away with the application-provided pointer, but then the driver is likely to make a copy internally unless it is really clever too. It seems better to me to do the copy ourselves, write into a client storage buffer object and then blame the driver if it still insists on making an internal copy.
Well yes, but the destination resource being busy generally implies the GPU being somewhat busy too. And in that case it isn't necessarily bad to spend some time on the CPU to copy the data. Probably. Could also be because the worker thread is busy because the driver needs more CPU time than the game logic.
Either way, these things are fairly hypothetical at this point. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJWJ2+fAAoJEN0/YqbEcdMwy2kP/0YakW8Nn9A3pyrSpFwXXq4n +otk3t4L3B9JQ1elnVkoGpkJDyqXxOimbOLyb80Zrbc0Ll02R0FNmoFgJHRDUDhG dZnnTUwHoXxpQgSf+T/TpJwj7pVGqyaH8+A434l31iQDW+KInjeH6X9T0lakuQaU BzIXUs/Niz2oy8IaZXKY3Ar/8ku//ZKwmN2c4BAlhaznl3SLzgTxEn1d5zAfFFWt VbSb96U5DBRdEpz9EO1X7/5zzRM1m/9CUDowuOIGX8G/wxFI9P6DA9x2iUqBcvTU kkXmZU2av9thjxqjXX9lPMfmuR/0vZskdSGqHgDqUvOhkkTV25rlhEgjbkAbjs7z rLOaojZF0x/rcglkCqlE5MRZ1k7HXt3pGhwxDs3lhkJiuVk4w4d5QYJ06DpICwc7 OgX9QjtnAWlgW8xvBeFx3y9tSierqCW+J+AtzhdAxOiYBxdB1/2AG2TBZiQpoWtl azutl8RUBQ7Sodu4xSsh7AOMiTeOP8mPXaoxuNQYcLVHPh4laRzTppR5jwYUABTI A2lL10z0SCAFF5O4V3mg29x0gwLS4qQbS2Kt27/Xk0bzoHs62Q9qLCgpRf9OEONa QlTYK+BijSn5P3s8BrZaQBmk4OsgNYNuktVIVB/a+7aUOn0jj+RbPjYHj1JNb/OW +TdDO9TPH04MRMa4lpoX =FhGW -----END PGP SIGNATURE-----
On 21 October 2015 at 12:57, Stefan Dösinger <stefandoesinger(a)gmail.com> wrote:
I completely missed CopySubresourceRegion. This seems like a much better method to map UpdateSurface to than UpdateSubresource.
At first sight, yes. But it's actually not that obvious. Note that in d3d10+ you don't have pure system memory resources like in earlier versions, in d3d9 terms resources are always in the default or managed pool. The way it's described in the documentation CopySubresourceRegion() always copies between GPU buffers, while UpdateSubresource() is for uploads. I had hoped to avoid thinking too much about how to handle system memory resources for a while. That said, perhaps wined3d_device_copy_sub_resource_region() could be a useful intermediate step, even though that means we may end up changing it to UpdateSubresource() anyway in the future.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates. Ya, the design of d3d10's update_sub_resource kinda implies a copy. You can of course try to call GL right away with the application-provided pointer, but then the driver is likely to make a copy internally unless it is really clever too. It seems better to me to do the copy ourselves, write into a client storage buffer object and then blame the driver if it still insists on making an internal copy.
Yeah, we disagree there. I prefer to do the right thing and then fix the driver if it doesn't know how to handle that.
I tried taking a different road here, please check out: https://source.winehq.org/patches/data/115596 Ciao, Riccardo 2015-10-22 10:47 GMT+02:00 Henri Verbeet <hverbeet(a)gmail.com>:
On 21 October 2015 at 12:57, Stefan Dösinger <stefandoesinger(a)gmail.com> wrote:
I completely missed CopySubresourceRegion. This seems like a much better method to map UpdateSurface to than UpdateSubresource.
At first sight, yes. But it's actually not that obvious. Note that in d3d10+ you don't have pure system memory resources like in earlier versions, in d3d9 terms resources are always in the default or managed pool. The way it's described in the documentation CopySubresourceRegion() always copies between GPU buffers, while UpdateSubresource() is for uploads. I had hoped to avoid thinking too much about how to handle system memory resources for a while. That said, perhaps wined3d_device_copy_sub_resource_region() could be a useful intermediate step, even though that means we may end up changing it to UpdateSubresource() anyway in the future.
Complexities of managing that aside, that would always imply at least one copy of the data for d3d10+ updates. Ya, the design of d3d10's update_sub_resource kinda implies a copy. You can of course try to call GL right away with the application-provided pointer, but then the driver is likely to make a copy internally unless it is really clever too. It seems better to me to do the copy ourselves, write into a client storage buffer object and then blame the driver if it still insists on making an internal copy.
Yeah, we disagree there. I prefer to do the right thing and then fix the driver if it doesn't know how to handle that.
participants (3)
-
Henri Verbeet -
Riccardo Bortolato -
Stefan Dösinger