[PATCH v2 0/1] MR5978: winegstreamer: Optimise copy to DXGI buffer.
Using IMF2DBuffer2_Lock2DSize with LockFlags_Write dramatically improves performance over IMFMediaBuffer_Lock when using a DXGI buffer.

IMFMediaBuffer_Lock does not know that this buffer will not be read and therefore performs an unnecessary transfer of the texture from GPU to CPU before it is overwritten.

-- 
v2: winegstreamer: Optimise copy to DXGI buffer.

https://gitlab.winehq.org/wine/wine/-/merge_requests/5978
From: Brendan McGrath <bmcgrath(a)codeweavers.com>

Using IMF2DBuffer2_Lock2DSize with LockFlags_Write dramatically improves
performance over IMFMediaBuffer_Lock when using a DXGI buffer.

IMFMediaBuffer_Lock does not know that this buffer will not be read and
therefore performs an unnecessary transfer of the texture from GPU to CPU
before it is overwritten.
---
 dlls/winegstreamer/gst_private.h |  2 +-
 dlls/winegstreamer/media_sink.c  |  2 +-
 dlls/winegstreamer/wg_sample.c   | 31 +++++++++++++++++++++++++------
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/dlls/winegstreamer/gst_private.h b/dlls/winegstreamer/gst_private.h
index 0f7d945ba37..0fa1aa0dac5 100644
--- a/dlls/winegstreamer/gst_private.h
+++ b/dlls/winegstreamer/gst_private.h
@@ -156,7 +156,7 @@ extern HRESULT mfplat_DllRegisterServer(void);
 IMFMediaType *mf_media_type_from_wg_format(const struct wg_format *format);
 void mf_media_type_to_wg_format(IMFMediaType *type, struct wg_format *format);
 
-HRESULT wg_sample_create_mf(IMFSample *sample, struct wg_sample **out);
+HRESULT wg_sample_create_mf(IMFSample *sample, struct wg_sample **out, MF2DBuffer_LockFlags lockFlags);
 HRESULT wg_sample_create_quartz(IMediaSample *sample, struct wg_sample **out);
 HRESULT wg_sample_create_dmo(IMediaBuffer *media_buffer, struct wg_sample **out);
 void wg_sample_release(struct wg_sample *wg_sample);
diff --git a/dlls/winegstreamer/media_sink.c b/dlls/winegstreamer/media_sink.c
index 5ee2c44dc70..9eb64756ec6 100644
--- a/dlls/winegstreamer/media_sink.c
+++ b/dlls/winegstreamer/media_sink.c
@@ -615,7 +615,7 @@ static HRESULT media_sink_process(struct media_sink *media_sink, IMFSample *samp
     if (FAILED(hr = media_sink_write_stream(media_sink)))
         WARN("Failed to write output samples to stream, hr %#lx.\n", hr);
 
-    if (FAILED(hr = wg_sample_create_mf(sample, &wg_sample)))
+    if (FAILED(hr = wg_sample_create_mf(sample, &wg_sample, MF2DBuffer_LockFlags_Read)))
         return hr;
 
     if (SUCCEEDED(IMFSample_GetSampleTime(sample, &time)))
diff --git a/dlls/winegstreamer/wg_sample.c b/dlls/winegstreamer/wg_sample.c
index 116dbb1f3ec..03dc7c9de5b 100644
--- a/dlls/winegstreamer/wg_sample.c
+++ b/dlls/winegstreamer/wg_sample.c
@@ -52,6 +52,7 @@ struct sample
         {
             IMFSample *sample;
             IMFMediaBuffer *buffer;
+            IMF2DBuffer2 *buffer2d2;
         } mf;
         struct
         {
@@ -79,7 +80,15 @@ static void mf_sample_destroy(struct wg_sample *wg_sample)
 
     TRACE_(mfplat)("wg_sample %p.\n", wg_sample);
 
-    IMFMediaBuffer_Unlock(sample->u.mf.buffer);
+    if (sample->u.mf.buffer2d2)
+    {
+        IMF2DBuffer2_Unlock2D(sample->u.mf.buffer2d2);
+        IMF2DBuffer2_Release(sample->u.mf.buffer2d2);
+    }
+    else
+    {
+        IMFMediaBuffer_Unlock(sample->u.mf.buffer);
+    }
     IMFMediaBuffer_Release(sample->u.mf.buffer);
     IMFSample_Release(sample->u.mf.sample);
 }
@@ -89,18 +98,28 @@ static const struct wg_sample_ops mf_sample_ops =
     mf_sample_destroy,
 };
 
-HRESULT wg_sample_create_mf(IMFSample *mf_sample, struct wg_sample **out)
+HRESULT wg_sample_create_mf(IMFSample *mf_sample, struct wg_sample **out, MF2DBuffer_LockFlags lockFlags)
 {
     DWORD current_length, max_length;
+    LONG pitch;
     struct sample *sample;
-    BYTE *buffer;
+    BYTE *buffer, *scanline;
     HRESULT hr;
 
     if (!(sample = calloc(1, sizeof(*sample))))
         return E_OUTOFMEMORY;
     if (FAILED(hr = IMFSample_ConvertToContiguousBuffer(mf_sample, &sample->u.mf.buffer)))
         goto fail;
-    if (FAILED(hr = IMFMediaBuffer_Lock(sample->u.mf.buffer, &buffer, &max_length, &current_length)))
+    if (SUCCEEDED(hr = IMFMediaBuffer_QueryInterface(sample->u.mf.buffer, &IID_IMF2DBuffer2, (void**)&sample->u.mf.buffer2d2)) &&
+        FAILED(hr = IMF2DBuffer2_Lock2DSize(sample->u.mf.buffer2d2, lockFlags, &scanline, &pitch, &buffer, &max_length)))
+    {
+        IMF2DBuffer2_Release(sample->u.mf.buffer2d2);
+        sample->u.mf.buffer2d2 = NULL;
+    }
+
+    if (SUCCEEDED(hr))
+        current_length = max_length;
+    else if (FAILED(hr = IMFMediaBuffer_Lock(sample->u.mf.buffer, &buffer, &max_length, &current_length)))
         goto fail;
 
     IMFSample_AddRef((sample->u.mf.sample = mf_sample));
@@ -313,7 +332,7 @@ HRESULT wg_transform_push_mf(wg_transform_t transform, IMFSample *sample,
 
     TRACE_(mfplat)("transform %#I64x, sample %p, queue %p.\n", transform, sample, queue);
 
-    if (FAILED(hr = wg_sample_create_mf(sample, &wg_sample)))
+    if (FAILED(hr = wg_sample_create_mf(sample, &wg_sample, MF2DBuffer_LockFlags_Read)))
         return hr;
 
     if (SUCCEEDED(IMFSample_GetSampleTime(sample, &time)))
@@ -347,7 +366,7 @@ HRESULT wg_transform_read_mf(wg_transform_t transform, IMFSample *sample,
 
     TRACE_(mfplat)("transform %#I64x, sample %p, flags %p.\n", transform, sample, flags);
 
-    if (FAILED(hr = wg_sample_create_mf(sample, &wg_sample)))
+    if (FAILED(hr = wg_sample_create_mf(sample, &wg_sample, MF2DBuffer_LockFlags_Write)))
         return hr;
 
     wg_sample->size = 0;
-- 
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/5978
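The reasoning in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_gpu_buffer`, `toy_lock`, `toy_lock_2d`, `toy_unlock` and the `TOY_LOCK_*` flags are hypothetical stand-ins written for this note, not Media Foundation or Wine APIs, and the real DXGI buffer implementation may behave differently. The model assumes that a lock with no stated intent has to copy the texture back to the CPU, while a write-only lock can skip that copy.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins; these are NOT the Media Foundation types. */
enum toy_lock_flags
{
    TOY_LOCK_READ = 0x1,
    TOY_LOCK_WRITE = 0x2,
    TOY_LOCK_READWRITE = 0x3,
};

#define TOY_SIZE 16

struct toy_gpu_buffer
{
    unsigned char gpu[TOY_SIZE];     /* pretend video memory */
    unsigned char staging[TOY_SIZE]; /* CPU-visible copy handed to the caller */
    int locked_with;                 /* lock flags while locked, 0 when unlocked */
    unsigned int readbacks;          /* GPU -> CPU transfers performed */
    unsigned int uploads;            /* CPU -> GPU transfers performed */
};

/* Lock with an explicit intent: the readback is skipped for write-only access. */
static unsigned char *toy_lock_2d(struct toy_gpu_buffer *buf, enum toy_lock_flags flags)
{
    assert(!buf->locked_with);
    if (flags & TOY_LOCK_READ)
    {
        memcpy(buf->staging, buf->gpu, TOY_SIZE);
        buf->readbacks++;
    }
    buf->locked_with = flags;
    return buf->staging;
}

/* Lock without any intent: must assume the caller may both read and write. */
static unsigned char *toy_lock(struct toy_gpu_buffer *buf)
{
    return toy_lock_2d(buf, TOY_LOCK_READWRITE);
}

/* Unlock: upload the staging copy only if the caller was allowed to write. */
static void toy_unlock(struct toy_gpu_buffer *buf)
{
    assert(buf->locked_with);
    if (buf->locked_with & TOY_LOCK_WRITE)
    {
        memcpy(buf->gpu, buf->staging, TOY_SIZE);
        buf->uploads++;
    }
    buf->locked_with = 0;
}
```

In this model, overwriting the buffer through `toy_lock` costs one readback plus one upload, whereas going through `toy_lock_2d` with `TOY_LOCK_WRITE` costs only the upload. That saved transfer is the one the patch aims to avoid.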
On Thu Jul 4 22:56:45 2024 +0000, Brendan McGrath wrote:
> changed this line in [version 2 of the diff](/wine/wine/-/merge_requests/5978/diffs?diff_id=120655&start_sha=128dc290b03fe1bb5e307780f826b46f2dcdac0b#03c4efe3c0041a988fac0ea9eb3a9e8e49f116f3_114_114)

It turns out the issue wasn't related to the padding; I had simply introduced a bug. Namely, I hadn't catered for the use case where we read from the DXGI buffer rather than write to it. I've fixed that now, and I seem to get the same results with and without this MR.

I only hit this issue once I started using an isolated video processor transform and passed it DXGI buffers. Prior to that, I was testing with `IMFSourceReader`, which was passing the transform buffers from wg_parser, which appear not to be DXGI buffers.

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/5978#note_75144
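The read-versus-write distinction described in this comment can be sketched as follows. All names here (`toy_direction`, `toy_flags_for`, `toy_locked_byte`, the `TOY_LOCK_*` flags) are hypothetical and exist only for illustration. The sketch assumes that a write-only lock does not copy the GPU contents back, so the CPU sees stale data if such a lock is used on a sample that is meant to be read. The stale data is modelled as zero purely to keep the example deterministic.

```c
#include <assert.h>

/* Hypothetical flags for this sketch; not the Media Foundation enum. */
enum toy_lock_flags { TOY_LOCK_READ = 0x1, TOY_LOCK_WRITE = 0x2 };

enum toy_direction
{
    TOY_SAMPLE_INPUT,  /* existing buffer contents are consumed (push and sink paths) */
    TOY_SAMPLE_OUTPUT, /* buffer is overwritten with new data (read path) */
};

/* Pick the lock intent from how the sample is going to be used. */
static enum toy_lock_flags toy_flags_for(enum toy_direction direction)
{
    assert(direction == TOY_SAMPLE_INPUT || direction == TOY_SAMPLE_OUTPUT);
    return direction == TOY_SAMPLE_OUTPUT ? TOY_LOCK_WRITE : TOY_LOCK_READ;
}

/* What the CPU sees after locking: the GPU contents only if a readback happened. */
static unsigned char toy_locked_byte(unsigned char gpu_byte, enum toy_lock_flags flags)
{
    unsigned char staging = 0; /* stale in reality; zero here for determinism */

    if (flags & TOY_LOCK_READ)
        staging = gpu_byte;
    return staging;
}
```

This matches the call sites in the v2 diff, where `media_sink_process` and `wg_transform_push_mf` pass the read flag and `wg_transform_read_mf` passes the write flag.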
On Thu Jul 4 23:01:39 2024 +0000, Brendan McGrath wrote:
> It turns out the issue wasn't related to the padding; I had simply introduced a bug. Namely, I hadn't catered for the use case where we read from the DXGI buffer rather than write to it. I've fixed that now, and I seem to get the same results with and without this MR.
>
> I only hit this issue once I started using an isolated video processor transform and passed it DXGI buffers. Prior to that, I was testing with `IMFSourceReader`, which was passing the transform buffers from wg_parser, which appear not to be DXGI buffers.

Yeah, D3D buffers with software video conversion will be extremely inefficient. The proper solution here is to use the ID3D11VideoProcessor, which I intend to use in Proton to do everything on the GPU. It's not yet implemented in WineD3D, but I believe Yuxuan is looking into it. DXVK implements it already.
-- https://gitlab.winehq.org/wine/wine/-/merge_requests/5978#note_75164
> D3D buffers with software video conversion will be extremely inefficient
That is true. But I'm glad I tried, as it highlighted the bug I'd introduced.
> The proper solution here is to use the ID3D11VideoProcessor
I saw you mention that in an email, so I played with that as well. But I found it was still bottlenecked by the inefficient GPU copies. With the two GPU copies optimised (via this MR and MR!5979), I can play 4K video at 60 fps. Adding `ID3D11VideoProcessor` to that halved CPU usage. That was with a 4-second video, so the reduction could be greater for a longer video.

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/5978#note_75384
Marking this as draft for now, as I'm currently looking into test cases that pass on Windows and fail on Wine. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/5978#note_76369
This merge request was closed by Brendan McGrath. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/5978
Closing this, as preliminary testing shows it doesn't improve the playback performance of the 4K logo in Pixelia. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/5978#note_80497
participants (3)
- Brendan McGrath
- Brendan McGrath (@redmcg)
- Rémi Bernon