From: Brendan McGrath <bmcgrath@codeweavers.com>
If both buffers are DXGI buffers, they can be copied on the GPU. They are currently transferred to the CPU, copied by the CPU and then transferred back to the GPU.
Performing a GPU copy produces ~25% faster playback on 4K video.
---
 dlls/mfreadwrite/reader.c | 84 +++++++++++++++++++++++++++++++++++----
 1 file changed, 76 insertions(+), 8 deletions(-)
diff --git a/dlls/mfreadwrite/reader.c b/dlls/mfreadwrite/reader.c
index 0e0ba10b076..17ead32a50a 100644
--- a/dlls/mfreadwrite/reader.c
+++ b/dlls/mfreadwrite/reader.c
@@ -38,6 +38,9 @@
 #include "wine/list.h"

 #include "mf_private.h"
+#undef EXTERN_GUID
+#define EXTERN_GUID DEFINE_GUID
+#include "d3d11.h"

 WINE_DEFAULT_DEBUG_CHANNEL(mfplat);

@@ -429,9 +432,70 @@ static void source_reader_response_ready(struct source_reader *reader, struct st
     stream->requests--;
 }
-static void source_reader_copy_sample_buffer(IMFSample *src, IMFSample *dst)
+static HRESULT dxgi_copy(IMFMediaBuffer *src, IMFMediaBuffer *dst, IUnknown *device_manager)
 {
-    IMFMediaBuffer *buffer;
+    HRESULT hr;
+    IMFDXGIBuffer *dxgi_src, *dxgi_dst;
+    IMFDXGIDeviceManager *dxgi_manager;
+    HANDLE device_handle;
+    ID3D11Device *device;
+    ID3D11DeviceContext *device_context;
+    ID3D11Texture2D *texture_src, *texture_dst;
+
+    if (!device_manager)
+        return E_INVALIDARG;
+
+    if (FAILED(hr = IUnknown_QueryInterface(device_manager, &IID_IMFDXGIDeviceManager, (void **)&dxgi_manager)))
+        return hr;
+
+    if (FAILED(hr = IMFMediaBuffer_QueryInterface(src, &IID_IMFDXGIBuffer, (void **)&dxgi_src)))
+        goto release_dxgi;
+
+    if (FAILED(hr = IMFMediaBuffer_QueryInterface(dst, &IID_IMFDXGIBuffer, (void **)&dxgi_dst)))
+        goto release_src;
+
+    if (FAILED(hr = IMFDXGIBuffer_GetResource(dxgi_src, &IID_ID3D11Texture2D, (void **)&texture_src)))
+        goto release_dst;
+
+    if (FAILED(hr = IMFDXGIBuffer_GetResource(dxgi_dst, &IID_ID3D11Texture2D, (void **)&texture_dst)))
+        goto release_texture_src;
+
+    if (FAILED(hr = IMFDXGIDeviceManager_OpenDeviceHandle(dxgi_manager, &device_handle)))
+        goto release_texture_dst;
+
+    if (FAILED(hr = IMFDXGIDeviceManager_LockDevice(dxgi_manager, device_handle, &IID_ID3D11Device, (void **)&device, TRUE)))
+        goto close_device_handle;
+
+    ID3D11Device_GetImmediateContext(device, &device_context);
+    ID3D11DeviceContext_CopyResource(device_context, (ID3D11Resource *)texture_dst, (ID3D11Resource *)texture_src);
+    ID3D11DeviceContext_Release(device_context);
+    ID3D11Device_Release(device);
+    IMFDXGIDeviceManager_UnlockDevice(dxgi_manager, device_handle, FALSE);
+
+close_device_handle:
+    IMFDXGIDeviceManager_CloseDeviceHandle(dxgi_manager, device_handle);
+
+release_texture_dst:
+    ID3D11Texture2D_Release(texture_dst);
+
+release_texture_src:
+    ID3D11Texture2D_Release(texture_src);
+
+release_dst:
+    IMFDXGIBuffer_Release(dxgi_dst);
+
+release_src:
+    IMFDXGIBuffer_Release(dxgi_src);
+
+release_dxgi:
+    IMFDXGIDeviceManager_Release(dxgi_manager);
+
+    return hr;
+}
+
+static void source_reader_copy_sample_buffer(IMFSample *src, IMFSample *dst, IUnknown *device_manager)
+{
+    IMFMediaBuffer *buffer_dst, *buffer_src;
     LONGLONG time;
     DWORD flags;
     HRESULT hr;
@@ -451,14 +515,18 @@ static void source_reader_copy_sample_buffer(IMFSample *src, IMFSample *dst)
     if (SUCCEEDED(IMFSample_GetSampleFlags(src, &flags)))
         IMFSample_SetSampleFlags(dst, flags);

-    if (SUCCEEDED(IMFSample_ConvertToContiguousBuffer(src, NULL)))
+    if (SUCCEEDED(IMFSample_ConvertToContiguousBuffer(src, &buffer_src)))
     {
-        if (SUCCEEDED(IMFSample_GetBufferByIndex(dst, 0, &buffer)))
+        if (SUCCEEDED(IMFSample_GetBufferByIndex(dst, 0, &buffer_dst)))
         {
-            if (FAILED(hr = IMFSample_CopyToBuffer(src, buffer)))
-                WARN("Failed to copy a buffer, hr %#lx.\n", hr);
-            IMFMediaBuffer_Release(buffer);
+            if (FAILED(dxgi_copy(buffer_src, buffer_dst, device_manager)))
+            {
+                if (FAILED(hr = IMFSample_CopyToBuffer(src, buffer_dst)))
+                    WARN("Failed to copy a buffer, hr %#lx.\n", hr);
+            }
+            IMFMediaBuffer_Release(buffer_dst);
         }
+        IMFMediaBuffer_Release(buffer_src);
     }
 }

@@ -1204,7 +1272,7 @@ static struct stream_response *media_stream_pop_response(struct source_reader *r
     /* Return allocation error to the caller, while keeping original response sample in for later. */
     if (SUCCEEDED(hr = IMFVideoSampleAllocatorEx_AllocateSample(stream->allocator, &sample)))
     {
-        source_reader_copy_sample_buffer(response->sample, sample);
+        source_reader_copy_sample_buffer(response->sample, sample, reader->device_manager);
         IMFSample_Release(response->sample);
         response->sample = sample;
     }
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=146774
Your paranoid android.
=== debian11b (64 bit WoW report) ===
kernel32:
comm.c:1574: Test failed: AbortWaitCts hComPortEvent failed
comm.c:1586: Test failed: Unexpected time 1002, expected around 500
I don't think we should do any copy at all here.
When the source reader is D3D-aware, and when its pipeline contains either a decoder or a video processor (which is normally the case 99.9% of the time, and not the case only when playing raw video), they will allocate D3D samples already and we don't need to copy anything.
On Fri Jul 5 01:31:36 2024 +0000, Rémi Bernon wrote:
> I don't think we should do any copy at all here. When the source reader is D3D-aware, and when its pipeline contains either a decoder or a video processor (which is normally the case 99.9% of the time, and not the case only when playing raw video), they will allocate D3D samples already and we don't need to copy anything.
I'm not super familiar with the `IMFSourceReader`, so I might have some of this wrong. But it looks like the current design receives samples from the `IMFMediaStream` asynchronously and then queues them.
The copy takes place when the application calls `IMFSourceReader::ReadSample` and the application is provided a sample. The sample received by the application is created by an internally managed `IMFVideoSampleAllocatorEx`, whilst the samples from `IMFMediaStream` are created by the media stream itself.
So I think under this design, a copy is unavoidable. Is it designed that way to match the Windows implementation? Or do we have scope for change?
Also of note, it looks like `IMFMediaStream` is not currently D3D-aware. I think we would need to implement `IMFMediaSourceEx` so we can call `IMFMediaSourceEx::SetD3DManager`. I guess we will want to do this if we want to receive YUV output in a D3D texture for potential color conversion by the GPU.
> Also of note, it looks like `IMFMediaStream` is not currently D3D-aware. I think we would need to implement `IMFMediaSourceEx` so we can call `IMFMediaSourceEx::SetD3DManager`. I guess we will want to do this if we want to receive YUV output in a D3D texture for potential color conversion by the GPU.
Yes, possibly, but the media source is also not supposed to decode buffers and that should be done by the MF decoder + processor pipeline, and these components are now D3D-aware.
> The sample received by the application is created by an internally managed `IMFVideoSampleAllocatorEx`, whilst the samples from `IMFMediaStream` are created by the media stream itself.
The source reader only uses `IMFVideoSampleAllocatorEx` when it has been initialized with a device manager, in which case it also passes the manager to its pipeline and initializes any D3D-aware components.
The media source shouldn't decode its buffers, and except in the case where you are playing a file with raw video in it (which pretty much never happens in any real situation), you would normally always have a pipeline after it.
When using a video decoder or processor, the D3D-aware video processor will allocate its output samples, and we don't need to copy these buffers at all in the source reader.
To cover the case with raw video we could either keep an optional copy here if there is no pipeline, or, like you described implement `IMFMediaSourceEx::SetD3DManager`.
On Fri Jul 5 06:26:44 2024 +0000, Rémi Bernon wrote:
> > Also of note, it looks like `IMFMediaStream` is not currently D3D-aware. I think we would need to implement `IMFMediaSourceEx` so we can call `IMFMediaSourceEx::SetD3DManager`. I guess we will want to do this if we want to receive YUV output in a D3D texture for potential color conversion by the GPU.
>
> Yes, possibly, but the media source is also not supposed to decode buffers and that should be done by the MF decoder + processor pipeline, and these components are now D3D-aware.
>
> > The sample received by the application is created by an internally managed `IMFVideoSampleAllocatorEx`, whilst the samples from `IMFMediaStream` are created by the media stream itself.
>
> The source reader only uses `IMFVideoSampleAllocatorEx` when it has been initialized with a device manager, in which case it also passes the manager to its pipeline and initializes any D3D-aware components.
>
> The media source shouldn't decode its buffers, and except in the case where you are playing a file with raw video in it (which pretty much never happens in any real situation), you would normally always have a pipeline after it.
>
> When using a video decoder or processor, the D3D-aware video processor will allocate its output samples, and we don't need to copy these buffers at all in the source reader.
>
> To cover the case with raw video we could either keep an optional copy here if there is no pipeline, or, like you described, implement `IMFMediaSourceEx::SetD3DManager`.
Do we know when IMFMediaSourceEx is available on Windows?
On Fri Jul 5 11:12:46 2024 +0000, Nikolay Sivov wrote:
> Do we know when IMFMediaSourceEx is available on Windows?
@rbernon OK, I think I got it. So on Windows, `IMFMediaStream` will output demuxed, but not decoded video buffers. A decoder transform would then perform parsing and decoding before (most likely) passing YUV buffers to the Video Processor transform. This will then perform color conversion to output RGB DXGI buffers that are then queued and passed directly to the application via `IMFSourceReader::ReadSample`.
The YUV buffer passed to the Video Processor transform can also be a DXGI buffer if color conversion is done on the GPU.
On Wine, we currently output decoded YUV buffers (in I420) from `IMFMediaStream` and then use the Video Processor transform to perform color conversion to RGB. Because its output is a DXGI buffer (when D3D-aware), we no longer need `IMFVideoSampleAllocatorEx`. So I'm trying to work out why we have `IMFVideoSampleAllocatorEx`. Was it added prior to the use of the Video Processor transform?
@nsivov Going by the documentation, Windows 8: https://learn.microsoft.com/en-us/windows/win32/api/mfidl/nn-mfidl-imfmedias...
On Mon Jul 8 01:58:33 2024 +0000, Brendan McGrath wrote:
> @rbernon OK, I think I got it. So on Windows, `IMFMediaStream` will output demuxed, but not decoded, video buffers. A decoder transform would then perform parsing and decoding before (most likely) passing YUV buffers to the Video Processor transform. This will then perform color conversion to output RGB DXGI buffers that are then queued and passed directly to the application via `IMFSourceReader::ReadSample`.
>
> The YUV buffer passed to the Video Processor transform can also be a DXGI buffer if color conversion is done on the GPU.
>
> On Wine, we currently output decoded YUV buffers (in I420) from `IMFMediaStream` and then use the Video Processor transform to perform color conversion to RGB. Because its output is a DXGI buffer (when D3D-aware), we no longer need `IMFVideoSampleAllocatorEx`. So I'm trying to work out why we have `IMFVideoSampleAllocatorEx`. Was it added prior to the use of the Video Processor transform?
>
> @nsivov Going by the documentation, Windows 8: https://learn.microsoft.com/en-us/windows/win32/api/mfidl/nn-mfidl-imfmedias...
I meant what kind of sources expose it on Windows.
> But I'm trying to work out why we have `IMFVideoSampleAllocatorEx`. Was it added prior to the use of the Video Processor transform?
I think it was added at some point in an attempt to make the source reader able to output D3D buffers, as some applications expect, without implementing the D3D-aware pipelines the way native does (and as you correctly described), while keeping the shortcut we took in the media source so we could keep relying on GStreamer pipelines and whatever benefits that might have (IMO not much at this point, and more problems than anything else).
On Mon Jul 8 12:50:45 2024 +0000, Rémi Bernon wrote:
> > But I'm trying to work out why we have `IMFVideoSampleAllocatorEx`. Was it added prior to the use of the Video Processor transform?
>
> I think it was added at some point in an attempt to make the source reader able to output D3D buffers, as some applications expect, without implementing the D3D-aware pipelines the way native does (and as you correctly described), while keeping the shortcut we took in the media source so we could keep relying on GStreamer pipelines and whatever benefits that might have (IMO not much at this point, and more problems than anything else).
Note that some file types can contain uncompressed video or audio. It may be worth testing with such a file; cf. dlls/mfplat/tests/test-i420.avi.