This fixes a latency issue with audio, particularly when played alongside 4K video.
With a single queue, only one sample request can be processed at a time. So, for example, if a video sample takes 40ms to be delivered, then all pending audio samples will be delayed by 40ms. This can lead to the audio PTS lagging the presentation clock and the samples being dropped.
From: Brendan McGrath <bmcgrath@codeweavers.com>
This fixes a latency issue with audio, particularly when played alongside 4K video.
With a single queue, only one sample request can be processed at a time. So, for example, if a video sample takes 40ms to be delivered, then all pending audio samples will be delayed by 40ms. This can lead to the audio PTS lagging the presentation clock and the samples being dropped.
---
 dlls/winegstreamer/media_source.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c
index 3b5469932f3..595e26d8992 100644
--- a/dlls/winegstreamer/media_source.c
+++ b/dlls/winegstreamer/media_source.c
@@ -137,6 +137,8 @@ struct media_stream
     DWORD stream_id;
     BOOL active;
     BOOL eos;
+
+    DWORD async_stream_queue;
 };
 
 enum source_async_op
@@ -548,7 +550,7 @@ static void flush_token_queue(struct media_stream *stream, BOOL send)
                 command->u.request_sample.stream = stream;
                 command->u.request_sample.token = stream->token_queue[i];
 
-                hr = MFPutWorkItem(source->async_commands_queue, &source->async_commands_callback, op);
+                hr = MFPutWorkItem(stream->async_stream_queue, &source->async_commands_callback, op);
                 IUnknown_Release(op);
             }
             if (FAILED(hr))
@@ -961,6 +963,7 @@ static ULONG WINAPI media_stream_Release(IMFMediaStream *iface)
         IMFStreamDescriptor_Release(stream->descriptor);
         IMFMediaEventQueue_Release(stream->event_queue);
         flush_token_queue(stream, FALSE);
+        MFUnlockWorkQueue(stream->async_stream_queue);
         free(stream);
     }
 
@@ -1075,7 +1078,7 @@ static HRESULT WINAPI media_stream_RequestSample(IMFMediaStream *iface, IUnknown
         IUnknown_AddRef(token);
         command->u.request_sample.token = token;
 
-        hr = MFPutWorkItem(source->async_commands_queue, &source->async_commands_callback, op);
+        hr = MFPutWorkItem(stream->async_stream_queue, &source->async_commands_callback, op);
         IUnknown_Release(op);
     }
 
@@ -1682,6 +1685,13 @@ static HRESULT media_source_create(struct object_context *context, IMFMediaSourc
             goto fail;
         }
 
+        if (FAILED(hr = MFAllocateWorkQueue(&stream->async_stream_queue)))
+        {
+            IMFStreamDescriptor_Release(descriptor);
+            IMFMediaStream_Release(&stream->IMFMediaStream_iface);
+            goto fail;
+        }
+
        object->duration = max(object->duration, wg_parser_stream_get_duration(wg_stream));
        IMFStreamDescriptor_AddRef(descriptor);
        object->descriptors[i] = descriptor;
@@ -1704,6 +1714,7 @@ fail:
        struct media_stream *stream = object->streams[object->stream_count];
        IMFStreamDescriptor_Release(object->descriptors[object->stream_count]);
        IMFMediaStream_Release(&stream->IMFMediaStream_iface);
+       MFUnlockWorkQueue(stream->async_stream_queue);
     }
     free(object->descriptors);
     free(object->streams);
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=144565
Your paranoid android.
=== debian11b (64 bit WoW report) ===
user32:
msg.c:12587: Test failed: Got expected 0.
msg.c:12602: Test failed: Got expected 0.
Not saying that we shouldn't make this change, but were you able to see similar behavior on Windows? Or is that another side effect of delivering uncompressed samples? For the 4K video example, was that using the media session or another API?
On Tue Apr 2 20:53:17 2024 +0000, Nikolay Sivov wrote:
Not saying that we shouldn't make this change, but were you able to see similar behavior on Windows? Or is that another side effect of delivering uncompressed samples? For the 4K video example, was that using the media session or another API?
I hit this when testing MS Flight Simulator. It is using the Media Engine in rendering mode (which in turn uses the media session).
But I can't say I've seen similar behavior on Windows, though I haven't really looked. I'm not sure of a great way to test it tbh. If you do have any ideas, I'll give it a go.
On Tue Apr 2 20:53:17 2024 +0000, Brendan McGrath wrote:
I hit this when testing MS Flight Simulator. It is using the Media Engine in rendering mode (which in turn uses the media session). But I can't say I've seen similar behavior on Windows, though I haven't really looked. I'm not sure of a great way to test it tbh. If you do have any ideas, I'll give it a go.
What I suspect happens is that winegstreamer is using a single thread pool for its queue. You can check if replacing it with e.g. MFAllocateWorkQueueEx(MF_MULTITHREADED_WORKQUEUE) gives the same result. I see now that it's an internal detail rather than something visible or testable, because one of its functions is to block at wait_on_sample().
So yes, maybe having a thread per stream is fine. On Windows the source is likely much lighter and returns samples quicker.
On Tue Apr 2 22:21:52 2024 +0000, Nikolay Sivov wrote:
What I suspect happens is that winegstreamer is using a single thread pool for its queue. You can check if replacing it with e.g. MFAllocateWorkQueueEx(MF_MULTITHREADED_WORKQUEUE) gives the same result. I see now that it's an internal detail rather than something visible or testable, because one of its functions is to block at wait_on_sample(). So yes, maybe having a thread per stream is fine. On Windows the source is likely much lighter and returns samples quicker.
Yes, and I don't think we should make it more concurrent than it already is; we already have too many problems with that. Instead, it should be as light as it is on Windows and not decode samples.
What I suspect happens is that winegstreamer is using a single thread pool for its queue. You can check if replacing it with e.g. MFAllocateWorkQueueEx(MF_MULTITHREADED_WORKQUEUE) gives the same result.
That's right, it sets up the queue with `MFAllocateWorkQueue`, which calls `RtwqAllocateWorkQueue` with `RTWQ_STANDARD_WORKQUEUE` (which has a minimum and maximum of 1 thread).
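For reference, a minimal sketch contrasting the two allocation paths being discussed; the helper name and error handling are illustrative, not taken from the Wine source:

```c
#include <windows.h>
#include <mfapi.h>

/* Illustrative only: the current single-threaded queue vs. the suggested
 * multithreaded variant. */
static HRESULT allocate_queues(DWORD *standard_queue, DWORD *multithreaded_queue)
{
    HRESULT hr;

    /* What winegstreamer does today: a standard (single-threaded) work queue. */
    if (FAILED(hr = MFAllocateWorkQueue(standard_queue)))
        return hr;

    /* The suggested experiment: a queue backed by the multithreaded pool. */
    if (FAILED(hr = MFAllocateWorkQueueEx(MF_MULTITHREADED_WORKQUEUE, multithreaded_queue)))
    {
        MFUnlockWorkQueue(*standard_queue);
        return hr;
    }

    return S_OK;
}
```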
I just tried modifying it to use `MF_MULTITHREADED_WORKQUEUE` and got the following assertion failure:
```
../src-wine/dlls/winegstreamer/wg_parser.c:412: wg_parser_stream_release_buffer: Assertion `stream->buffer' failed.
```
I suspect we don't do well with two threads in one stream (although we could probably fix that with a per-stream CS/mutex). But otherwise I'd expect it to produce the same result.
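For illustration, a minimal sketch of the kind of per-stream lock hinted at above; the struct and function names are hypothetical and not taken from wg_parser.c:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-stream state: only one work-queue thread at a time may hold
 * the stream's current buffer between the wait and the matching release. */
struct stream_buffer_lock
{
    pthread_mutex_t mutex;
    bool buffer_held;
};

static bool stream_try_acquire_buffer(struct stream_buffer_lock *lock)
{
    bool acquired = false;

    pthread_mutex_lock(&lock->mutex);
    if (!lock->buffer_held)
    {
        lock->buffer_held = true;
        acquired = true;
    }
    pthread_mutex_unlock(&lock->mutex);

    return acquired;
}

static void stream_release_buffer(struct stream_buffer_lock *lock)
{
    pthread_mutex_lock(&lock->mutex);
    lock->buffer_held = false;
    pthread_mutex_unlock(&lock->mutex);
}
```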
it should be as light as it is on Windows and not decode samples
Is this in reference to an ongoing design discussion? If so, I'm not sure what the path forward is for this issue/MR?
On Wed Apr 3 01:38:19 2024 +0000, Brendan McGrath wrote:
it should be as light as it is on Windows and not decode samples
Is this in reference to an ongoing design discussion? If so, I'm not sure what the path forward is for this issue/MR?
Yes, this is a big difference in the way the media source is implemented, and it causes all sorts of issues in other places. I've been slowly working toward changing it but it requires a lot of changes in the MF pipeline to support it.
Many applications implicitly or explicitly depend on MF pipelines exposing components similar to those on Windows (source -> decoder -> converter -> sink), and we should implement the same kind of thing for compatibility. If the decoders are taking a long time to decode and stall each other, it should be solved by making them as concurrent as they are on Windows.
There are also some diverging opinions on how the media source itself should be implemented, where I would like to use a separate unix interface to make it simpler and synchronous, while @zfigura doesn't want that.
On Wed Apr 3 12:15:37 2024 +0000, Rémi Bernon wrote:
Yes, this is a big difference in the way the media source is implemented, and it causes all sorts of issues in other places. I've been slowly working toward changing it but it requires a lot of changes in the MF pipeline to support it. Many applications implicitly or explicitly depend on MF pipelines exposing components similar to those on Windows (source -> decoder -> converter -> sink), and we should implement the same kind of thing for compatibility. If the decoders are taking a long time to decode and stall each other, it should be solved by making them as concurrent as they are on Windows. There are also some diverging opinions on how the media source itself should be implemented, where I would like to use a separate unix interface to make it simpler and synchronous, while @zfigura doesn't want that.
To add a bit more detail about the issue, the two main causes for the 40ms wait appear to be:
1. The decoding that takes place in the Theora decoder; and
2. The memcpy that takes place in wg_parser_stream_copy_buffer.
I've tried a hack to remove the memcpy, but it's unfortunately not enough; and the decoding is obviously unavoidable. So the only fix appears to be to have the video and audio happen in parallel (not sequentially).
Obviously having the video decode happen in a separate MFT (in its own thread) would do the job, but do you think there's a place for this MR in the interim? Or do you think the additional concurrency it introduces is too great a risk?
On Wed Apr 3 23:41:35 2024 +0000, Brendan McGrath wrote:
To add a bit more detail about the issue, the two main causes for the 40ms wait appear to be:
- The decoding that takes place in the Theora decoder; and
- The memcpy that takes place in wg_parser_stream_copy_buffer
I've tried a hack to remove the memcpy, but it's unfortunately not enough; and the decoding is obviously unavoidable. So the only fix appears to be to have the video and audio happen in parallel (not sequentially). Obviously having the video decode happen in a separate MFT (in its own thread) would do the job, but do you think there's a place for this MR in the interim? Or do you think the additional concurrency it introduces is too great a risk?
We're already having a lot of problems with concurrency in the media source (a lot of people are hitting deadlocks), and more generally it doesn't behave as it does on Windows, so IMO this MR isn't the right fix.
On Thu Apr 4 10:47:51 2024 +0000, Rémi Bernon wrote:
We're already having a lot of problems with concurrency in the media source (a lot of people are hitting deadlocks), and more generally it doesn't behave as it does on Windows, so IMO this MR isn't the right fix.
OK, I'll close this MR. I do have an alternate fix that prioritises the audio decoding in the shared queue; maybe that in combination with removing the memcpy will be enough.
But I think if we're to process audio and video sequentially, we need to at least make the processing less expensive (i.e. demux rather than decode). Is there anything in place that already does this? Does the new_media_source in Proton help in this regard?
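As a rough illustration of what prioritising audio in the shared queue could look like (purely a sketch; the helper and priority constants are hypothetical and not part of this MR or the Wine source):

```c
#include <windows.h>
#include <mfapi.h>

/* Hypothetical helper: submit a sample request on the shared queue, giving
 * audio requests a higher priority so they are dequeued before video ones.
 * The queue, callback and op come from the media source; the priority
 * constants are made up for this sketch. */
#define SAMPLE_REQUEST_PRIORITY_VIDEO 0
#define SAMPLE_REQUEST_PRIORITY_AUDIO 1

static HRESULT put_sample_request(DWORD queue, IMFAsyncCallback *callback,
        IUnknown *op, BOOL is_audio)
{
    LONG priority = is_audio ? SAMPLE_REQUEST_PRIORITY_AUDIO : SAMPLE_REQUEST_PRIORITY_VIDEO;

    /* MFPutWorkItem2 takes an explicit priority, unlike MFPutWorkItem. */
    return MFPutWorkItem2(queue, priority, callback, op);
}
```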
Does the new_media_source in Proton help in this regard?
Yes, that's one of its purposes.
Fwiw, regarding buffer copies, the MF transforms are already optimized to avoid as many copies as possible, and should already be able to decode directly into the MF buffers. So, as soon as the media source stops decoding, it should benefit from this optimization right away.
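As a rough sketch of how that optimization surfaces on the caller's side (illustrative only, not code from this MR): a transform that does not provide its own samples decodes straight into caller-allocated MF buffers.

```c
#define COBJMACROS
#include <windows.h>
#include <mftransform.h>

/* Illustrative only: if the MFT does not provide its own output samples,
 * the caller allocates the output IMFSample/IMFMediaBuffer and the decoder
 * writes directly into it, avoiding an extra copy. */
static BOOL transform_writes_into_caller_buffers(IMFTransform *transform)
{
    MFT_OUTPUT_STREAM_INFO info = {0};

    if (FAILED(IMFTransform_GetOutputStreamInfo(transform, 0, &info)))
        return FALSE;

    return !(info.dwFlags & (MFT_OUTPUT_STREAM_PROVIDES_SAMPLES
            | MFT_OUTPUT_STREAM_CAN_PROVIDE_SAMPLES));
}
```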
On Thu Apr 4 18:38:15 2024 +0000, Rémi Bernon wrote:
Does the new_media_source in Proton help in this regard?
Yes, that's one of its purposes. Fwiw, regarding buffer copies, the MF transforms are already optimized to avoid as many copies as possible, and should already be able to decode directly into the MF buffers. So, as soon as the media source stops decoding, it should benefit from this optimization right away.
OK, thanks @rbernon; I'll take a look at the new_media_source as well then.
This merge request was closed by Brendan McGrath.