Currently, the winegstreamer media source checks for EOS when RequestSample() is called, but doesn't handle the cases when EOS is detected between the RequestSample() call and the moment when the request is popped from the command queue and serviced. This can result in the media source waiting forever for a sample and get stuck.
This commit fixes the bug by adding a check for EOS in wg_parser_stream_get_event().
This commit fixes Medieval Dynasty hanging on developer logos on the Steam Deck.
Signed-off-by: Giovanni Mascellani gmascellani@codeweavers.com --- dlls/winegstreamer/media_source.c | 13 ++++++++++--- dlls/winegstreamer/wg_parser.c | 15 ++++++++++++--- 2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index 85ec31d2498..5393835b696 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -548,9 +548,16 @@ static void wait_on_sample(struct media_stream *stream, IUnknown *token) return;
case WG_PARSER_EVENT_EOS: - stream->eos = TRUE; - IMFMediaEventQueue_QueueEventParamVar(stream->event_queue, MEEndOfStream, &GUID_NULL, S_OK, &empty_var); - dispatch_end_of_presentation(stream->parent_source); + if (stream->eos) + { + IMFMediaEventQueue_QueueEventParamVar(stream->event_queue, MEError, &GUID_NULL, MF_E_END_OF_STREAM, &empty_var); + } + else + { + stream->eos = TRUE; + IMFMediaEventQueue_QueueEventParamVar(stream->event_queue, MEEndOfStream, &GUID_NULL, S_OK, &empty_var); + dispatch_end_of_presentation(stream->parent_source); + } return;
case WG_PARSER_EVENT_SEGMENT: diff --git a/dlls/winegstreamer/wg_parser.c b/dlls/winegstreamer/wg_parser.c index 013566b25e9..beb4d92bf5b 100644 --- a/dlls/winegstreamer/wg_parser.c +++ b/dlls/winegstreamer/wg_parser.c @@ -106,7 +106,7 @@ struct wg_parser_stream GstBuffer *buffer; GstMapInfo map_info;
- bool flushing, eos, enabled, has_caps; + bool flushing, eos, processed_eos, enabled, has_caps;
uint64_t duration; }; @@ -681,7 +681,7 @@ static NTSTATUS wg_parser_stream_get_event(void *args)
pthread_mutex_lock(&parser->mutex);
- while (!parser->flushing && stream->event.type == WG_PARSER_EVENT_NONE) + while (!parser->flushing && !stream->processed_eos && stream->event.type == WG_PARSER_EVENT_NONE) pthread_cond_wait(&stream->event_cond, &parser->mutex);
if (parser->flushing) @@ -691,7 +691,16 @@ static NTSTATUS wg_parser_stream_get_event(void *args) return VFW_E_WRONG_STATE; }
- *params->event = stream->event; + if (stream->processed_eos) + params->event->type = WG_PARSER_EVENT_EOS; + else + *params->event = stream->event; + + if (stream->event.type == WG_PARSER_EVENT_EOS && !stream->processed_eos) + { + stream->processed_eos = true; + pthread_cond_signal(&stream->event_cond); + }
if (stream->event.type != WG_PARSER_EVENT_BUFFER) {
On 2/8/22 11:11, Giovanni Mascellani wrote:
Currently, the winegstreamer media source checks for EOS when RequestSample() is called, but doesn't handle the cases when EOS is detected between the RequestSample() call and the moment when the request is popped from the command queue and serviced. This can result in the media source waiting forever for a sample and get stuck.
I'm having a hard time understanding the race here. Can wait_on_sample() be called multiple times concurrently for the same stream? If not I don't see how it can race with itself.
Hi,
Il 08/02/22 18:50, Zebediah Figura ha scritto:
I'm having a hard time understanding the race here. Can wait_on_sample() be called multiple times concurrently for the same stream? If not I don't see how it can race with itself.
Consider this flow: * RequestSample() is called: it checks that the stream is not EOS and queues SOURCE_ASYNC_REQUEST_SAMPLE; * while SOURCE_ASYNC_REQUEST_SAMPLE is in the queue, RequestSample() is called again, and another SOURCE_ASYNC_REQUEST_SAMPLE is queued; * the first SOURCE_ASYNC_REQUEST_SAMPLE is executed and wait_on_sample() is called; wg_parser_stream_get_event() happens to find stream->event of type WG_PARSER_EVENT_EOS; it returns that event and reset the event to WG_PARSER_EVENT_NONE; consequently wait_on_sample() issues a MEEndOfStream event; * the second SOURCE_ASYNC_REQUEST_SAMPLE is executed and wait_on_sample() is called; wg_parser_stream_get_event() sees the stream event set to WG_PARSER_EVENT_NONE, and it will never change, therefore it waits forever.
The problem is that the EOS condition must be rechecked inside wh_parser_stream_get_event(), because last time it was checked (in RequestSample()) might be out-of-date.
Does this make sense?
Thanks, Giovanni.
On 2/8/22 13:44, Giovanni Mascellani wrote:
Hi,
Il 08/02/22 18:50, Zebediah Figura ha scritto:
I'm having a hard time understanding the race here. Can wait_on_sample() be called multiple times concurrently for the same stream? If not I don't see how it can race with itself.
Consider this flow: * RequestSample() is called: it checks that the stream is not EOS and queues SOURCE_ASYNC_REQUEST_SAMPLE; * while SOURCE_ASYNC_REQUEST_SAMPLE is in the queue, RequestSample() is called again, and another SOURCE_ASYNC_REQUEST_SAMPLE is queued; * the first SOURCE_ASYNC_REQUEST_SAMPLE is executed and wait_on_sample() is called; wg_parser_stream_get_event() happens to find stream->event of type WG_PARSER_EVENT_EOS; it returns that event and reset the event to WG_PARSER_EVENT_NONE; consequently wait_on_sample() issues a MEEndOfStream event; * the second SOURCE_ASYNC_REQUEST_SAMPLE is executed and wait_on_sample() is called; wg_parser_stream_get_event() sees the stream event set to WG_PARSER_EVENT_NONE, and it will never change, therefore it waits forever.
The problem is that the EOS condition must be rechecked inside wh_parser_stream_get_event(), because last time it was checked (in RequestSample()) might be out-of-date.
Does this make sense?
Yes, thanks for clarifying.
In that case it seems sufficient to check stream->eos in wait_on_sample(). There should be no need to modify the Unix side here.
Somewhat out of curiosity, have you actually observed both MF_E_END_OF_STREAM from RequestSample() and MEError with MF_E_END_OF_STREAM from the same media source on Windows? It seems awkward that we're reporting the same condition in two different ways like that, but if that's the way Windows behaves...
Hi,
Il 08/02/22 22:13, Zebediah Figura ha scritto:
In that case it seems sufficient to check stream->eos in wait_on_sample(). There should be no need to modify the Unix side here.
Ok, I guess I can try that approach.
Somewhat out of curiosity, have you actually observed both MF_E_END_OF_STREAM from RequestSample() and MEError with MF_E_END_OF_STREAM from the same media source on Windows? It seems awkward that we're reporting the same condition in two different ways like that, but if that's the way Windows behaves...
I wouldn't say it doesn't feel awkward, but not more awkward than the rest of MF event architecture. On this specific point, I think that the philosophy is pretty transparent, and even decently explained on MSDN: when RequestSample() is able to determine at the time of the call that the stream is already EOS, it can return the error directly. If it only later sees that condition, it has no way other than emitting MEError, because the RequestSample() call has already returned.
(Now, one might slyly ask what happens if the EOS condition would have been cleared between the RequestSample() call and the moment when that call would be serviced, for example because another Start() call is in the queue; AFAICT, MSDN answers in the most Microsoft way possible: it doesn't at all, and I have not researched that specific point)
At the same time, I admit I did no specific testing, but actually seeing the MEError being sent on Windows might be difficult, because you have to somehow cause the EOS to be detected between when RequestSample() has returned and when it is serviced. I don't know the implementation details of Microsoft's media source, so it might take a lot of time before I find the right conditions, and maybe it's not really a good idea to spend time on that thing.
Giovanni.
On 2/9/22 01:54, Giovanni Mascellani wrote:
Hi,
Il 08/02/22 22:13, Zebediah Figura ha scritto:
In that case it seems sufficient to check stream->eos in wait_on_sample(). There should be no need to modify the Unix side here.
Ok, I guess I can try that approach.
Somewhat out of curiosity, have you actually observed both MF_E_END_OF_STREAM from RequestSample() and MEError with MF_E_END_OF_STREAM from the same media source on Windows? It seems awkward that we're reporting the same condition in two different ways like that, but if that's the way Windows behaves...
I wouldn't say it doesn't feel awkward, but not more awkward than the rest of MF event architecture. On this specific point, I think that the philosophy is pretty transparent, and even decently explained on MSDN: when RequestSample() is able to determine at the time of the call that the stream is already EOS, it can return the error directly. If it only later sees that condition, it has no way other than emitting MEError, because the RequestSample() call has already returned.
Okay, makes sense.
(Now, one might slyly ask what happens if the EOS condition would have been cleared between the RequestSample() call and the moment when that call would be serviced, for example because another Start() call is in the queue; AFAICT, MSDN answers in the most Microsoft way possible: it doesn't at all, and I have not researched that specific point)
At the same time, I admit I did no specific testing, but actually seeing the MEError being sent on Windows might be difficult, because you have to somehow cause the EOS to be detected between when RequestSample() has returned and when it is serviced. I don't know the implementation details of Microsoft's media source, so it might take a lot of time before I find the right conditions, and maybe it's not really a good idea to spend time on that thing.
Yeah, I wasn't suggesting to actually research this, I don't think it's really worth looking into. If the documentation says this is how it behaves that makes sense.
On 2/8/22 22:44, Giovanni Mascellani wrote:
Hi,
Il 08/02/22 18:50, Zebediah Figura ha scritto:
I'm having a hard time understanding the race here. Can wait_on_sample() be called multiple times concurrently for the same stream? If not I don't see how it can race with itself.
Consider this flow: * RequestSample() is called: it checks that the stream is not EOS and queues SOURCE_ASYNC_REQUEST_SAMPLE; * while SOURCE_ASYNC_REQUEST_SAMPLE is in the queue, RequestSample() is called again, and another SOURCE_ASYNC_REQUEST_SAMPLE is queued; * the first SOURCE_ASYNC_REQUEST_SAMPLE is executed and wait_on_sample() is called; wg_parser_stream_get_event() happens to find stream->event of type WG_PARSER_EVENT_EOS; it returns that event and reset the event to WG_PARSER_EVENT_NONE; consequently wait_on_sample() issues a MEEndOfStream event; * the second SOURCE_ASYNC_REQUEST_SAMPLE is executed and wait_on_sample() is called; wg_parser_stream_get_event() sees the stream event set to WG_PARSER_EVENT_NONE, and it will never change, therefore it waits forever.
The problem is that the EOS condition must be rechecked inside wh_parser_stream_get_event(), because last time it was checked (in RequestSample()) might be out-of-date.
Does this make sense?
I think this will benefit from some locking, protecting source state.
Thanks, Giovanni.
Hi,
Il 09/02/22 07:27, Nikolay Sivov ha scritto:
I think this will benefit from some locking, protecting source state.
My understanding is that access to the media source state is (mostly) already serialized by the fact that the commands queue only allows one job at a time. It doesn't mean that there are no bug (as my patch exposes), but first you should identify the issues that need to be addressed, I think.
Giovanni.
On 2/9/22 10:56, Giovanni Mascellani wrote:
Hi,
Il 09/02/22 07:27, Nikolay Sivov ha scritto:
I think this will benefit from some locking, protecting source state.
My understanding is that access to the media source state is (mostly) already serialized by the fact that the commands queue only allows one job at a time. It doesn't mean that there are no bug (as my patch exposes), but first you should identify the issues that need to be addressed, I think.
I mean access to state fields, like "state" or "eos". That's easy to see for shutdown state checks. Async commands are serialized, in a sense, but it's questionable functionality too. For example queuing 3 sample requests, and then stopping will have to process these 3 requests first, and then stop command. While in fact it might as well drop pending requests and transition right when Stop() is called.
Giovanni.