It is possible that a stream is destroyed while another thread is waiting on event_empty_cond or event_cond. The waiting thread should have the opportunity to be rescheduled and exit the critical section before the condition variables and the mutex are destroyed.
-- v2: winegstreamer: Do not destroy condition variables when a thread is waiting on them.
From: Giovanni Mascellani <gmascellani@codeweavers.com>
It is possible that a stream is destroyed while another thread is waiting on event_empty_cond or event_cond. The waiting thread should have the opportunity to be rescheduled and exit the critical section before the condition variables and the mutex are destroyed.
---
 dlls/winegstreamer/wg_parser.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/dlls/winegstreamer/wg_parser.c b/dlls/winegstreamer/wg_parser.c
index 0573c99858b..feead069391 100644
--- a/dlls/winegstreamer/wg_parser.c
+++ b/dlls/winegstreamer/wg_parser.c
@@ -102,13 +102,15 @@ struct wg_parser_stream
     GstSegment segment;

     struct wg_format preferred_format, current_format;

-    pthread_cond_t event_cond, event_empty_cond;
+    pthread_cond_t event_cond, event_empty_cond, free_cond;
     GstBuffer *buffer;
     GstMapInfo map_info;

-    bool flushing, eos, enabled, has_caps;
+    bool flushing, eos, enabled, has_caps, freeing;

     uint64_t duration;
+
+    unsigned int waiters;
 };

 static NTSTATUS wg_parser_get_stream_count(void *args)
@@ -270,8 +272,15 @@ static NTSTATUS wg_parser_stream_get_buffer(void *args)

     pthread_mutex_lock(&parser->mutex);

-    while (!stream->eos && !stream->buffer)
+    while (!stream->eos && !stream->buffer && !stream->freeing)
+    {
+        stream->waiters++;
         pthread_cond_wait(&stream->event_cond, &parser->mutex);
+        stream->waiters--;
+    }
+
+    if (stream->freeing && stream->waiters == 0)
+        pthread_cond_signal(&stream->free_cond);

     /* Note that we can both have a buffer and stream->eos, in which case we
      * must return the buffer. */
@@ -547,7 +556,14 @@ static GstFlowReturn sink_chain_cb(GstPad *pad, GstObject *parent, GstBuffer *bu
      * implementing a queue object here. */
     while (stream->enabled && !stream->flushing && stream->buffer)
+    {
+        stream->waiters++;
         pthread_cond_wait(&stream->event_empty_cond, &parser->mutex);
+        stream->waiters--;
+    }
+
+    if (stream->freeing && stream->waiters == 0)
+        pthread_cond_signal(&stream->free_cond);

     if (!stream->enabled)
     {
@@ -694,6 +710,7 @@ static struct wg_parser_stream *create_stream(struct wg_parser *parser)
     stream->current_format.major_type = WG_MAJOR_TYPE_UNKNOWN;
     pthread_cond_init(&stream->event_cond, NULL);
     pthread_cond_init(&stream->event_empty_cond, NULL);
+    pthread_cond_init(&stream->free_cond, NULL);

     sprintf(pad_name, "qz_sink_%u", parser->stream_count);
     stream->my_sink = gst_pad_new(pad_name, GST_PAD_SINK);
@@ -708,6 +725,13 @@ static struct wg_parser_stream *create_stream(struct wg_parser *parser)

 static void free_stream(struct wg_parser_stream *stream)
 {
+    pthread_mutex_lock(&stream->parser->mutex);
+    stream->freeing = true;
+    pthread_cond_signal(&stream->event_cond);
+    while (stream->waiters != 0)
+        pthread_cond_wait(&stream->free_cond, &stream->parser->mutex);
+    pthread_mutex_unlock(&stream->parser->mutex);
+
     if (stream->their_src)
     {
         if (stream->post_sink)
@@ -722,6 +746,7 @@ static void free_stream(struct wg_parser_stream *stream)

     pthread_cond_destroy(&stream->event_cond);
     pthread_cond_destroy(&stream->event_empty_cond);
+    pthread_cond_destroy(&stream->free_cond);

     free(stream);
 }
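(Not part of the patch, for readers less familiar with the pattern: below is a minimal standalone sketch of the same waiter-count handshake using plain POSIX threads. The names `waiters`, `freeing`, `free_cond` and `event_cond` mirror the diff above; the `struct stream` layout, `reader_thread` and the demo `main` are purely illustrative.)

```c
/* Minimal sketch of the waiter-count handshake (illustrative only). */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct stream
{
    pthread_mutex_t mutex;
    pthread_cond_t event_cond;   /* "a buffer arrived" or "we are freeing" */
    pthread_cond_t free_cond;    /* "the waiter count changed" */
    bool have_buffer, freeing;
    unsigned int waiters;
};

/* Reader: wait for a buffer, but bail out if the stream is being freed. */
static void *reader_thread(void *arg)
{
    struct stream *s = arg;

    pthread_mutex_lock(&s->mutex);
    s->waiters++;
    pthread_cond_signal(&s->free_cond);
    while (!s->have_buffer && !s->freeing)
        pthread_cond_wait(&s->event_cond, &s->mutex);
    s->waiters--;
    /* Tell the freeing thread it is now safe to tear the stream down. */
    pthread_cond_signal(&s->free_cond);
    pthread_mutex_unlock(&s->mutex);
    return NULL;
}

/* Destroyer: flag the stream, wake the waiters, and destroy the condition
 * variables only once every waiter has left the critical section. */
static void free_stream(struct stream *s)
{
    pthread_mutex_lock(&s->mutex);
    s->freeing = true;
    pthread_cond_broadcast(&s->event_cond);
    while (s->waiters)
        pthread_cond_wait(&s->free_cond, &s->mutex);
    pthread_mutex_unlock(&s->mutex);

    pthread_cond_destroy(&s->event_cond);
    pthread_cond_destroy(&s->free_cond);
}

int main(void)
{
    struct stream s = { .mutex = PTHREAD_MUTEX_INITIALIZER,
                        .event_cond = PTHREAD_COND_INITIALIZER,
                        .free_cond = PTHREAD_COND_INITIALIZER };
    pthread_t reader;

    pthread_create(&reader, NULL, reader_thread, &s);

    /* Wait until the reader is actually registered, so free_stream() really
     * does race with a thread inside the wait loop. */
    pthread_mutex_lock(&s.mutex);
    while (!s.waiters)
        pthread_cond_wait(&s.free_cond, &s.mutex);
    pthread_mutex_unlock(&s.mutex);

    free_stream(&s);              /* returns only after the reader has left */
    pthread_join(reader, NULL);
    pthread_mutex_destroy(&s.mutex);
    printf("stream freed with no thread left inside the critical section\n");
    return 0;
}
```

Build with `cc -pthread`; the point is that free_stream() cannot return, and hence cannot call pthread_cond_destroy(), while any reader is still between pthread_cond_wait() and pthread_mutex_unlock().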
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=125850
Your paranoid android.
=== debian11 (32 bit report) ===
wmvcore:
wmvcore.c:1663: Test failed: Stream 0: Format 3: Got hr 0xc00d0041.
wmvcore.c:1680: Test failed: Stream 0: Format 3: Media types didn't match.
Ouch, I think this triggered unrelated review requests because I initially pushed some unrelated development commits. Sorry!
I don't think we should add more condition variables to this code, there's already way too many.
FWIW I've got some local changes to the wg_parser design, described at length in https://gitlab.winehq.org/wine/wine/-/merge_requests/1048 and waiting for that MR to be accepted, which would solve the issue described here as well.
It would probably make more sense just to move the actual cleanup to wg_parser_destroy(), although in order to do that we'll also need to call wg_parser_destroy() on the quartz parser earlier.
> It would probably make more sense just to move the actual cleanup to wg_parser_destroy(), although in order to do that we'll also need to call wg_parser_destroy() on the quartz parser earlier.
I don't think that would change the issue substantially. The problem is that currently there is no synchronization between when the reader thread calling `wg_parser_stream_get_buffer()` is done using the stream resources (particularly the condition variables) and when these resources are destroyed. Moving destruction to `wg_parser_destroy()`, in itself, wouldn't change much, because the client could call `RequestSample()` and then immediately `Shutdown()` and `Release()`.
On Tue Nov 8 12:34:59 2022 +0000, Giovanni Mascellani wrote:
> > It would probably make more sense just to move the actual cleanup to
> > wg_parser_destroy(), although in order to do that we'll also need to call
> > wg_parser_destroy() on the quartz parser earlier.
>
> I don't think that would change the issue substantially. The problem is that currently there is no synchronization between when the reader thread calling `wg_parser_stream_get_buffer()` is done using the stream resources (particularly the condition variables) and when these resources are destroyed. Moving destruction to `wg_parser_destroy()`, in itself, wouldn't change much, because the client could call `RequestSample()` and then immediately `Shutdown()` and `Release()`.
Right, I forgot how the other frontends than quartz worked.
On Tue Nov 8 18:37:39 2022 +0000, Zebediah Figura wrote:
> Right, I forgot how the other frontends than quartz worked.
Actually, that implies a deeper synchronization problem in mfplat. If we can't clean up the condition variable, then we can't free the filter either. Do we know that Shutdown() should unblock RequestSample()? It seems quite plausible that it shouldn't (after all, I don't think RequestSample() should ever deadlock—it just might take a few ms to complete), and in that case it's possible that we should be doing some sort of locking on the mfplat side.
Alternatively, mfplat could move the whole wg_parser destruction to Release, on the idea that any concurrent thread calling RequestSample() should have a reference anyway. (I know that in theory mfplat should clean up everything in Shutdown, but as I understand that's just to break circular refcounts within mfplat, so it wouldn't apply here.)
On Tue Nov 8 18:45:05 2022 +0000, Zebediah Figura wrote:
> Actually, that implies a deeper synchronization problem in mfplat. If we can't clean up the condition variable, then we can't free the filter either. Do we know that Shutdown() should unblock RequestSample()? It seems quite plausible that it shouldn't (after all, I don't think RequestSample() should ever deadlock—it just might take a few ms to complete), and in that case it's possible that we should be doing some sort of locking on the mfplat side.
>
> Alternatively, mfplat could move the whole wg_parser destruction to Release, on the idea that any concurrent thread calling RequestSample() should have a reference anyway. (I know that in theory mfplat should clean up everything in Shutdown, but as I understand that's just to break circular refcounts within mfplat, so it wouldn't apply here.)
I don't know whether `Shutdown()` should unblock `RequestSample()` or not, and it doesn't appear to be easy to check. I can theoretically think of a test for that (by creating a media source over a byte stream specifically crafted so that it starts blocking at some point), but it seems like a lot of work for a detail that is probably not particularly important to reproduce correctly.
But yeah, the idea is that one of the three things you mention should happen:

1. `Shutdown()` unblocks `RequestSample()`, which presumably generates an error event;
2. `Shutdown()` waits for `RequestSample()` to return, hoping that it does so quickly (though this is not really obvious: the reason this bug became apparent to me was that in some cases GStreamer is not able to produce a sample for a stream unless all the other streams are actively pumping out samples, I guess for buffering reasons);
3. `Shutdown()` only does the cleanup needed to break refcount cycles and defers the rest to the destructor; implicitly, this still means that destruction has to wait for `RequestSample()` to finish.
Solutions 2 and 3 have a problem when `RequestSample()` is never going to finish, which probably should never happen, but in practice it does (and that's precisely how I became aware of this bug). Specifically, if a file contains a lot of audio samples and no video samples for a while, and the application requests a video sample without receiving the audio samples, it is possible that GStreamer never produces the video sample (for reasons that I presume are related to not having enough look-ahead in the file), so everything is deadlocked.
I guess this should be filed as a bug in itself, but solving it might require some more architectural work. Solution 1 doesn't depend on that being solved, which is why I ended up coding solution 1.
> Actually, that implies a deeper synchronization problem in mfplat. If we can't clean up the condition variable, then we can't free the filter either.
Mmh, I think you're right. For example, it might be that `Shutdown()` is called after a `RequestSample()`, but before the `SOURCE_ASYNC_REQUEST_SAMPLE` command is serviced. Thus `Shutdown()` would happily clean up everything, and then `wait_on_sample()` would try to use an object that has already been destroyed. So the solution must also do something at the media source level, which incidentally looks a bit under-synchronized (for example, `struct media_source::state` is concurrently accessed in many places without any mutex). I'll try to update the MR accordingly.
> Solutions 2 and 3 have a problem when `RequestSample()` is never going to finish, which probably should never happen, but in practice it does (and that's precisely how I became aware of this bug). Specifically, if a file contains a lot of audio samples and no video samples for a while, and the application requests a video sample without receiving the audio samples, it is possible that GStreamer never produces the video sample (for reasons that I presume are related to not having enough look-ahead in the file), so everything is deadlocked.
This bothers me, actually, because in theory gstreamer's multiqueue should prevent this. It's designed to try to buffer streams evenly, but if one stream is getting starved it *should* allow the buffering limits on the others to grow to prevent a deadlock. I think we should investigate why this isn't working correctly.
I know we have the "unlimited buffering" mechanism in place, but in theory that case should be handled by the same multiqueue mechanism, so the workaround probably shouldn't be needed either.
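(For reference, the "unlimited buffering" tweak mentioned here amounts to roughly the following; this is a simplified sketch, not the exact wg_parser code, and the callback and connection names are illustrative.)

```c
/* Simplified sketch (not the exact wg_parser code): when decodebin adds its
 * internal multiqueue, lift the buffering limits so that one starved stream
 * cannot block the others forever. */
#include <gst/gst.h>
#include <string.h>

static void deep_element_added_cb(GstBin *bin, GstBin *sub_bin,
        GstElement *element, gpointer user)
{
    GstElementFactory *factory = gst_element_get_factory(element);

    if (factory && !strcmp(GST_OBJECT_NAME(factory), "multiqueue"))
        g_object_set(element,
                "max-size-buffers", 0u,        /* 0 means "unlimited" */
                "max-size-bytes", 0u,
                "max-size-time", (guint64)0,
                NULL);
}

/* Connected once on the top-level bin, e.g.:
 * g_signal_connect(bin, "deep-element-added",
 *         G_CALLBACK(deep_element_added_cb), NULL); */
```

In other words, the workaround simply turns off the limits that multiqueue's own starvation handling is supposed to manage.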
> This bothers me, actually, because in theory gstreamer's multiqueue should prevent this. It's designed to try to buffer streams evenly, but if one stream is getting starved it *should* allow the buffering limits on the others to grow to prevent a deadlock. I think we should investigate why this isn't working correctly.
Maybe the answer is trivial: soon after `Shutdown()` is called, `gst_element_set_state(GST_STATE_NULL)` is called, and I guess it's totally reasonable that after that GStreamer stops sending samples. So either we go with solution 1 or we have to rearchitect the shutdown logic so that, when `Shutdown()` is called, the WG parser first checks that at least one sample is available for each enabled stream and only then stops GStreamer. And right now I fail to see what advantages that would bring (until we have proof that native does solution 2 or 3).
> > This bothers me, actually, because in theory gstreamer's multiqueue should prevent this. It's designed to try to buffer streams evenly, but if one stream is getting starved it *should* allow the buffering limits on the others to grow to prevent a deadlock. I think we should investigate why this isn't working correctly.
>
> Maybe the answer is trivial: soon after `Shutdown()` is called, `gst_element_set_state(GST_STATE_NULL)` is called, and I guess it's totally reasonable that after that GStreamer stops sending samples.
I don't understand, this seems orthogonal to the problem you're describing, where refusing to read from one stream can starve the other?
> I don't understand, this seems orthogonal to the problem you're describing, where refusing to read from one stream can starve the other?
What I am saying is that there is probably no problem with one stream starving another one, as it should be. At some point I thought that was the problem, because I noticed that a stream wasn't producing samples any more, but that probably happened because `Shutdown()` had been called in the meantime.
> What I am saying is that there is probably no problem with one stream starving another one, as it should be. At some point I thought that was the problem, because I noticed that a stream wasn't producing samples any more, but that probably happened because `Shutdown()` had been called in the meantime.
Ah, thanks for clarifying.
I think that if the correct solution is (2), then we probably want to deal with it on the mfplat side. Actually, in theory it should be as simple as putting a mutex around Shutdown() and RequestSample(); I'm not sure why there isn't one already...
I assume there *is* still a problem with starvation, given that we have that "unlimited buffering" thing going on anyway, but it's probably orthogonal. Fortunately we have a record of which application to retest :-)
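(A closing illustration of the mfplat-side locking suggested above, i.e. "putting a mutex around Shutdown() and RequestSample()". This is a rough sketch under assumed names: `struct media_source`, `cs`, `state` and `SOURCE_SHUTDOWN` are illustrative, not the actual winegstreamer media source code.)

```c
/* Rough sketch of serializing Shutdown() and RequestSample() with one lock.
 * All names here are illustrative, not the real media source implementation. */
#include <windows.h>
#include <mferror.h>

enum source_state { SOURCE_RUNNING, SOURCE_SHUTDOWN };

struct media_source
{
    CRITICAL_SECTION cs;      /* initialized when the source is created */
    enum source_state state;
    /* ... wg_parser handle, stream descriptors, event queue, ... */
};

static HRESULT media_source_RequestSample(struct media_source *source)
{
    EnterCriticalSection(&source->cs);

    if (source->state == SOURCE_SHUTDOWN)
    {
        LeaveCriticalSection(&source->cs);
        return MF_E_SHUTDOWN;
    }

    /* Safe to touch the wg_parser stream here: Shutdown() takes the same
     * lock, so it cannot destroy the stream while we are using it. */

    LeaveCriticalSection(&source->cs);
    return S_OK;
}

static HRESULT media_source_Shutdown(struct media_source *source)
{
    EnterCriticalSection(&source->cs);

    source->state = SOURCE_SHUTDOWN;
    /* Tear down the wg_parser only while holding the lock, so that any
     * in-flight RequestSample() has either finished or will see
     * SOURCE_SHUTDOWN before touching it. */

    LeaveCriticalSection(&source->cs);
    return S_OK;
}
```

Note that this is effectively solution 2 from the discussion above: Shutdown() waits for any in-flight RequestSample() to drop the lock, so it inherits the caveat that the wait can be long (or unbounded) if GStreamer never delivers the requested sample.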