On 9/8/21 8:36 PM, Derek Lesho wrote:
On 9/8/21 6:21 PM, Zebediah Figura (she/her) wrote:
On 9/8/21 2:25 PM, Derek Lesho wrote:
On 9/6/21 4:14 PM, Zebediah Figura wrote:
+static void src_need_data(GstElement *appsrc, guint length, gpointer user)
+{
+    struct wg_parser *parser = user;
+
+    pthread_mutex_lock(&parser->mutex);
+
+    /* Sometimes GstAppSrc sends identical need-data requests when it is
+     * woken up; we can rely on push-buffer not having completed (and hence
+     * offset being -1) because they are blocked on the internal mutex held
+     * by the pulling thread calling this callback. */
+    if (parser->read_request.offset != -1)
+    {
Well, sure, but do we really need this if block at all?
Yes, we do. Without it, if there is a spurious wakeup of appsrc's wait loop in gst_app_src_create (get_range) [1], and "push-buffer" is emitted just after the loop's g_cond_wait reacquires the mutex on return [1], the "push-buffer" handler could block on acquiring the mutex [2] held by gst_app_src_create. It would block before pushing the buffer to the internal queue. Then gst_app_src_create continues and, seeing that there are no buffers in the queue, unlocks the mutex and calls need_data. Here we have our race:
- If need_data's code is called first, read_request's offset and size are overwritten with the same values. Then, the code waiting on the mutex pushes the buffer responding to our request, and everything continues as normal (as if another need_data never occurred).
(Prepare for the text wall 😁)
- If the push-buffer code waiting on the mutex continues first, the buffer is written to the internal queue, the push-buffer signal returns, read_request.offset is set to the sentinel value indicating that the request has been responded to and that a new need-data is awaited, and push_data returns. Then need_data runs, acquiring the mutex too late to catch the previous buffer send, and it requests a read of the same size directly following the prior read request, since the offset was updated by push_data. gst_app_src_create then sees the pushed buffer and returns it back to the getrange-related function, only for another getrange to come in with an offset not directly following the last request's. While the client code is reading the invalid request's data, src_seek_data is called. Right now we have an assert to make sure that there is no active request when seek_data is called, but if we didn't, we'd just set next_pull_offset to the seek's value. Then need_data would be called again, but all it would do is update the size to the new request's size. After this, push_data (in response to the previous request) would be called and, for simplicity's sake, if the sizes of the two requests were identical, a push-buffer would be emitted; the request registered by src_seek_data would pick it up with no way of knowing that the data comes from after the previous request's location in the file, which could cause any number of errors.

If we keep around both next_pull_offset and read_request.offset, we could compare them in push_data and discard the buffer if they don't match (see the sketch below), but this seems a lot more complicated than catching this case earlier on and preventing all this confusion.
1: https://github.com/GStreamer/gst-plugins-base/blob/master/gst-libs/gst/app/g...
2: https://github.com/GStreamer/gst-plugins-base/blob/master/gst-libs/gst/app/g...
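For illustration, a minimal sketch of that mitigation: keep both next_pull_offset and read_request.offset, and have push_data drop any buffer whose offset no longer matches the outstanding request. The struct layout and function signature here are hypothetical stand-ins, not the actual wg_parser code from the patch:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Hypothetical layout; the real struct wg_parser differs. */
struct wg_parser
{
    GstElement *appsrc;
    pthread_mutex_t mutex;
    uint64_t next_pull_offset;  /* advanced by seek_data and push_data */
    struct { uint64_t offset; uint32_t size; } read_request;  /* offset == -1: no request */
};

static bool wg_parser_push_data(struct wg_parser *parser, const void *data, uint32_t size)
{
    bool pushed = false;

    pthread_mutex_lock(&parser->mutex);

    /* If a seek_data arrived after need_data was emitted, next_pull_offset
     * no longer matches the request this buffer answers; drop the stale
     * buffer instead of pushing data from the wrong file position. */
    if (parser->read_request.offset == parser->next_pull_offset)
    {
        GstBuffer *buffer = gst_buffer_new_allocate(NULL, size, NULL);

        gst_buffer_fill(buffer, 0, data, size);
        GST_BUFFER_OFFSET(buffer) = parser->read_request.offset;
        /* gst_app_src_push_buffer() takes ownership of the buffer. */
        gst_app_src_push_buffer(GST_APP_SRC(parser->appsrc), buffer);
        parser->next_pull_offset += size;
        pushed = true;
    }

    /* Mark the request answered either way; the next need_data starts fresh. */
    parser->read_request.offset = (uint64_t)-1;

    pthread_mutex_unlock(&parser->mutex);
    return pushed;
}

Note this only helps when the two offsets actually diverge; as discussed below, consecutive duplicate requests can still go out of sync unless appsrc itself validated the buffer offset.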
Okay, so I was operating under the assumption that appsrc would drop buffers if the offset didn't match what it wanted, since that seemed like the only safe way to implement the behaviour the documentation describes. Apparently it doesn't. That means that I guess we can't send more than one buffer at a time, i.e. we really need to give it only what it asks for.
Right, to be honest I completely forgot that we were sending the offset through the buffer's field, but it seems that neither appsrc nor GstPad validates it (maybe something downstream would, though?). Rethinking this after some time away from the keyboard, I now realize that my conditional only triggers in the first, unproblematic case, and that only the assert in seek_data would catch the problem in the second case. I also now realize that, if we moved to using only next_pull_offset and a boolean flag, push_data would set on the buffer the updated/seeked offset, not the offset of the data it actually retrieved. But even if we keep both of them to verify the offset in push_data ourselves, we would still have the possibility of going out of sync in continuous pulls unless GStreamer itself verified the offset, although I haven't thought about the consequences of that.
I'll have to submit a patch to GStreamer to clarify the documentation.
In effect I guess that means we should keep the need-data/enough-data introduced by this patch
I'm not sure what you're referring to here, I didn't add enough-data handling to this patch.
only for stream mode, i.e. push mode. For random access mode we should treat need-data as license to send only a single buffer. I.e. upon receiving wg_parser_push_data() I guess we should reset whatever flag or sentinel to nonsignaled, so that a subsequent wg_parser_get_next_read_offset() blocks.
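To make that handshake concrete, a sketch under the same hypothetical struct as above, extended with a boolean read_request.pending flag and a pthread_cond_t read_cond (again, illustrative names, not the actual patch code): need_data marks exactly one request pending, push_data clears the mark, so the next call here blocks until appsrc emits a fresh need-data.

static uint64_t wg_parser_get_next_read_offset(struct wg_parser *parser, uint32_t *size)
{
    uint64_t offset;

    pthread_mutex_lock(&parser->mutex);

    /* read_request.pending is set by the need-data callback and cleared
     * again by wg_parser_push_data(), so each need-data is answered by
     * exactly one buffer. */
    while (!parser->read_request.pending)
        pthread_cond_wait(&parser->read_cond, &parser->mutex);

    offset = parser->read_request.offset;
    *size = parser->read_request.size;

    pthread_mutex_unlock(&parser->mutex);
    return offset;
}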
That's already being done in this patch, and doesn't actually fix the issue. I'm not actually sure what the correct way to fix this would be, tomorrow I'll try consulting with the GStreamer devs to make sure I'm not making up a problem and see what they suggest.
Sorry, I think I initially failed to read your description carefully enough, and assumed it was something different.
I see the problem now, but in fact this conditional (the one in your patch) isn't enough to fix it. We can get a race this bad:
thread A                                thread B
--------------------------------------------------
queue is empty
emit need_data
queue is empty
                                        wg_parser_get_next_read_offset
                                        wg_parser_push_data
emit need_data
                                        wg_parser_get_next_read_offset
                                        ...
                                        wg_parser_push_data
and push the wrong buffer. It won't be the same buffer (we won't get a seek, so we're supposed to push the next one consecutively), but it will be the wrong one.
Unfortunately I think this means that appsrc is unusably broken. It can and should be fixed upstream, one way or another, but we can't use it in our code, at least not for a while.
I could also be missing something, though.