I take it that you haven't done any measurements of the specific parts, then? Or have you, and I simply missed it? If so, could you please point me to where that was done?
I did, although it was a long time ago. Doing it again: where native takes 0-3ms to resolve a media source entirely, `wg_parser_connect` takes ~80ms on average, with high variance, on the same beefy machine (though Windows is running inside a VM).
- `gst_element_set_state` / `gst_element_get_state` alone take 70ms on average, with high variance (see the sketch below).
- Waiting for no-more-pads adds 5ms or more to that.
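For illustration, those two calls can be timed roughly like this (a minimal sketch; the `pipeline` setup around the parser elements is assumed):

```c
#include <gst/gst.h>

/* Time the blocking preroll alone; "pipeline" is assumed to already wrap
 * something like filesrc ! decodebin, as wg_parser does. */
static void time_preroll(GstElement *pipeline)
{
    gint64 start = g_get_monotonic_time();

    gst_element_set_state(pipeline, GST_STATE_PAUSED);
    /* Block until the asynchronous state change (preroll) completes. */
    gst_element_get_state(pipeline, NULL, NULL, GST_CLOCK_TIME_NONE);

    g_printerr("preroll took %" G_GINT64_FORMAT " us\n",
            g_get_monotonic_time() - start);
}
```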
The wg_source implemented in the branch mentioned above does the same thing (resolving an MP4 video, with all its streams enumerated) in ~1ms.
Ah! Thanks, that's interesting. So it looks like autoplugging is the big problem. I'm a bit confused about "no-more-pads", since at least here, decodebin will actually emit no-more-pads before resolving the state change, but maybe that's different in different versions? Or did you mean prerolling?
The statement about pull mode triggering unnecessary reads is one I don't recall being made before, and I find it very surprising. I would expect that pull mode categorically requires *less* data to be read, not more. The only reason I can imagine for pull mode being slower is that it'd emit more small read calls (and these would be slow due to I/O overhead and/or crossing the PE/Unix boundary), but we have a caching mechanism that's specifically meant to solve that problem (e6e7c7916d53). Can you please describe, in detail, what problem you're encountering with pull mode?
Pull-mode typefind triggers plenty of 4096-byte read requests, each shifted by only a small number of bytes from the previous one. This is not specific to decodebin; I encountered the same behavior when trying to use the typefind plugin in wg_source, in pull mode. The cache helps a bit, but it still takes ~5ms to satisfy all the requests, especially as typefind also reads elsewhere in the stream.
I don't know the exact reason, and I don't think I should spend much more time on this, as using `gst_type_find_helper_for_data_with_extension` instead is enough to figure out the MIME type for all the formats I tested with, and it is happy with the first chunk of data.
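For illustration, the whole thing boils down to something like this (a sketch; where the first chunk and the extension come from is left to the caller):

```c
#include <gst/gst.h>
#include <gst/base/gsttypefindhelper.h>

/* Typefind from the first chunk only, using the file extension as a hint so
 * that the matching callback is tried first. "data"/"size" hold the first
 * chunk already read from the stream; "ext" is e.g. "mp4". */
static GstCaps *find_stream_caps(const guint8 *data, gsize size, const char *ext)
{
    GstTypeFindProbability prob;
    GstCaps *caps = gst_type_find_helper_for_data_with_extension(NULL,
            data, size, ext, &prob);

    if (caps && prob >= GST_TYPE_FIND_LIKELY)
        return caps;
    if (caps)
        gst_caps_unref(caps);
    return NULL;
}
```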
Ah! That's a nice find.
I was curious enough myself to look more into this. Most typefinding elements, although not all, are indeed happy with the first page. Providing the extension also speeds up the process dramatically, because typefind will then try the callback associated with that extension first, and that will pretty much always succeed immediately.
The upshot is that we *can* in fact take advantage of this within the GStreamer pipeline, by responding to the "uri" query.
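Something along these lines on the wg_parser source pad should do it (a sketch; the `uri` field is an assumption, to be filled in from the PE side):

```c
/* Answer the "uri" query so that typefind can extract the extension from it;
 * the URI doesn't have to be resolvable, only its extension matters here. */
static gboolean src_query_cb(GstPad *pad, GstObject *parent, GstQuery *query)
{
    struct wg_parser *parser = gst_pad_get_element_private(pad);

    switch (GST_QUERY_TYPE(query))
    {
    case GST_QUERY_URI:
        if (!parser->uri)
            return FALSE;
        gst_query_set_uri(query, parser->uri);
        return TRUE;
    default:
        return gst_pad_query_default(pad, parent, query);
    }
}
```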
What problems result from having a read thread, or from pull mode? (I am not sure if you are referring to the same problems, but I assume so.)
I think there are now about a dozen people who have been fighting, or are still fighting, with deadlocks in the media source. I'm not saying that race conditions are easy to avoid, but it seems to me that a thread isn't really necessary here, and that removing it would avoid that whole class of problems.
Yeah, threading is hard. (Although I suspect that this is broadly the fault of horribly underspecified and overcomplicated APIs like DirectShow and Media Foundation. wg_parser is supposed to have a simple threading API; this was an explicit design goal I had for it.)
Make no mistake, if we could get rid of the read thread, I'd be happy, that would allow us to remove a fair amount of code. But I don't see a way to avoid it as long as we're supporting pull mode, and I don't think cutting support for pull mode is a reasonable option at this point.
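For context, the read thread is essentially this loop on the PE side (simplified sketch; field names like `buffer` are assumed, and error/EOF handling is omitted):

```c
static DWORD CALLBACK read_thread(void *arg)
{
    struct media_source *source = arg;

    while (!source->shutdown)
    {
        uint64_t offset;
        uint32_t size;
        ULONG read_len;

        /* Blocks until the demuxer issues a pull-mode getrange request. */
        if (!wg_parser_get_next_read_offset(source->wg_parser, &offset, &size))
            continue;

        /* Service the request from the byte stream, then hand the bytes
         * back to the Unix side. */
        IMFByteStream_SetCurrentPosition(source->byte_stream, offset);
        IMFByteStream_Read(source->byte_stream, source->buffer, size, &read_len);
        wg_parser_push_data(source->wg_parser, source->buffer, read_len);
    }
    return 0;
}
```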
I'm sorry, I realize now I was assuming a certain design. Would you mind providing a brief overview of how you are pushing data?
It's in the branch mentioned above: basically, call `wg_source_push_data` / `wg_source_get_position` in a loop. Pushing data (or later, seeking) may cause the demuxer to send a seek event to our source pad, which we receive synchronously before returning, and the next `wg_source_get_position` then tells us where to read the next chunk of data from. https://gitlab.winehq.org/rbernon/wine/-/commit/fe1ed0e07cf8b2b88380813d3818...
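In rough shape (a sketch only; the wg_source calls are from the branch, while `resolved()` and `read_at()` are placeholders standing in for the real logic):

```c
uint64_t position = 0;
char buffer[0x1000];

while (!resolved(source)) /* e.g. until all streams are enumerated */
{
    uint32_t size = read_at(stream, position, buffer, sizeof(buffer));

    /* May cause the demuxer to send a seek event to our source pad, which
     * we receive synchronously before this returns... */
    wg_source_push_data(source, buffer, size);
    /* ...in which case the next read position will have moved. */
    position = wg_source_get_position(source);
}
```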
How do you push data after initialization, i.e. while streaming?
What problems are there with condition variables? (From your earlier replies I understand that condition variables are not the only problem with pull mode, but perhaps I misread?)
This was answered and further discussed at https://gitlab.winehq.org/wine/wine/-/merge_requests/3737#note_44725. I understand that it's a radical solution, but I think we should avoid using pthread condition variables entirely until we have a fix. I've made several proposals to that end, in https://gitlab.winehq.org/wine/wine/-/merge_requests/1088, but none have been approved (and I find resorting to SIGKILL much uglier).
Yeah, I see where you're coming from, but I don't think that "don't use condition variables" is the right answer. [And really, even if it were, condition variables aren't inherent to wg_parser. We could do synchronization with NT APIs instead if we really wanted to.]
The way I read Alexandre's answer to 1088, it hasn't been accepted because avoiding the crashes just isn't worth the extra complication. I'm inclined to agree with that assessment, because the crashes don't do any more harm than printing errors to the console. Along similar lines, I think that avoiding the crashes isn't worth the extra complication that results from not using libc functions.
SIGKILL is ugly, sure, but it's ugly because we're killing processes. It's fundamentally no uglier than TerminateProcess().
Pull mode aside (see the question I posed above), are any of the slow parts of parser initialization impossible to solve with decodebin? I don't see any way in which they are; in fact, I don't see any way in which they'd even require changes to the wg_parser API. But I must be missing something.
Auto-plugging is time-consuming, not needed, and not desired: we don't want decoders to fully initialize and start decoding buffers during media source resolution.
Ah, sure, but we also don't *need* to autoplug immediately. I don't know if I described this idea before, but what we can do is defer stream autoplugging until later. This should be quite simple, just a matter of moving some code from pad_added_cb() to a function lazily called by wg_parser_stream_enable() or wg_parser_stream_get_preferred_format() [I think those are the only places that matter].
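Roughly like this (a sketch; `autoplugged` is a hypothetical flag, `stream_autoplug()` stands for the code currently living in pad_added_cb(), and locking is omitted):

```c
static void stream_ensure_autoplugged(struct wg_parser_stream *stream)
{
    if (stream->autoplugged)
        return;
    stream->autoplugged = true;
    /* Build and link the decoding chain, i.e. what pad_added_cb() does now. */
    stream_autoplug(stream);
}
```

wg_parser_stream_enable() and wg_parser_stream_get_preferred_format() would then just call this before doing anything else.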
More importantly, though, if we're going to be outputting compressed formats for those videos that need fast media source initialization, then we don't even need to worry about autoplugging overhead, because we won't need to autoplug. Currently we do anyway, in order to support toggling at runtime between encoded and unencoded video, but I have my doubts about the utility of that feature.
MF has its own pipeline resolution done elsewhere, which we should use instead.
Right, but that's a separate issue. If we're outputting compressed formats then we won't be autoplugging anyway.
What conceptual problems are there with wg_parser? If it's just the read thread (but I feel like you've asserted there are other problems), can we replace that part of wg_parser instead of rewriting it from scratch? If it's a matter of the conceptual design behind wg_parser, have I adequately addressed your concern by explaining the idea behind it, or do you still have strong objections?
Like I previously said, it would be much simpler to work on one API at a time. I don't see why all the decoding APIs *have* to use the same internal interface forever. Having MF start using a different approach will let us improve it and figure out the problems and requirements inherent in having compressed buffers pass through, progressively, without any risk of breaking DirectShow and WMReader at the same time.
Sure, that's understandable. But the right thing to do isn't always the least risky thing to do. I think in this case the right thing to do is keep the wg_parser interface, making incremental changes to it and its backends if necessary.
I'm not saying that the wg_parser interface is perfect and doesn't need changes, far from it. There are things I don't like about it currently, though I haven't yet come up with a satisfactory way to improve them. But those changes can be done without rewriting the whole thing.
Last, I don't think that chaining two decodebins one after the other, and later hooking them with probes to extract buffers from the pipeline, is the right way of doing this. Yes, it is possible, and GStreamer has APIs to do this kind of thing, but there's a much simpler way.
I don't much like the probes either, although when I thought it through I found I don't have a better objection than "well, it's not the way a normal pipeline is supposed to work". But it works, it's the intended purpose of probes, and it doesn't cause any problems.
That said, the probes are only there to support toggling between compressed and uncompressed streams at runtime, which has always felt like a feature of questionable utility, and I wouldn't object if it was just removed.