I think you're taking "decode" in the wrong sense. The way I'm using the term, and I think the way it's used in the name "decodebin", is as the opposite of "encode", not of "demux". (I also don't think contrasting decodebin with parsebin in that way makes much sense, since decodebin does everything that parsebin does.)
I'm taking it in the GStreamer sense. "parsebin unpacks the contents of the input stream to the level of parsed elementary streams, **but unlike decodebin it doesn't connect decoder elements**."
If decodebin was meant to be used like that there would be no need for them to offer a separate parsebin.
Certainly "decode" has a specific, limited meaning in the world of multimedia, one that is mutually exclusive with "demux". However, "decoding" in that limited meaning is not what decodebin does; it demuxes *and* decodes, and it is specifically designed to do both of those things. "decode" is also often used in the world of multimedia to refer to that process. And despite its ambiguity, I can't think of a better word for it.
That said, I'd like to reiterate that I think getting too hung up on the naming is a mistake. Even if there were a better name for decodebin, GStreamer is stuck with it now. If you're concerned about whether decodebin is a good fit for our purposes, I would advise looking at how the API is designed instead. decodebin is specifically built to be able to stop decoding anywhere, not just at raw formats, which is why I don't see any conceptual mismatch with our goals.
parsebin provides a very simple way to stop right after the demuxer. If that was our goal with the media source [and parsebin was introduced earlier than a couple of versions ago], then I would certainly advocate for using parsebin over decodebin, on the grounds that it's a better fit.
However, I don't think that is our goal. As I describe below, I think that we need to put a strong weight on preserving decoding support for *all* containers and codecs, not just the ones Windows supports, which means that the media source cannot always output a compressed format. But also, I think it would be a mistake to output *every* uncompressed format all at once. I think we should rather progressively build up a list of formats we know can be decoded, one at a time, both to avoid regressions and to make them more bisectable. decodebin is *specifically* built to do this; this is what the autoplug-continue signal does, and why it provides the stream caps.
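To make the idea concrete, here is a rough sketch of the decision such an autoplug-continue handler could make. It is not code from the tree: the allow-list contents and the names passthrough_formats and should_continue_autoplug are hypothetical, and a real handler would take the format name from the GstCaps it is handed rather than from a plain string. As I understand the signal, returning TRUE lets decodebin keep plugging elements, and returning FALSE makes it expose the pad in its current format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list: formats we have verified can be decoded
 * downstream, so the media source may output them still compressed.
 * It starts small and grows one entry at a time. */
static const char *const passthrough_formats[] =
{
    "video/x-h264",
    "audio/mpeg",
};

/* Mirrors the return convention of decodebin's autoplug-continue signal:
 * true  -> keep autoplugging (decodebin goes on to plug a demuxer/decoder),
 * false -> stop here and expose the stream in its current format. */
static bool should_continue_autoplug(const char *caps_name)
{
    size_t i;

    for (i = 0; i < sizeof(passthrough_formats) / sizeof(*passthrough_formats); ++i)
    {
        if (!strcmp(caps_name, passthrough_formats[i]))
            return false;
    }
    return true;
}
```

Anything not on the list, containers included, keeps being unpacked and decoded exactly as it is today, so adding a format to the list is a single, bisectable change.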
By the way, I asked earlier in the thread, but I don't think I got an answer: what part(s) of the current code cause wg_parser initialization to be too slow? What are the bottlenecks, so to speak?
I already detailed this previously, but basically everything about it. Its more complicated initialization, pull mode which triggers a very different typefind pattern with a lot of unnecessary reads, auto-plugging which needs to lookup decoders, prerolling.
I take it that you haven't done any measurements of the specific parts, then? Or have you, and I simply missed it? If so, could you please point me to where that was done?
I ask because, while I can see how some of those parts of the initialization process may be slow, I can also imagine that some parts may not be. And while it's not clear to me why any of these can't be solved within the design of decodebin and wg_parser, it also seems that some parts are easier to solve than others, so it's useful to have that context when trying to evaluate the assertion that we need to throw away decodebin and wg_parser.
The statement about pull mode triggering unnecessary reads is one I don't recall being made before, and I find it very surprising. I would expect that pull mode categorically requires *less* data to be read, not more. The only reason I can imagine for pull mode being slower is that it'd emit more small read calls (and these would be slow due to I/O overhead and/or crossing the PE/Unix boundary), but we have a caching mechanism that's specifically meant to solve that problem (e6e7c7916d53). Can you please describe, in detail, what problem you're encountering with pull mode?
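For reference, this is the general shape of what I mean by caching small reads. It is only an illustrative sketch, not the code from that commit: struct source, struct read_cache, CHUNK_SIZE and the function names are all made up for the example, and the read counter exists only to show how many expensive reads actually happen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096

/* Hypothetical backing source; stands in for the expensive read callback. */
struct source
{
    const uint8_t *data;
    size_t size;
    unsigned int read_calls; /* number of expensive reads performed */
};

struct read_cache
{
    uint8_t chunk[CHUNK_SIZE];
    size_t chunk_offset;
    size_t chunk_len; /* 0 means nothing is cached */
};

/* The "slow" read: every call here models one trip across the boundary. */
static size_t source_read(struct source *src, size_t offset, uint8_t *buffer, size_t size)
{
    ++src->read_calls;
    if (offset >= src->size) return 0;
    if (size > src->size - offset) size = src->size - offset;
    memcpy(buffer, src->data + offset, size);
    return size;
}

/* Serve small reads from one cached, chunk-aligned block. Reads that are
 * larger than a chunk or that straddle two chunks bypass the cache. */
static size_t cached_read(struct read_cache *cache, struct source *src,
        size_t offset, uint8_t *buffer, size_t size)
{
    size_t chunk_start = offset - offset % CHUNK_SIZE;
    size_t pos = offset - chunk_start;

    if (size > CHUNK_SIZE || offset + size > chunk_start + CHUNK_SIZE)
        return source_read(src, offset, buffer, size);

    if (!cache->chunk_len || cache->chunk_offset != chunk_start)
    {
        /* Cache miss: refill with a single large read. */
        cache->chunk_len = source_read(src, chunk_start, cache->chunk, CHUNK_SIZE);
        cache->chunk_offset = chunk_start;
    }

    if (pos >= cache->chunk_len) return 0;
    if (size > cache->chunk_len - pos) size = cache->chunk_len - pos;
    memcpy(buffer, cache->chunk + pos, size);
    return size;
}
```

With something like this in place, a burst of small reads from the same region, which is the pattern I would expect from typefinding, costs one expensive read rather than one per call.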
I also don't know what "more complicated initialization" is supposed to mean; could you please clarify?
I'd also like to know how you intend to deal with seeking in a way that avoids this problem, whatever it is. If the problem is inherent in having an extra thread (as you seem to suggest), then I believe the need for seeking means that the thread can't simply be removed, even if we *were* to stop supporting pull mode.
Nothing in seeking support mandates an extra thread. The demuxer sends seek events to the source pad when pushing buffers, and expects the next buffer to be read from the desired offset.
I'm sorry, I realize now I was assuming a certain design. Would you mind providing a brief overview of how you are pushing data?
And unfortunately, we can't just get rid of pull mode. While many demuxers and parsers for common formats shipped with GStreamer support push mode, as of 1.20 some do not (sfdec, musepackdec); some support push mode but cannot seek (rmdemux, midiparse, modplug, and any libav-based demuxer), and some support push mode but at the expense of some features (mxfdemux at least, and possibly others).
I suggest we start by implementing the natively supported formats, which applications are using, the right way before worrying about more exotic formats.
In a vacuum I would support this suggestion, but as things stand there are three problems.
The first problem is that we *already* support any format that the host GStreamer supports. Removing that support would be a regression. That doesn't mean that it's absolutely off the table, but it does mean that it needs a very strong argument.
The second is that, well, if we can support all formats easily then I think we should. As far as I can tell that's the case—pull mode may make things more complicated, but it doesn't make them intractable. Now, as I've been saying, I could be wrong about this; you seem to have found some intractable problem with pull mode, but I need to understand what that problem is.
The third problem is that getting rid of pull mode doesn't just affect those demuxers that don't support push mode; it affects demuxers that *do* as well. Pull mode is inherently more efficient, in terms of CPU time and memory usage. Some demuxers take a heavy penalty to one or both if they can't operate in pull mode.
If these demuxers don't support push mode I'd say it's a bug on their side, and I would agree that *this* is worth trying to fix upstream. Most of these are in the Bad or Ugly plugins, which probably means they aren't very good quality.
But... it's not. There's nothing in the design or specification of GStreamer that requires every element to support push mode. There's *certainly* nothing that requires every element to support push mode as efficiently and featurefully as pull mode. And in practice any parser is going to expect to be plugged into a file source, or something that acts like a file source, and that means it's wholly reasonable for them to expect pull mode to work.
The fundamental fact is that not all formats, and especially containers, are designed for the push mode model. Some require data at the end of the file to be read before doing anything with data at the beginning. [3] Some store each stream as a single contiguous string of bytes rather than interleaving them, which means that seeking to a time can't exactly be meaningfully translated into seeking to a byte offset. These formats may be able to support push mode with enough hammering, and that's been done for some builtin GStreamer elements, but that doesn't mean it's going to work well.
[3] This has obvious disadvantages—not just in terms of playback, but also in terms of file stability. I've had to deal with MPEG-4 files that got accidentally truncated and were as a result unusable. But there are also good reasons for this: if you're recording a long stream, you want to be able to stream audio/video chunks directly to disk, and then build things like indexes once you're done streaming. That means you have to put those indexes at the end.
To be clear, I am not, and at no point have been, trying to say "I have heard and understood your concerns and rejected them, I think we should use decodebin and wg_parser". I am not saying this specifically because I have *not* heard and understood your concerns. I have only heard generic statements like "this doesn't work" or "this causes problems". I really need to understand what the specific problems are.
To that end, I'd like to repeat some questions I posed, that I don't think were answered:
* What problems result from having a read thread, or from pull mode? (I am not sure if you are referring to the same problems, but I assume so.)
* What problems are there with condition variables? (From your earlier replies I understand that condition variables are not the only problem with pull mode, but perhaps I misread?)
* Pull mode aside (see the question I posed above), are any of the slow parts of parser initialization impossible to solve with decodebin? I don't see any way in which they are; in fact, I don't see any way in which they'd even require changes to the wg_parser API. But I must be missing something.
* What conceptual problems are there with wg_parser? If it's just the read thread (but I feel like you've asserted there are other problems), can we replace that part of wg_parser instead of rewriting it from scratch? If it's a matter of the conceptual design behind wg_parser, have I adequately addressed your concern by explaining the idea behind it, or do you still have strong objections?