But IIRC the resolution delay is from prerolling, which we can do lazily if necessary.
Imho adding some laziness only makes things more complicated, and I don't know if it's truly about prerolling. Because we currently also decode in the pipeline, we only get notified of the decoded stream caps *after* auto-plugging is done. That is what takes too long, so making this faster actually requires not decoding in the pipeline.
Okay, I'm sorry, I misremembered what we were doing. We don't actually need to preroll—we need caps and duration, and yeah, it's the autoplugging that takes too long.
But that's something we can avoid. With the work that Ziqing did we already have some of the infrastructure in place for skipping the decoding part of the pipeline. It wouldn't be too hard to resolve some of the initialization earlier, even if we do append decoding chains.
It's not a trivial stack of code, and we need most of it.
I don't think we do, at least not for media foundation, and probably not in general either. Using decodebin creates more problems than it solves. We lose control over everything it does internally (its threading, most of its plugging decisions), and we then have to make up for it on our side through complex synchronization.
Not using it gives us back the control we need for compatibility. The Win32 components we have to implement are lower level and behave deterministically most of the time. That's especially true on the MF side, where applications are apparently using them more directly, but it also applies on the DirectShow and WMVReader side, as has been seen a couple of times.
If the only thing we need is deterministic stream order, *and* that doesn't block the vast majority of applications, can we just get that into GStreamer instead?
It's non-deterministic in nature: its stream ordering, but its sample ordering as well, and everything else about it. I am not even going to try to convince GStreamer to do otherwise; at best this is a feature, and at worst a necessary side effect of its use of queues.
I don't understand this assertion. I've reread the decodebin source since reading this comment, and I don't see anything in the autoplugging logic that we clearly don't need. (To be sure, there are some decodebin features we don't need, such as use-buffering, EOS handling, and possibly also the subtitle-encoding logic. But the bulk of decodebin is the autoplugging part, and as far as I can tell we need all of it.)
decodebin also has no control over sample ordering in the first place; that's left up to the whims of individual demuxers. multiqueue may change that sample ordering, but we kind of need multiqueue, particularly because of the starvation logic. In any case, the idea of getting rid of buffering on the GStreamer side definitely makes me nervous. What problems exactly does it cause? Can you please spell them out? I'd like to understand so that I can think about and evaluate all of the possible solutions.
I am not interested in trying to implement this in GStreamer. I have it mostly done in Wine, where it makes everything much simpler, and it's also quite simple in terms of implementation. It gets rid of GStreamer threads, which solves many problems at once and makes debugging easier. It also removes the need for condition variables entirely, which fixes many crashes on thread exit as well.
Of course there are plenty of other ways to do it differently and elsewhere, but I don't see any good reason to look for something else, and I have spent enough time on this not to be interested in spending more of it on other approaches. I also know Wine best, and I am not interested in spending weeks figuring out how to do it in GStreamer.
I'm not going to demand you do the work, but I hope you can understand that this isn't an argument for doing something a certain way upstream.
I like using GStreamer; it offers us a nice abstraction over multimedia libraries along with a lot of useful high- and low-level tools. But if using GStreamer means we have to be dogmatic about it and use only the higher-level components, because they supposedly implement some important logic for us, then I'm going to start thinking more seriously about supporting Paul's idea of using codec libraries more directly instead.
I'm sorry to be difficult; I'm really not trying to be. But as a maintainer of this component, it's my job to understand all of the code that's going in, and to have thought about the options holistically. Largely that's what I'm trying to do here: ask questions, understand what it actually is that Win32 applications need, and evaluate whether we can do something another way. I'm not even trying to say "this definitely should be done another way", but I do need enough of a reason to discard other possibilities, and I currently don't see that.
I recognize it can be frustrating to do a lot of work, and then be asked to do it a different way. Having been on the other side of that often, I can only offer that it's best to go into development with the idea already in mind that things may need to be redone, and also, to try to discuss high-level design with maintainers before spending the effort on implementation.
And I would like to offer that it's currently very frustrating in turn for me to see a patch, ask some questions about why it's being done a certain way—which seems surprising or potentially problematic to me—ask if it can be done a different way, and be met with nothing but obstinate and argumentative refusal, and high-level statements like "decodebin is a bad fit for wine" that don't explain the specific problems that you're trying to solve. It really does not make me want to engage in patch review or discussion, which I regret to say has contributed to my being rather slow to respond in general. I'm not asking for unquestioning accession to my propositions—even when I *do* fully understand the problem, I often forget things or have ideas that don't pan out the way I think they will, as I'm sure you're well aware. But I would appreciate a bit more effort to see things from my perspective, and understand why I'm making the propositions that I'm making.