But IIRC the resolution delay is from prerolling, which we can do lazily if necessary.
IMHO adding some laziness only makes things more complicated, and I don't know if it's truly about prerolling. Since we currently also decode in the pipeline, we only get notified of the decoded stream caps *after* auto-plugging is done. That is what takes too long, so making this faster *also* requires not decoding in the pipeline.
It's not a trivial stack of code, and we need most of it.
I don't think we do, at least not for Media Foundation, and probably not in general either. Using decodebin creates more problems than it solves. We lose control over everything it does internally, including its threading and most of its plugging decisions, and we then have to make up for that on our side with complex synchronization.
Not using it gives us back the control we need for compatibility. The Win32 components we have to implement are lower level and behave deterministically most of the time. This matters particularly on the MF side, because applications apparently use those components more directly, but it also applies on the DirectShow and WMVReader side, as has been seen a couple of times.
If the only thing we need is deterministic stream order, *and* that doesn't block the vast majority of applications, can we just get that into GStreamer instead?
It's non-deterministic by nature: not just its stream ordering but its sample ordering as well, and everything else about it. I am not even going to try convincing GStreamer to do otherwise; this is at best a feature, and at worst a necessary side effect of its use of queues.
I am not interested in trying to implement this in GStreamer. I have it mostly done in Wine, where it makes everything much simpler and is itself quite simple in terms of implementation. It gets rid of the GStreamer threads, which solves many problems at once and makes debugging easier. It also removes the need for condition variables entirely, which fixes many crashes on thread exit as well.
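As a rough illustration of why a synchronous, caller-driven design gives deterministic ordering without threads or condition variables, here is a minimal sketch. It is not the Wine implementation and uses no GStreamer or Media Foundation API; `SyncDecoder`, `read_samples`, and the `"decoded:"` payloads are hypothetical names made up for the example.

```python
from collections import deque


class SyncDecoder:
    """Hypothetical stand-in for a per-stream codec wrapper (not a real API)."""

    def __init__(self, stream_id):
        self.stream_id = stream_id
        self.pending = deque()

    def push(self, packet):
        # Queue compressed input; nothing runs in the background.
        self.pending.append(packet)

    def pull(self):
        # Decode one packet on the caller's thread, or report "no output yet".
        if not self.pending:
            return None
        return (self.stream_id, "decoded:" + self.pending.popleft())


def read_samples(packets):
    """Feed demuxed (stream_id, payload) packets to decoders, in input order."""
    decoders = {}
    samples = []
    for stream_id, payload in packets:
        if stream_id not in decoders:
            decoders[stream_id] = SyncDecoder(stream_id)
        decoder = decoders[stream_id]
        decoder.push(payload)
        # Drain the decoder before moving on, so output order follows input order.
        sample = decoder.pull()
        while sample is not None:
            samples.append(sample)
            sample = decoder.pull()
    return samples
```

Because every step runs on the calling thread, the same input always yields the same stream and sample order, and there is nothing to wait on or to tear down at thread exit.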
Of course there are plenty of other ways to do it, differently or elsewhere, but I don't see any good reason to look for something else, and I have spent enough time on this to not be interested in spending more of it on other approaches. I also know Wine best, and I am not interested in spending weeks figuring out how to do it in GStreamer.
I like using GStreamer, as it gives us a nice abstraction over multimedia libraries and offers a lot of useful high- and low-level tools. But if using GStreamer means we have to be dogmatic about it and use only the higher-level components, because they supposedly implement some important logic for us, I'm going to start thinking more seriously about supporting Paul's idea of using codec libraries more directly instead.