But that's something we can avoid. With the work that Ziqing did we already have some of the infrastructure in place for skipping the decoding part of the pipeline. It wouldn't be too hard to resolve some of the initialization earlier, even if we do append decoding chains.
Adding a stream decodebin after the parser decodebin is, in my opinion, an ugly hack, and it adds complexity that will be painful to work with when debugging pipeline issues. It is not the right way to fix the issue; the right way is to implement the mechanisms that native has.
But the bulk of decodebin is the autoplugging part, and as far as I can tell we need all of it.
We don't. MF has this implemented already, and in a way that is tested against native behavior. What we lack now is the addition of decoder elements when pipelines are resolved.
decodebin also has no control over sample ordering in the first place; that's left up to the whims of individual demuxers. multiqueue may change that sample ordering, but we kind of need multiqueue, particularly because of the starvation logic. Still, the idea of getting rid of buffering on the GStreamer side definitely makes me nervous.
Yes, I can very well understand that. But this is how native works, and we will have to do the same at some point. Delaying this only keeps the related problems unsolved.
Working on a prototype, I can already see that several parts break when decoders are involved, but that's not a reason not to have them; it's the consequence of not having had them earlier.
The longer we wait, the more complexity we add to the hacks we have, and the harder it will get to untangle everything later when we eventually want, or have, to implement them.
Largely, that's what I'm trying to do here: ask questions, understand what it actually is that Win32 applications need, and evaluate whether we can do something another way. I'm not even trying to say "this definitely should be done another way", but I do need enough of a reason to discard other possibilities, and I currently don't see that.
You have stated multiple times that you did not know anything about MF, and that you were not interested in, or confident enough to, review any of the code there. I don't think this is compatible with maintaining the code and making decisions about its direction.
And I would like to offer that it's currently very frustrating in turn for me to see a patch, ask some questions about why it's being done a certain way (which seems surprising or potentially problematic to me), ask if it can be done a different way, and be met with nothing but obstinate and argumentative refusal, along with high-level statements like "decodebin is a bad fit for wine" that don't explain the specific problems you're trying to solve.
I have explained the problem on many occasions. Applications use MF in the same way we use GStreamer: they build pipelines, and they insert, request, or expect the presence of individual components.
To give some very concrete examples:
MF applications are expected to get D3D buffers out by creating an IMFDXGIDeviceManager instance and configuring the H264 decoder with it. If there is no H264 decoder, there is nothing to configure and they fail.
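To make the pattern concrete, here is a minimal sketch of what such an application does (assuming the Windows SDK headers; error handling elided, and the function name is illustrative):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <d3d11.h>

/* An MF application hands a DXGI device manager to the decoder MFT so that
 * the decoder outputs D3D textures directly. */
void configure_decoder_for_d3d(IMFTransform *h264_decoder, ID3D11Device *device)
{
    UINT reset_token;
    IMFDXGIDeviceManager *manager;

    MFCreateDXGIDeviceManager(&reset_token, &manager);
    manager->ResetDevice(device, reset_token);

    /* If there is no real decoder MFT behind this pointer, there is nothing
     * for this message to act on, and the application fails here. */
    h264_decoder->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, (ULONG_PTR)manager);

    manager->Release();
}
```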
Trailmakers, for instance, calls IMFSourceReader::GetServiceForStream and expects to receive a decoder IMFTransform. Yes, we can return a fake one, at the cost of unnecessary buffer copies, and that's what Proton currently does, but it's a hack.
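The call in question looks roughly like this (a sketch, assuming the Windows SDK headers; error handling elided, function name illustrative):

```cpp
#include <mfreadwrite.h>
#include <mfidl.h>

/* The Trailmakers pattern: ask the source reader for the decoder transform
 * behind a stream, then talk to that transform directly. */
IMFTransform *get_stream_decoder(IMFSourceReader *reader, DWORD stream_index)
{
    IMFTransform *decoder = NULL;

    /* GUID_NULL as the service, asking for IMFTransform, retrieves the
     * decoder for the given stream. */
    reader->GetServiceForStream(stream_index, GUID_NULL, IID_IMFTransform,
                                (void **)&decoder);

    /* If no real decoder is plugged into the pipeline, this yields nothing,
     * or a fake pass-through transform in the current Proton hack. */
    return decoder;
}
```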
This is true for MF, but this is also true for DirectShow:
* Space Engineers creates a DirectShow graph, instantiates a WMAsfReader and a WMV decoder DMO filter, and connects their pins itself. It then expects to receive buffers out of the decoder DMO.
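A sketch of that manual graph construction, to show why autoplugging never gets a say (assuming the DirectShow/DMO headers; pin lookup, DMO initialization, and error handling elided; names are illustrative):

```cpp
#include <dshow.h>
#include <dmodshow.h>

void build_wmv_graph(void)
{
    IGraphBuilder *graph;
    IBaseFilter *reader, *dmo_wrapper;

    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                     IID_IGraphBuilder, (void **)&graph);

    /* The application creates the ASF reader and the WMV decoder DMO
     * (via the DMO wrapper filter) itself, rather than letting the graph
     * autoplug intermediate filters. */
    CoCreateInstance(CLSID_WMAsfReader, NULL, CLSCTX_INPROC_SERVER,
                     IID_IBaseFilter, (void **)&reader);
    CoCreateInstance(CLSID_DMOWrapperFilter, NULL, CLSCTX_INPROC_SERVER,
                     IID_IBaseFilter, (void **)&dmo_wrapper);

    graph->AddFilter(reader, L"reader");
    graph->AddFilter(dmo_wrapper, L"wmv decoder");

    /* It then connects the reader's output pin directly to the decoder's
     * input pin, and expects decoded buffers out of the DMO:
     *   graph->ConnectDirect(reader_out_pin, decoder_in_pin, NULL); */
}
```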
Yes, we could maybe implement this with fake pass-through decoders, but once again, that's not the right way, because it's not how native works.
Several games have expectations about video frame plane alignment, some using MF, like Greedfall or Wargroove, and some using the WMV/ASF reader, like Resident Evil 0:
This is an inherent property of the defaults of individual decoders, but also of the media types involved in the negotiation.
With the decodebin approach, this is complicated. We would need to add buffer pool and alignment metadata to wg_parser, but the result would also depend on which client API is in use, since they don't all behave the same, and we would need heuristics to decide on default alignments depending on the compressed format.
By decoupling decoders and demuxers, we can first test and implement this in the individual decoder components, as is already done with the H264 decoder. Then everything works out of the box once decoders are plugged in, and that is the only place it needs to be done.