I take it that you haven't done any measurements of the specific parts, then? Or have you, and I simply missed it—could you please point me to where that was done in that case?
I did, although it was a long time ago. Measuring again: where native takes 0-3ms to resolve a media source entirely, `wg_parser_connect` takes ~80ms on average, with high variation. This is on the same beefy machine, though Windows is running inside a VM.
* `gst_element_set_state` / `gst_element_get_state` alone take 70ms on average, with high variation.
* Waiting for no-more-pads adds 5ms or more to that.
The wg_source implemented in the branch mentioned above does the same thing (resolving an MP4 video with all its streams enumerated) in ~1ms.
The statement about pull mode triggering unnecessary reads is one I don't recall being made before, and I find it very surprising. I would expect that pull mode categorically requires *less* data to be read, not more. The only reason I can imagine for pull mode being slower is that it'd emit more small read calls (and these would be slow due to I/O overhead and/or crossing the PE/Unix boundary), but we have a caching mechanism that's specifically meant to solve that problem (e6e7c7916d53). Can you please describe, in detail, what problem you're encountering with pull mode?
Pull mode typefind triggers plenty of 4096-byte read requests, each shifted by only a small number of bytes. This is not specific to decodebin: I encountered the same behavior when trying to use the typefind plugin in the wg_source, in pull mode. The cache helps a bit, but it still takes ~5ms to satisfy all the requests, especially as it also reads elsewhere in the stream.
I don't know the exact reason and I don't think I should spend much more time on this, as using `gst_type_find_helper_for_data_with_extension` instead is enough to figure out the MIME type for all the formats I tested with, and it is happy with just the first chunk of data.
I ask because, while I can see how some of those parts of the initialization process may be slow, I can also imagine that some parts may not be. And while it's not clear to me why any of these can't be solved within the design of decodebin and wg_parser, it also seems that some parts are easier to solve than others, so it's useful to have that context when trying to evaluate the assertion that we need to throw away decodebin and wg_parser.
After typefinding there's then the autoplugging, which creates decoder elements (be it the first decodebin, or the second decodebin we may add), and requests more data for their initialization. Reading data takes some time, as does having the decoders initialize with the first buffers.
What problems result from having a read thread, or from pull mode? (I am not sure if you are referring to the same problems, but I assume so.)
I think there are now about a dozen people who've been fighting or are still fighting with deadlocks in the media source. I'm not saying that race conditions are easy to avoid, but it seems to me that a thread isn't really necessary here, and that not having one would avoid that class of problems.
I'm sorry, I realize now I was assuming a certain design. Would you mind providing a brief overview of how you are pushing data?
It's in the branch mentioned above: basically, call `wg_source_push_data` / `wg_source_get_position` in a loop. Pushing data (or later, seeking) may cause the demuxer to send a seek event to our source pad, which we'll receive synchronously before returning, and the next `wg_source_get_position` will tell us where we need to read the next chunk of data from. https://gitlab.winehq.org/rbernon/wine/-/commit/fe1ed0e07cf8b2b88380813d3818...
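To make the shape of that loop concrete, here is a minimal self-contained sketch. It does not use the real `wg_source` API, whose signatures are in the branch: `struct mock_source`, `mock_source_push_data`, `mock_source_get_position` and `feed_source` are all hypothetical stand-ins, and the "seek on the first push" behavior is only an assumption meant to imitate a demuxer asking for data from elsewhere in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the demuxer side; NOT the real wg_source. */
struct mock_source
{
    uint64_t read_offset;    /* where the next chunk should be read from */
    uint64_t file_size;
    int seeked_once;
    unsigned int push_count;
};

static uint64_t mock_source_get_position(const struct mock_source *source)
{
    return source->read_offset;
}

/* Consume one chunk. Assumption for illustration: on the first push of a
 * large enough file, the "demuxer" seeks to the last 4096 bytes. The seek
 * is handled synchronously, before this function returns. */
static void mock_source_push_data(struct mock_source *source,
                                  const uint8_t *data, size_t size)
{
    (void)data;
    source->push_count++;
    if (!source->seeked_once && source->file_size > 8192)
    {
        source->seeked_once = 1;
        source->read_offset = source->file_size - 4096;
        return;
    }
    source->read_offset += size;
}

/* Caller-driven loop: no read thread and no condition variable, the
 * caller just asks for the position and pushes the matching chunk. */
static unsigned int feed_source(struct mock_source *source, const uint8_t *file)
{
    uint8_t chunk[4096];

    while (mock_source_get_position(source) < source->file_size)
    {
        uint64_t offset = mock_source_get_position(source);
        size_t size = sizeof(chunk);

        if (source->file_size - offset < size)
            size = (size_t)(source->file_size - offset);
        assert(size > 0);

        memcpy(chunk, file + offset, size);
        mock_source_push_data(source, chunk, size);
    }
    return source->push_count;
}
```

With a 16384-byte buffer, the mock takes two pushes: one at offset 0, then one at offset 12288 after the simulated seek. The point is only that any seek is visible to the caller as soon as the push returns, so the next read offset is always known without cross-thread signaling.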
What problems are there with condition variables? (From your earlier replies I understand that condition variables are not the only problem with pull mode, but perhaps I misread?)
This has been answered and further discussed on https://gitlab.winehq.org/wine/wine/-/merge_requests/3737#note_44725. I understand that it's a radical solution but I think we should avoid using pthread condition variables entirely, until we have a fix. I've made several proposals to that end, in https://gitlab.winehq.org/wine/wine/-/merge_requests/1088, but none have been approved (and I find resorting to SIGKILL much uglier).
Pull mode aside (see the question I posed above), are any of the slow parts of parser initialization impossible to solve with decodebin? I don't see any way in which they are; in fact, I don't see any way in which they'd even require changes to the wg_parser API. But I must be missing something.
Auto-plugging is time consuming, not needed and not desired: we don't want decoders to fully initialize and start decoding buffers during media source resolution. MF has its own pipeline resolution done elsewhere, which we should use instead.
What conceptual problems are there with wg_parser? If it's just the read thread (but I feel like you've asserted there are other problems), can we replace that part of wg_parser instead of rewriting it from scratch? If it's a matter of the conceptual design behind wg_parser, have I adequately addressed your concern by explaining the idea behind it, or do you still have strong objections?
Like I previously said, it would be much simpler to work on one API at a time. I don't see why all the decoding APIs *have* to use the same internal interface forever. Having MF start using a different approach will let us improve it progressively, and figure out the problems and requirements inherent to having compressed buffers passing through, without any risk of breaking DirectShow and WMReader at the same time.
It will also avoid adding more complexity to the wg_parser, which I would otherwise have to do if I were to introduce a separate means of pushing data than the read thread. I find it already complicated enough, with features that are unused but which we have to work with (the push mode thread for instance), and I would prefer to work with something simpler from the beginning, doing only the demuxing task with no added features.
Last, I don't think that adding two decodebins one after another, and later hooking them with probes to extract buffers from the pipeline, is the right way of doing this. Yes, it is possible and GStreamer has APIs to do this kind of thing, but there's a much simpler way.