On 3/26/20 6:07 PM, Derek Lesho wrote:
On 3/26/20 4:56 PM, Zebediah Figura wrote:
There's another broad question I have with this approach, actually, which is fundamental enough that I have to assume it's at least had some thought put into it, but it would be nice if that discussion happened in a more public place, and was justified in the patches sent.
Essentially, the question is: what if we were to use decodebin directly?
As I understand (and admittedly Media Foundation is far more complex than I could hope to understand) an application which just calls IMFSourceResolver methods just needs to get back a working IMFMediaSource, and we could wrap decodebin with one of those, similar to the quartz wrapper.
First of all, this is something I think we want to do anyway. Microsoft has no demuxer for, say, Vorbis (at least, there's not one registered on my Windows 10 machine), but I think that we want to be able to play back Vorbis files anyway (in, say, a Win32 media player application). Instead of writing yet another source for Vorbis, and for each other obscure format, we just write one generic decodebin wrapper.
Second of all, the most obvious benefit, at least while looking at these patches, is that you now don't need to write caps <-> IMFMediaType conversion for every type on the planet. Another benefit is that you let all of the decoding happen within a single GStreamer pipeline, which is probably better for performance. You also can simplify your postprocessing step to adding a single videoconvert and audioconvert, instead of having to manually (or semi-manually) add e.g. an h264 parser element. These are some of the benefits I had in mind when removing the GStreamer quartz transforms.
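To make the "single pipeline" point concrete, here is a rough sketch of the kind of pipeline description I have in mind, written in Python only for brevity. It just builds a gst-launch-style string; it doesn't run anything. The function name and the sink names are made up for illustration; filesrc, decodebin, queue, videoconvert, audioconvert and appsink are the stock GStreamer elements. Paths that would need escaping are simply rejected here.

```python
def decodebin_pipeline(path: str) -> str:
    """Describe one pipeline: decodebin plus a single converter per raw branch.

    Hypothetical helper; the returned string uses gst-launch syntax.
    """
    # Keep the sketch honest: don't attempt to escape awkward paths.
    if '"' in path or "\\" in path:
        raise ValueError("path needs escaping, which this sketch does not do")

    # decodebin picks the demuxer, parsers and decoders on its own.
    source = f'filesrc location="{path}" ! decodebin name=dec'
    # The only postprocessing: one converter on each decoded branch.
    video = "dec. ! queue ! videoconvert ! appsink name=video_sink"
    audio = "dec. ! queue ! audioconvert ! appsink name=audio_sink"
    return " ".join([source, video, audio])
```

Note that nothing in the description names a format-specific element such as an h264 parser; that choice is left to decodebin.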
Even in the case where the application manually creates e.g. an MPEG-4 source, my understanding is it's still the source's job to automatically append transforms to match the requested type. We'd just be moving that from the mfplat level to the GStreamer level, i.e. letting decodebin select the 'transforms' needed to convert to raw video and audio.
It obviously wouldn't match native structure, but it's not clear to me that it would fail to match native in a way that would cause problems. Judging from my experience with quartz, most applications aren't going to care how their media is decoded as long as they get raw samples out of it. Only a select few build the graph manually because they don't realize that they can autoplug, or make assumptions about which filters will be present once autoplugging is done, and some of those even fall back to autoplugging if their preferred method fails. Maybe the situation is different with mfplat, but given that there is a way to let mfplat figure out which sources and transforms to use, I'm gonna be really surprised if most applications aren't using it.
If you do come across an application that requires we mimic native's specific arrangement of sources and transforms, it seems to me it wouldn't require that much effort to swap a different parser in for decodebin, and to implement the necessary bits in the media type conversion functions. Ultimately I suspect it'd be less work to have a decodebin wrapper + specific sources for applications that require them, than to manually implement every source and transform.
I'll make a more complete response to this tomorrow, but I really think that doing the incorrect thing isn't worth the supposed simplicity your method brings. For instance, a commit I have on my local branch adding an ASF source and WMV decoder is 126 lines long. Take a look: https://github.com/Guy1524/wine/commit/37748e69bb25f3bf97f4dbfebaa830e3eb282...
While I await your more complete response, I figure I might as well clarify some things.
I don't think that "doing the incorrect thing", i.e. failing to exactly emulate Windows, should necessarily be considered bad in itself, or at least not nearly as bad as all that.
My view, and my understanding of the Wine project's view in general as informed by its maintainers, is that emulating Windows is desirable for public documented behaviour (obviously), for undocumented behaviour that applications rely on (also obviously), and for undocumented or semi-documented behaviour where there's no difference otherwise and where the native thing to do is obvious (e.g. the name of an internal registry key).
But there's not really a reason to emulate Windows otherwise. And in a case like this, where there's a significant benefit to not emulating Windows exactly, the only reason I see is "an application we don't know yet *might* depend on it". When faced with such a risk, I weigh the probability of that happening (which, on the evidence of DirectShow applications, I see as low) against the cost of having to change the design (which also seems low to me); I can say from experience (cf. 5de712b5d) that swapping out a specific demuxer for decodebin isn't very difficult.
Not to mention that what we're doing is barely "incorrect". Media Foundation is an API that's specifically meant to be extended in this way. For that matter, some application could easily register its own codec libraries on Windows with a higher priority than the native ones (this happened with DirectShow); that's essentially no different than what I'm suggesting.
I think the linked commit misses the point somewhat. That's partially because I don't think it makes sense to measure simplicity as an absolute metric simply using line count, and partially because it's missing the cost of adding other media types to the conversion functions (which is one of the reasons, though not the only reason, I thought to write this mail). But it's mostly because the cost of using decodebin, where it works, is essentially zero: we write one media source, and it works for everything; no extension for ASF required. If it never becomes necessary to write a source that outputs compressed samples, then we also don't have the cost of abstraction (which is always worth taking seriously!), and if it does, we come out even—we can still use your generic media source, or something like it.
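As a toy model of what I mean by "one generic source, plus specific ones only where needed" (Python only for brevity, and explicitly not how IMFSourceResolver is actually implemented; every name below is hypothetical):

```python
from typing import Callable, Dict

# A "source" is just a descriptive string in this sketch.
SourceFactory = Callable[[str], str]


def generic_decodebin_source(url: str) -> str:
    """Stand-in for the single decodebin-backed media source."""
    return f"decodebin-wrapper({url})"


class SourceResolver:
    """Prefer a format-specific source if one is registered, else fall back."""

    def __init__(self, fallback: SourceFactory) -> None:
        self._fallback = fallback
        self._specific: Dict[str, SourceFactory] = {}

    def register(self, extension: str, factory: SourceFactory) -> None:
        # Only needed once some application turns out to require it.
        self._specific[extension.lower()] = factory

    def resolve(self, url: str) -> str:
        # Take the text after the last dot as the extension, if there is one.
        ext = url.rsplit(".", 1)[-1].lower() if "." in url else ""
        factory = self._specific.get(ext, self._fallback)
        return factory(url)
```

With nothing registered, every URL goes through the generic source; registering an ASF-specific factory later changes only the ASF case and leaves everything else alone.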
Ultimately, I think that a decodebin wrapper is something we want to have anyway, for the sake of host codecs like Theora, and once we have it, I see zero cost in using it wherever else we can.