There's another broad question I have with this approach, actually,
which is fundamental enough that I have to assume it's at least had some
thought put into it, but it would be nice if that discussion happened in
a more public place, and was justified in the patches sent.
Essentially, the question is: what if we were to use decodebin directly?
As I understand it (and admittedly Media Foundation is far more complex
than I could hope to fully understand), an application which calls
IMFSourceResolver methods just needs to get back a working
IMFMediaSource, and we could wrap decodebin with one of those, similar
to the quartz wrapper.
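
To make that concrete, here's a rough sketch of what the GStreamer side
of such a wrapper might look like. This is purely illustrative, not a
proposed implementation; the struct name, the appsrc plumbing, and the
way streams are tracked are all assumptions on my part:

#include <gst/gst.h>

struct decodebin_source
{
    /* IMFMediaSource IMFMediaSource_iface; -- COM vtable omitted here; its
     * methods (Start, Stop, CreatePresentationDescriptor, ...) would
     * delegate to the pipeline below. */
    GstElement *pipeline;
    GstElement *appsrc;      /* fed with data from the caller's IMFByteStream */
    unsigned int stream_count;
};

/* decodebin exposes one pad per elementary stream it finds; the real wrapper
 * would terminate each pad in an appsink and expose it as an IMFMediaStream. */
static void pad_added_cb(GstElement *decodebin, GstPad *pad, gpointer user)
{
    struct decodebin_source *source = user;
    source->stream_count++;
}

static struct decodebin_source *decodebin_source_create(void)
{
    struct decodebin_source *source = g_new0(struct decodebin_source, 1);
    GstElement *decodebin;

    source->pipeline = gst_pipeline_new(NULL);
    source->appsrc = gst_element_factory_make("appsrc", NULL);
    decodebin = gst_element_factory_make("decodebin", NULL);

    gst_bin_add_many(GST_BIN(source->pipeline), source->appsrc, decodebin, NULL);
    gst_element_link(source->appsrc, decodebin);
    g_signal_connect(decodebin, "pad-added", G_CALLBACK(pad_added_cb), source);

    return source;
}

The point being that all of the demuxing and decoding logic would live
inside GStreamer; the mfplat object just translates between the two APIs.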
First of all, this is something I think we want to do anyway. Microsoft
has no demuxer for, say, Vorbis (at least, there's not one registered on
my Windows 10 machine), but I think that we want to be able to play back
Vorbis files anyway (in, say, a Win32 media player application). Instead
of writing yet another source for Vorbis, and another for every other
obscure format, we just write one generic decodebin wrapper.
Second of all, the most obvious benefit, at least while looking at these
patches, is that you now don't need to write caps <-> IMFMediaType
conversion for every type on the planet. Another benefit is that you let
all of the decoding happen within a single GStreamer pipeline, which is
probably better for performance. You can also simplify your
postprocessing step to adding a single videoconvert and audioconvert,
instead of having to manually (or semi-manually) add e.g. an h264 parser
element. These are some of the benefits I had in mind when removing the
GStreamer quartz transforms.
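
For example (again just a sketch, not what the patches do): a pad-added
handler for decodebin only has to decide between videoconvert and
audioconvert, because everything coming out of decodebin is already raw:

#include <gst/gst.h>

/* Since decodebin's output is already raw, the only postprocessing needed is
 * a format conversion in front of each sink, regardless of what the container
 * or codec was. */
static void pad_added_cb(GstElement *decodebin, GstPad *pad, gpointer user)
{
    GstElement *pipeline = user;
    GstCaps *caps = gst_pad_get_current_caps(pad);
    const gchar *name = gst_structure_get_name(gst_caps_get_structure(caps, 0));
    GstElement *convert, *sink;
    GstPad *sinkpad;

    convert = gst_element_factory_make(g_str_has_prefix(name, "video/")
            ? "videoconvert" : "audioconvert", NULL);
    sink = gst_element_factory_make("appsink", NULL);

    gst_bin_add_many(GST_BIN(pipeline), convert, sink, NULL);
    gst_element_link(convert, sink);

    sinkpad = gst_element_get_static_pad(convert, "sink");
    gst_pad_link(pad, sinkpad);
    gst_object_unref(sinkpad);

    gst_element_sync_state_with_parent(convert);
    gst_element_sync_state_with_parent(sink);

    gst_caps_unref(caps);
}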
Even in the case where the application manually creates e.g. an MPEG-4
source, my understanding is it's still the source's job to automatically
append transforms to match the requested type. We'd just be moving that
from the mfplat level to the GStreamer level, i.e. letting decodebin
select the 'transforms' needed to convert to raw video and audio.
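
To illustrate what that buys us on the conversion side: the caps ->
IMFMediaType mapping then only has to handle video/x-raw and audio/x-raw.
Something along these lines (a hypothetical helper; error handling is
trimmed and the NV12 subtype is hard-coded purely for the example):

#define COBJMACROS
#include <gst/video/video.h>
#include <mfapi.h>
#include <mferror.h>

static HRESULT media_type_from_raw_video_caps(const GstCaps *caps, IMFMediaType **out)
{
    GstVideoInfo info;
    IMFMediaType *type;
    HRESULT hr;

    if (!gst_video_info_from_caps(&info, caps))
        return MF_E_INVALIDMEDIATYPE;

    if (FAILED(hr = MFCreateMediaType(&type)))
        return hr;

    IMFMediaType_SetGUID(type, &MF_MT_MAJOR_TYPE, &MFMediaType_Video);
    /* A real implementation would map GST_VIDEO_INFO_FORMAT(&info) to the
     * corresponding MFVideoFormat_* subtype; NV12 is just an example. */
    IMFMediaType_SetGUID(type, &MF_MT_SUBTYPE, &MFVideoFormat_NV12);
    MFSetAttributeSize((IMFAttributes *)type, &MF_MT_FRAME_SIZE,
                       GST_VIDEO_INFO_WIDTH(&info), GST_VIDEO_INFO_HEIGHT(&info));
    MFSetAttributeRatio((IMFAttributes *)type, &MF_MT_FRAME_RATE,
                        GST_VIDEO_INFO_FPS_N(&info), GST_VIDEO_INFO_FPS_D(&info));

    *out = type;
    return S_OK;
}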
It obviously wouldn't match native structure, but it's not clear to me
that it would fail to match native in a way that would cause problems.
Judging from my experience with quartz, most applications aren't going
to care how their media is decoded as long as they get raw samples out
of it. Only a select few either build the graph manually, because they
don't realize that they can autoplug, or make assumptions about which
filters will be present once autoplugging is done; and some of those even
fall back to autoplugging if their preferred method fails. Maybe the
situation is different with mfplat, but given that there is a way to let
mfplat figure out which sources and transforms to use, I'm gonna be
really surprised if most applications aren't using it.
If you do come across an application that requires we mimic native's
specific arrangement of sources and transforms, it seems to me it
wouldn't require that much effort to swap a different parser in for
decodebin, and to implement the necessary bits in the media type
conversion functions. Ultimately I suspect it'd be less work to have a
decodebin wrapper + specific sources for applications that require them,
than to manually implement every source and transform.
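
And to be clear about how small I think that swap would be: in a wrapper
structured like the sketches above, it's essentially a different element
name at construction time, plus handling the compressed caps in the media
type conversion. Purely hypothetical, of course:

#include <gst/gst.h>

/* Hypothetical: a source that has to mimic native's MPEG-4 arrangement could
 * be built around qtdemux (exposing compressed streams, with decoding left to
 * separate MFTs), while the generic path keeps using decodebin to go all the
 * way to raw audio and video. */
static GstElement *create_demux_element(gboolean mimic_native_mpeg4)
{
    return gst_element_factory_make(mimic_native_mpeg4 ? "qtdemux" : "decodebin", NULL);
}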