There's another broad question I have with this approach, actually, which is fundamental enough that I have to assume some thought has already been put into it, but it would be nice if that discussion happened in a more public place, and if the decision were justified in the patches sent.
Essentially, the question is: what if we were to use decodebin directly?
As I understand it (and admittedly Media Foundation is far more complex than I could hope to fully understand), an application which calls IMFSourceResolver methods just needs to get back a working IMFMediaSource, and we could wrap decodebin in one of those, similar to the quartz wrapper.
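To illustrate what I mean, here is roughly the application-side sequence we would have to satisfy (just my own sketch, error handling omitted; it's not taken from the patches). All the application gets back is an IMFMediaSource, so in principle anything could be sitting behind it, including a decodebin wrapper:

#define COBJMACROS
#include <mfapi.h>
#include <mfidl.h>

static IMFMediaSource *resolve_source(const WCHAR *url)
{
    IMFSourceResolver *resolver;
    MF_OBJECT_TYPE type;
    IMFMediaSource *source;
    IUnknown *object;

    MFCreateSourceResolver(&resolver);
    IMFSourceResolver_CreateObjectFromURL(resolver, url,
            MF_RESOLUTION_MEDIASOURCE, NULL, &type, &object);
    IUnknown_QueryInterface(object, &IID_IMFMediaSource, (void **)&source);
    IUnknown_Release(object);
    IMFSourceResolver_Release(resolver);
    return source;
}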
First of all, this is something I think we want to do anyway. Microsoft has no demuxer for, say, Vorbis (at least, there's not one registered on my Windows 10 machine), but I think we want to be able to play back Vorbis files regardless (in, say, a Win32 media player application). Instead of writing yet another source for Vorbis, and another for every other obscure format, we just write one generic decodebin wrapper.
Second of all, the most obvious benefit, at least while looking at these patches, is that you no longer need to write caps <-> IMFMediaType conversion for every type on the planet. Another benefit is that you let all of the decoding happen within a single GStreamer pipeline, which is probably better for performance. You can also simplify your postprocessing step to adding a single videoconvert and audioconvert, instead of having to manually (or semi-manually) add e.g. an h264 parser element. These are some of the benefits I had in mind when removing the GStreamer quartz transforms.
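To make that first point concrete: with decodebin in front of us we only ever see raw caps, so the conversion could shrink to something on the order of the sketch below. This is purely illustrative on my part (the function name and hardcoding NV12 are mine, not the patches'):

#define COBJMACROS
#include <gst/gst.h>
#include <mfapi.h>

static IMFMediaType *media_type_from_raw_video_caps(const GstCaps *caps)
{
    const GstStructure *s = gst_caps_get_structure(caps, 0);
    IMFMediaType *type;
    gint width = 0, height = 0;

    gst_structure_get_int(s, "width", &width);
    gst_structure_get_int(s, "height", &height);

    MFCreateMediaType(&type);
    IMFMediaType_SetGUID(type, &MF_MT_MAJOR_TYPE, &MFMediaType_Video);
    /* Assuming we ask videoconvert for NV12; a real implementation would
     * translate the caps "format" field instead of hardcoding it. */
    IMFMediaType_SetGUID(type, &MF_MT_SUBTYPE, &MFVideoFormat_NV12);
    IMFMediaType_SetUINT64(type, &MF_MT_FRAME_SIZE,
            ((UINT64)width << 32) | (UINT32)height);
    return type;
}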
Even in the case where the application manually creates e.g. an MPEG-4 source, my understanding is that it's still the source's job to automatically append transforms to match the requested type. We'd just be moving that from the mfplat level to the gstreamer level, i.e. letting decodebin select the 'transforms' needed to convert to raw video and audio.
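The wrapper's side of that probably amounts to little more than a pad-added handler. Again, just a sketch of mine with made-up names (struct media_source and its videoconvert/audioconvert fields are hypothetical):

#include <gst/gst.h>

struct media_source
{
    /* hypothetical wrapper state */
    GstElement *videoconvert, *audioconvert;
};

/* decodebin has already autoplugged whatever demuxers and decoders were
 * needed, so by the time this fires we only see raw audio or video pads. */
static void pad_added_cb(GstElement *decodebin, GstPad *pad, gpointer user)
{
    struct media_source *source = user;
    GstCaps *caps = gst_pad_get_current_caps(pad);
    const GstStructure *s = gst_caps_get_structure(caps, 0);
    GstPad *sink = NULL;

    if (gst_structure_has_name(s, "video/x-raw"))
        sink = gst_element_get_static_pad(source->videoconvert, "sink");
    else if (gst_structure_has_name(s, "audio/x-raw"))
        sink = gst_element_get_static_pad(source->audioconvert, "sink");

    if (sink)
    {
        gst_pad_link(pad, sink);
        gst_object_unref(sink);
    }
    gst_caps_unref(caps);
}

/* hooked up with something like:
 * g_signal_connect(decodebin, "pad-added", G_CALLBACK(pad_added_cb), source); */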
It obviously wouldn't match native's structure, but it's not clear to me that it would fail to match native in a way that would cause problems. Judging from my experience with quartz, most applications aren't going to care how their media is decoded as long as they get raw samples out of it. Only a select few build the graph manually, either because they don't realize that they can autoplug or because they make assumptions about which filters will be present once autoplugging is done, and some of those even fall back to autoplugging if their preferred method fails. Maybe the situation is different with mfplat, but given that there is a way to let mfplat figure out which sources and transforms to use, I'm gonna be really surprised if most applications aren't using it.
If you do come across an application that requires that we mimic native's specific arrangement of sources and transforms, it seems to me it wouldn't take that much effort to swap a different parser in for decodebin and to implement the necessary bits in the media type conversion functions. Ultimately I suspect it'd be less work to have a decodebin wrapper + specific sources for the applications that require them, than to manually implement every source and transform.
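Concretely, I'd imagine that swap could be as small as picking a different element when building the pipeline, plus handling the compressed caps in the conversion functions. Sketch with illustrative element choices, not anything from the patches:

#include <gst/gst.h>

/* For an application that depends on native's arrangement (a demuxer source
 * emitting compressed streams, with separate decoder MFTs), swap decodebin
 * for a plain demuxer and extend the media type conversion accordingly. */
static GstElement *create_demuxer(gboolean mimic_native_mp4_source)
{
    if (mimic_native_mp4_source)
        return gst_element_factory_make("qtdemux", NULL);    /* compressed output */
    return gst_element_factory_make("decodebin", NULL);      /* raw streams out */
}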