Re: [PATCH 1/2] winegstreamer: Add helper for GstCaps <-> IMFMediaType conversion.

27 Mar 2020


On 3/26/20 6:07 PM, Derek Lesho wrote:
On 3/26/20 4:56 PM, Zebediah Figura wrote:
There's another broad question I have with this approach, actually,
which is fundamental enough that I have to assume it's at least had some
thought put into it, but it would be nice if that discussion happened in a
more public place, and was justified in the patches sent.
Essentially, the question is: what if we were to use decodebin directly?
As I understand (and admittedly Media Foundation is far more complex
than I could hope to understand) an application which just calls
IMFSourceResolver methods just needs to get back a working
IMFMediaSource, and we could wrap decodebin with one of those, similar
to the quartz wrapper.
First of all, this is something I think we want to do anyway. Microsoft
has no demuxer for, say, Vorbis (at least, there's not one registered on
my Windows 10 machine), but I think that we want to be able to play back
Vorbis files anyway (in, say, a Win32 media player application). Instead
of writing yet another source for Vorbis, and for each other obscure
format, we just write one generic decodebin wrapper.
Second of all, the most obvious benefit, at least while looking at these
patches, is that you now don't need to write caps <-> IMFMediaType
conversion for every type on the planet. Another benefit is that you let
all of the decoding happen within a single GStreamer pipeline, which is
probably better for performance. You also can simplify your
postprocessing step to adding a single videoconvert and audioconvert,
instead of having to manually (or semi-manually) add e.g. an h264 parser
element. These are some of the benefits I had in mind when removing the
GStreamer quartz transforms.
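To illustrate the conversion point, here is a minimal sketch. It uses
plain strings in place of real GstCaps and IMFMediaType objects, and the
names caps_mapping, raw_mappings and mf_major_type_from_caps_name are
hypothetical, not taken from the patch. The assumption is that, with
decodebin followed by videoconvert and audioconvert, only the raw caps
names need an entry in the table:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the conversion helper: it maps a
 * GStreamer caps name to the *name* of a Media Foundation major type.
 * A real helper would build an IMFMediaType and fill in its attributes. */
struct caps_mapping
{
    const char *caps_name;
    const char *mf_major_type;
};

/* Assumption: decodebin + videoconvert/audioconvert only ever hands us raw
 * caps, so compressed formats (H.264, WMV, Vorbis, ...) need no entry. */
static const struct caps_mapping raw_mappings[] =
{
    {"video/x-raw", "MFMediaType_Video"},
    {"audio/x-raw", "MFMediaType_Audio"},
};

/* Returns the major type name for a caps name, or NULL if unsupported. */
const char *mf_major_type_from_caps_name(const char *caps_name)
{
    size_t i;

    assert(caps_name != NULL);

    for (i = 0; i < sizeof(raw_mappings) / sizeof(raw_mappings[0]); ++i)
    {
        if (!strcmp(raw_mappings[i].caps_name, caps_name))
            return raw_mappings[i].mf_major_type;
    }
    return NULL;
}
```

In this sketch a compressed caps name such as "video/x-h264" is simply
not in the table; supporting a source that outputs compressed samples
would mean adding one entry per compressed type.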
Even in the case where the application manually creates e.g. an MPEG-4
source, my understanding is it's still the source's job to automatically
append transforms to match the requested type. We'd just be moving that
from the mfplat level to the gstreamer level—i.e. let decodebin select
the 'transforms' needed to convert to raw video and audio.
It obviously wouldn't match native structure, but it's not clear to me
that it would fail to match native in a way that would cause problems.
Judging from my experience with quartz, most applications aren't going
to care how their media is decoded as long as they get raw samples out
of it. Only a select few build the graph manually because they don't
realize that they can autoplug, or make assumptions about which filters
will be present once autoplugging is done, and some of those even fall
back to autoplugging if their preferred method fails. Maybe the
situation is different with mfplat, but given that there is a way to let
mfplat figure out which sources and transforms to use, I'm gonna be
really surprised if most applications aren't using it.
If you do come across an application that requires we mimic native's
specific arrangement of sources and transforms, it seems to me it
wouldn't require that much effort to swap a different parser in for
decodebin, and to implement the necessary bits in the media type
conversion functions. Ultimately I suspect it'd be less work to have a
decodebin wrapper + specific sources for applications that require them,
than to manually implement every source and transform.
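The "decodebin wrapper + specific sources" arrangement could be sketched
roughly as below. Everything here is hypothetical: the names source_kind,
specific_sources and choose_source are invented for illustration, and
matching on the file extension is an assumption made only to keep the
example short, not a description of how mfplat resolves sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of media source we could hand back to the caller. */
enum source_kind
{
    SOURCE_GENERIC_DECODEBIN, /* one wrapper that handles everything else */
    SOURCE_MPEG4,             /* a source mimicking native's structure */
};

struct specific_source
{
    const char *extension;
    enum source_kind kind;
};

/* Only formats where some application needs the native arrangement get an
 * entry here; the list is expected to stay short. */
static const struct specific_source specific_sources[] =
{
    {".mp4", SOURCE_MPEG4},
};

/* Pick a specific source if one is registered for the URL's extension,
 * otherwise fall back to the generic decodebin wrapper. */
enum source_kind choose_source(const char *url)
{
    const char *extension;
    size_t i;

    assert(url != NULL);

    extension = strrchr(url, '.');
    if (extension)
    {
        for (i = 0; i < sizeof(specific_sources) / sizeof(specific_sources[0]); ++i)
        {
            if (!strcmp(specific_sources[i].extension, extension))
                return specific_sources[i].kind;
        }
    }
    return SOURCE_GENERIC_DECODEBIN;
}
```

With this shape, swapping a format over to a dedicated source is a
one-line change to the table, and anything not listed keeps working
through the generic wrapper.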
I'll make a more complete response to this tomorrow, but I really think
that doing the incorrect thing isn't worth the supposed simplicity your
method brings. For instance, a commit I have on my local branch adding
an ASF source and WMV decoder is 126 lines long. Take a look:
https://github.com/Guy1524/wine/commit/37748e69bb25f3bf97f4dbfebaa830e3eb282...
While I await your more complete response, I figure I might as well
clarify some things.
I don't think that "doing the incorrect thing", i.e. failing to exactly
emulate Windows, should necessarily be considered bad in itself, or at
least not nearly as bad as all that.
My view, and my understanding of the Wine project's view in general as
informed by its maintainers, is that emulating Windows is desirable for
public documented behaviour (obviously), for undocumented behaviour that
applications rely on (also obviously), and for undocumented or
semi-documented behaviour where there's no difference otherwise and
where the native thing to do is obvious (e.g. the name of an internal
registry key).
But there's not really a reason to emulate Windows otherwise. And in a
case like this, where there's a significant benefit to not emulating
Windows exactly, the only reason I see is "an application we don't know
yet *might* depend on it". When faced with such a risk, I weigh the
probability of that happening (on the evidence of DirectShow
applications, I see that as low) against the cost of having to change
design, which also seems low to me; I can say from experience (cf.
5de712b5d) that swapping out a specific demuxer for decodebin isn't very
difficult.
Not to mention that what we're doing is barely "incorrect". Media
Foundation is an API that's specifically meant to be extended in this
way. For that matter, some application could easily register its own
codec libraries on Windows with a higher priority than the native ones
(this happened with DirectShow); that's essentially no different than
what I'm suggesting.
I think the linked commit misses the point somewhat. That's partially
because I don't think it makes sense to measure simplicity as an
absolute metric simply using line count, and partially because it's
missing the cost of adding other media types to the conversion functions
(which is one of the reasons, though not the only reason, I thought to
write this mail). But it's mostly because the cost of using decodebin,
where it works, is essentially zero: we write one media source, and it
works for everything; no extension for ASF required. If it never becomes
necessary to write a source that outputs compressed samples, then we
also don't have the cost of abstraction (which is always worth taking
seriously!), and if it does, we come out even—we can still use your
generic media source, or something like it.
Ultimately, I think that a decodebin wrapper is something we want to
have anyway, for the sake of host codecs like Theora, and once we have
it, I see zero cost in using it wherever else we can.
