On 11/6/20 3:55 PM, Zebediah Figura wrote:
On 11/6/20 3:29 PM, Derek Lesho wrote:
On 11/6/20 3:10 PM, Zebediah Figura wrote:
On 11/6/20 2:51 PM, Derek Lesho wrote:
On 11/6/20 2:32 PM, Zebediah Figura wrote:
On 11/6/20 2:20 PM, Derek Lesho wrote:
On 11/6/20 2:13 PM, Zebediah Figura wrote:
This is done in a rather inconsistent way relative to how video streams are handled.
Yes, because the goals are different for each of the paths. The video path is just an enhancement to report video formats in a defined order as if they were coming from a decoder, since right now we're skipping the decoder MFT step. The step for fixing up the audio caps is meant to be a generic solution for any caps which are unrepresentable as an IMFMediaType object. This same path is used for compressed h.264 video on my local branch for example.
In both cases you're doing conversion from a type which may not be representable into a type which is.
No, in the case of the uncompressed video streams, the type is almost definitely representable. Think of it as a necessary hack for the bypassing of the decoder MFT we are doing. On the other hand, there are plenty of cases where uncompressed audio may be read from a container, and the fixup path would still be necessary in those cases.
You can't assume that the type is representable.
After applying videoconvert to make it output the most common decoder output types, I think I can.
In fact, you should make no assumptions about the type whatsoever. This is not only true in theory, but in practice—I've seen decoders output GST_VIDEO_FORMAT_RGB.
Even if we were outputting the video type decodebin directly feeds us and it happened to be RGB, it wouldn't matter, as media foundation supports RGB types: https://docs.microsoft.com/en-us/windows/win32/medfound/video-subtype-guids#...
Media Foundation supports *BGR* types, to use GStreamer's terminology. It does not support GST_VIDEO_FORMAT_RGB, which is identical to e.g. WINED3DFMT_B8G8R8_UNORM with swapped R and B channels.
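To make the byte-order point concrete, here is a minimal sketch of what converting between the two 24-bit layouts amounts to. It assumes tightly packed pixels with no row padding, and the helper name swap_r_and_b_24 is hypothetical, not something from winegstreamer; in a real pipeline a converter element such as videoconvert would do this work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap the first and third byte of every packed
 * 24-bit pixel in place. Applied to a buffer stored as R,G,B this yields
 * B,G,R order, and vice versa.
 * Assumption: pixels are tightly packed, with no per-row stride padding. */
static void swap_r_and_b_24(uint8_t *pixels, size_t pixel_count)
{
    assert(pixels != NULL || pixel_count == 0);

    for (size_t i = 0; i < pixel_count; ++i)
    {
        uint8_t *px = pixels + i * 3;
        uint8_t tmp = px[0];

        px[0] = px[2];
        px[2] = tmp;
    }
}
```

Calling it twice on the same buffer restores the original contents, which is why the two formats are described as identical apart from the swapped channels.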
Ah okay, well then if we ever run up against that, we can add a conversion to make_mf_compatible_caps.
The reasons for doing this conversion may be different, but there is no reason for the mechanism to be.
I would say there is, the conversion we're doing for the video streams is unconditional, entirely specific to the media source output, and doesn't output 1 fixed up caps structure per input caps structure. On the other hand, the audio and compressed type format would be necessary any time we want to feed gstreamer buffers with those caps to a media foundation component, and is a 1 to 1 conversion in every case.
I don't see any of those as reasons for the code structure to be different.
The make_mf_compatible_caps code is generic enough that we very likely may use it in other areas in the future, while the uncompressed video format conversion is a specific hack of the media-container media source. I gave potential examples of this in my previous email.
You may need to be more specific, then, because I don't see how any of those examples would use the function in a new way,
The main alternate use-case would be any alternate media source we end up adding, although I suppose this could also apply to a MFT as well if we wanted to use it there. In *any* possible implementation of IMFMediaSource we provide, we'll need raw-audio output to be representable by IMFMediaType. Only in the current media source, whose job it is to take container files, and (at least in the currently upstream version) output uncompressed streams, will we want to generate multiple media types from one.
or not have an analogous situation with video.
I'm not saying that any situation doesn't have an analogous situation with video.
The semantics of a function which aligns one GStreamer caps structure with another caps structure compatible with IMFMediaType don't fit the semantics of a function outputting multiple caps to provide compatibility for hardcoded expectations of applications using the source reader.
Note though that even if they were, you probably want to make it possible to convert even from some representable formats. Not all systems can play back 64-bit float PCM, for example.
That code doesn't belong in winegstreamer. That problem is solved by the streaming audio renderer not supporting an IMFMediaType with 64-bit float PCM, and the topology loader resolving that with the audio conversion MFT. This kind of problem is already solved in my complete media foundation branch present on staging.
I don't see any reason that the code shouldn't belong in winegstreamer. In fact, it's better to put it there, not only because then we don't have to write such a transform,
That's not a problem, I've already written such a transform, it's mostly boilerplate anyway.
but also because transforming entirely within the GStreamer pipeline will be more efficient in several ways.
I'm not sure we've seen any evidence of that. Maybe provide benchmarks? Either way, due to some details with the topology loader, we can't just rely on the media source providing every possible output type under the sun and hooking it up. Remember, on windows, the media source usually only outputs one type, which is directly extracted from the file. Due to this, Microsoft has an attribute used during topology resolution, MF_TOPOLOGY_ENUMERATE_SOURCE_TYPES, which UE4 does use. Because of this, any conversion to the desired output type must happen through MFTs.
Examples of other areas where this would be necessary, off the top of my head, would be a separate uncompressed-audio-emitting media source from, say, a microphone, or any MFT which outputs compressed video or raw audio, such as an encoder MFT.
Moreover, the goals are not entirely orthogonal; not all video will be output in the four types you have listed.
All video streams that take the videoconvert enumeration path (uncompressed video) won't need any transformation to align with an IMFMediaType object. The only potential incompatibility would be the layout, but that problem would never surface with the current media source, since we are pretending that our output types have gone through a standard media foundation decoder. An instance where we would want to put this type of code in the make_mf_compatible_caps path would be a media source that provides uncompressed video on windows, such as from a webcam or screen capture. In the code for this media source, we'd unconditionally put the video caps through the make_mf_compatible_caps path, and add code there to replace any unsupported layout with its closest equivalent defined in media foundation.
"won't need any transformation" is only true because you're *already* applying a transformation.
Yes, a transformation unrelated to resolving incompatibilities between GstCaps and IMFMediaType.
This code path *is* the transformation. If you did a similar thing with the audio stream, it "won't need any transformation" either.
Yes but I'm not aware of any requirements applications have on the audio stream outputs of decoders yet, so we don't have to. This is very fortunate, as if that were the case, the code could get ugly differentiating between audio streams that are raw because they've been decoded, and audio streams that are raw because they were already raw in the container.
I don't think you should need any such code.
The fundamental point I'm trying to get across is that even if *a* *reason* for doing a transformation differs, the transformation itself is very similar,
I wouldn't say that. The make_mf_compatible_caps transformation operates on caps, whereas the "advertise standard decoder types" hack just modifies which IMFMediaTypes we advertise, and doesn't interact with the GStreamer caps at all. A function which accounted for this path wouldn't share the same parameters or outputs. Just to reiterate, make_mf_compatible_caps takes one GstCaps, and returns another GstCaps, or NULL. The standard decoder type advertisement hack takes one IMFMediaType, and unconditionally outputs an ordered list of 5 clones of said IMFMediaType, each with the desired MF_MT_SUBTYPE.
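For readers following along, the two shapes being contrasted can be sketched roughly as below. This is only an illustration under stated assumptions: the structs are stand-ins for GstCaps and IMFMediaType, every sketch_* name is hypothetical, the "UNSUPPORTED"/"SUPPORTED" rule is a placeholder for whatever fixup is needed, the subtype strings are placeholders for the advertised list, and treating NULL as "nothing to fix" is a guess rather than something stated in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for GstCaps and IMFMediaType. */
struct sketch_caps { char format[16]; int rate; };
struct sketch_media_type { char subtype[8]; int width, height; };

#define SKETCH_DECODER_TYPE_COUNT 5

/* Placeholder subtype names; the actual advertised list is not given above. */
static const char *const sketch_decoder_subtypes[SKETCH_DECODER_TYPE_COUNT] =
    {"NV12", "YV12", "YUY2", "IYUV", "I420"};

/* One-to-one shape: one caps in, one newly allocated fixed-up caps out.
 * Assumption: NULL means nothing needed fixing (or allocation failed). */
static struct sketch_caps *sketch_make_compatible(const struct sketch_caps *in)
{
    struct sketch_caps *out;

    assert(in != NULL);
    if (strcmp(in->format, "UNSUPPORTED") != 0)
        return NULL;
    if (!(out = malloc(sizeof(*out))))
        return NULL;
    *out = *in;                        /* keep every other field as-is */
    strcpy(out->format, "SUPPORTED");  /* placeholder fixup rule */
    return out;
}

/* One-to-many shape: one media type in, an ordered list of clones out,
 * each clone differing only in its subtype. */
static void sketch_advertise_decoder_types(const struct sketch_media_type *in,
        struct sketch_media_type out[SKETCH_DECODER_TYPE_COUNT])
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < SKETCH_DECODER_TYPE_COUNT; ++i)
    {
        out[i] = *in;
        strncpy(out[i].subtype, sketch_decoder_subtypes[i], sizeof(out[i].subtype) - 1);
        out[i].subtype[sizeof(out[i].subtype) - 1] = '\0';
    }
}
```

The first function is conditional and returns at most one result; the second is unconditional and always fills five entries, which is the difference in parameters and outputs described above.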
and there's no point in doing it in two different ways. Look at the similarities between the code paths, not their differences. You'll find that you don't actually have to account for the differences at all in the code structure—only in the comments that explain why you're doing what you're doing.