Ambiguity? What does that mean?
It means that we have no control over what `videoconvert | videoflip | videoconvert` will actually do; we can only hope that it does something sensible and, for instance, doesn't color convert unnecessarily or fail to pass our buffer pool through when it could.
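(For concreteness, here is a minimal sketch of the kind of chain being discussed, assuming it is built from a pipeline description string; the flip method and the surrounding code are illustrative, not the actual winegstreamer code.)

```c
#include <gst/gst.h>

/* Illustrative only: roughly what the chain under discussion looks like when
 * built from a GStreamer pipeline description; assumes gst_init() was called. */
static GstElement *build_chain(void)
{
    GError *error = NULL;
    GstElement *chain;

    /* What this chain actually does (convert, flip in place, copy, pick a
     * different intermediate format) is decided by negotiation between the
     * elements, not by the caller. */
    chain = gst_parse_launch("videoconvert ! videoflip method=vertical-flip ! videoconvert",
            &error);
    if (!chain)
    {
        g_printerr("failed to build chain: %s\n", error->message);
        g_clear_error(&error);
    }
    return chain;
}
```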
That seems unnecessarily defeatist...? The behaviour of those elements is documented. If they're failing to respect parts of the GStreamer API, that's a bug in the element and it should be fixed there. If they're doing something less efficiently than they should, that's also something that should be fixed in the element.
If videoflip is copying when it should be passthrough, then that should be fixed upstream. I don't want to introduce a bunch of extra code just to work around a GStreamer bug, especially if it's already fixed upstream (which is not clear from this commit message).
Sure, but we cannot fix older GStreamer versions, so let's also fix it on our side the right way, which is to provide the correct stride information on our buffers and to reduce the complexity of our pipelines, which will reduce risk and ease debugging.
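(To make "correct stride information" concrete, a minimal sketch, assuming the stride is attached to each buffer as a GstVideoMeta; the format, sizes and strides are made-up examples, not what the patch actually does.)

```c
#include <gst/video/video.h>

/* Sketch: attach per-plane offset/stride information to a buffer so that
 * downstream elements know the content is 82x84 stored in a 96-pixel-wide
 * allocation. Single-plane BGRA for simplicity. */
static void attach_video_meta(GstBuffer *buffer)
{
    gsize offsets[GST_VIDEO_MAX_PLANES] = { 0 };
    gint strides[GST_VIDEO_MAX_PLANES] = { 96 * 4 };

    gst_buffer_add_video_meta_full(buffer, GST_VIDEO_FRAME_FLAG_NONE,
            GST_VIDEO_FORMAT_BGRA, 82, 84, 1, offsets, strides);
}
```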
Why is this any more the "right" way than using videoflip?
I also don't understand the bit about complexity. At a somewhat abstract level, code gets more complex and harder to work with when you add multiple *different* interacting components. Having more of the *same* component (in this case, adding more beads to a string of postprocessing elements) doesn't make anything harder to work with.
I'm still failing to understand this at all, sorry. Why is the format that's set on the transform not the format we store?
As the test shows, the input and output media types don't have to match in frame size exactly, and they can include or omit frame padding freely. Depending on the combination, buffers end up being passed through with their padding intact, or with padding added or cropped accordingly.
However, GStreamer cannot convey the buffer padding information in its caps; it is a per-buffer property only. And when GStreamer negotiates caps, the input and output frame sizes have to match exactly.
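(On the Media Foundation side, the padding is expressed roughly like this; these are standard MF attributes, with the sizes taken from the 96x96 / 82x84 example discussed below, and the helper name is hypothetical.)

```c
#define COBJMACROS
#include <mfapi.h>

/* Sketch: a 96x96 frame (allocation) size with an 82x84 visible area; the
 * 14x12 difference is the padding that GStreamer caps alone cannot express. */
static HRESULT set_frame_with_padding(IMFMediaType *type)
{
    MFVideoArea aperture = {0};
    HRESULT hr;

    if (FAILED(hr = IMFMediaType_SetUINT64(type, &MF_MT_FRAME_SIZE, ((UINT64)96 << 32) | 96)))
        return hr;

    aperture.Area.cx = 82;
    aperture.Area.cy = 84;
    return IMFMediaType_SetBlob(type, &MF_MT_MINIMUM_DISPLAY_APERTURE,
            (const UINT8 *)&aperture, sizeof(aperture));
}
```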
That part all makes sense, and it seems sensible to me that Media Foundation would let you arbitrarily add or remove padding from a frame.
If the client called the video processor's SetInputType with a frame size of 96x96 and an aperture of 82x84, and SetOutputType with a frame size of 96x96 but without any aperture, we would try to create input/output formats with two different frame sizes, which would fail the caps negotiation.
We need to consider in this case that the client wanted an 82x84 output with 14x12 padding. Other combinations need to be handled as well, so this takes the smaller of the input and output frame sizes and considers any extra to be buffer padding (see the sketch below).
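(A rough sketch of that reconciliation, with hypothetical names rather than the patch's actual code.)

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

struct frame_size { unsigned int width, height; };

/* "Effective" sizes are the apertures where one was set, the frame sizes
 * otherwise; "frame" is the full allocation size. */
static void reconcile_sizes(struct frame_size input_effective,
        struct frame_size output_effective, struct frame_size frame,
        struct frame_size *content, struct frame_size *padding)
{
    content->width = MIN(input_effective.width, output_effective.width);
    content->height = MIN(input_effective.height, output_effective.height);

    /* With a 96x96 frame, an 82x84 input aperture and no output aperture,
     * this gives 82x84 of content and 14x12 of padding. */
    padding->width = frame.width - content->width;
    padding->height = frame.height - content->height;
}
```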
This is the surprising part—viz. that the actual content size can fail to match. It's not what the code was doing before, but it's not described in the patch subject or a comment either. And it seems like it should be unrelated to the main purpose of *either* patch 8 (stop including padding in the width/height) or patch 9 (which still seems like it should just be a matter of putting meta on the input buffers, and I'm confused why we're not doing that.)
And I must be missing something, because I don't see this situation tested in test_video_processor()? I only see tests where both the input and output have the same aperture.