That seems unnecessarily defeatist...? The behaviour of those elements is documented. If they're failing to respect parts of the GStreamer API, that's a bug in the element and it should be fixed there. If they're doing something less performantly than they should, that's also something that should be fixed in the element.
This has nothing to do with documentation, and the behavior of components when they are combined isn't documented anyway. They can behave as they will; everything depends on the caps negotiation that happens between them and on their respective capabilities. The videoflip element has specific video format requirements, and can very well end up causing suboptimal negotiation in the pipeline.
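To make that restriction concrete, here is a small standalone sketch (not part of this MR, just an illustration) that dumps videoflip's sink pad template caps, which is where its accepted formats show up; anything outside that set forces the surrounding converters to do work:

```c
/* Illustrative only: print the sink pad template caps of videoflip to see
 * which raw video formats it will accept during negotiation. */
#include <gst/gst.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    GstElement *flip;
    GstPad *sink;
    GstCaps *caps;
    gchar *str;

    gst_init(&argc, &argv);

    flip = gst_element_factory_make("videoflip", NULL);
    if (!flip)
        return 1;

    sink = gst_element_get_static_pad(flip, "sink");
    caps = gst_pad_get_pad_template_caps(sink);
    str = gst_caps_to_string(caps);
    printf("videoflip sink caps: %s\n", str);

    g_free(str);
    gst_caps_unref(caps);
    gst_object_unref(sink);
    gst_object_unref(flip);
    return 0;
}
```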
Why is this any more the "right" way than using videoflip?
We are not doing any kind of frame flipping; instead we are implementing buffers with negative strides. Providing the stride information to GStreamer directly is the right way to describe the buffer layout. Using a videoflip element is an equivalent but convoluted way to do it.
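For illustration, a minimal sketch of what I mean (not the actual MR code; `wrap_bottom_up_bgra` is a made-up helper name, it assumes a single-plane BGRA frame and that downstream honours the strides carried by GstVideoMeta):

```c
/* Sketch: describe a bottom-up, single-plane BGRA frame by attaching a
 * GstVideoMeta whose stride is negative and whose plane offset points at
 * the first byte of the last row. Assumes gst_init() was already called. */
#include <gst/gst.h>
#include <gst/video/video.h>

static GstBuffer *wrap_bottom_up_bgra(guint width, guint height)
{
    gsize row_size = (gsize)width * 4;   /* BGRA: 4 bytes per pixel */
    GstBuffer *buffer = gst_buffer_new_allocate(NULL, row_size * height, NULL);

    gsize offset[GST_VIDEO_MAX_PLANES] = { row_size * (height - 1), 0, 0, 0 };
    gint stride[GST_VIDEO_MAX_PLANES] = { -(gint)row_size, 0, 0, 0 };

    /* The meta tells downstream elements how to walk the rows; no videoflip
     * element and no extra copy is needed for the vertical inversion. */
    gst_buffer_add_video_meta_full(buffer, GST_VIDEO_FRAME_FLAG_NONE,
                                   GST_VIDEO_FORMAT_BGRA, width, height,
                                   1, offset, stride);
    return buffer;
}
```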
I also don't understand the bit about complexity. From a somewhat abstract level, code gets more complex and harder to work with when you add multiple *different* interacting components. Having more of the *same* component—in this case, adding more beads to a string of postprocessing elements—doesn't make anything harder to work with.
Of course it does, it increases the number of possible failures. It doesn't matter whether the components are the same: the more you add, the more complex it gets. And the worst part is that these aren't components whose source we have directly available; they are GStreamer components, most often pre-built and shipped by the system distribution.
Take debugging the current video processor pipeline, for instance: we have three elements where we could have only one. The two videoconvert elements and videoflip talk back and forth to negotiate their caps. Deciphering the GStreamer trace to understand what is actually going on, and figuring out in the end that, somewhere in the middle of all these verbose traces, videoflip has decided to drop the pool it was provided by the downstream element and use its own, *is* way more complicated than it needs to be.
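Just to give an idea of the difference, an illustrative sketch of the two shapes (the launch strings are stand-ins, not the actual pipeline construction code in this MR):

```c
/* Illustrative comparison only: the extra caps negotiation and buffer pool
 * decisions happen in the three-element chain, not in the single-element one. */
#include <gst/gst.h>

int main(int argc, char **argv)
{
    GError *error = NULL;
    GstElement *current, *proposed;

    gst_init(&argc, &argv);

    /* Current shape: two converters around videoflip, each link negotiating
     * caps (and possibly buffer pools) independently. */
    current = gst_parse_launch(
        "videotestsrc num-buffers=1 ! videoconvert ! videoflip method=vertical-flip "
        "! videoconvert ! fakesink", &error);
    g_clear_error(&error);

    /* Proposed shape: a single conversion step; the vertical inversion is
     * expressed through negative strides on the buffers instead. */
    proposed = gst_parse_launch(
        "videotestsrc num-buffers=1 ! videoconvert ! fakesink", &error);
    g_clear_error(&error);

    if (current) gst_object_unref(current);
    if (proposed) gst_object_unref(proposed);
    return 0;
}
```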
And I must be missing something, because I don't see this situation tested in test_video_processor()? I only see tests where both the input and output have the same aperture.
There's one test which is fixed by this MR (well, now that I've split the last patch, there's another one which is broken and then fixed), and it covers using an aperture on input and no aperture on output. It fails before this MR and passes after it.