On Tue May 24 00:37:52 2022 +0000, **** wrote:
Zebediah Figura replied on the mailing list:
On 5/17/22 13:47, Rémi Bernon (@rbernon) wrote:
> Well it's actually pretty clear that it's not going to be possible to implement it unless you move most of wg_transform_read_data logic out of the unix side, to the PE-side, exposing additional entry points for each of the steps it goes through (which is what you are doing there).
>
> I don't think it's a good idea, because:
>
> 1) The entry points can actually be reduced to push_data / read_data operations, which intuitively maps to the MF transforms or the DMO ProcessInput / ProcessOutput methods.

But it doesn't map nicely to DirectShow, and it only works because ProcessOutput() is kind of badly designed.

To be clear, I don't object in principle to having an internal API that closely matches a badly designed external one, but that's not quite what we have here. Some aspects of the Media Foundation API are implemented on the PE side, and the way in which they're split up feels awkward and less than fully intuitive to me.

> 2) It complicates the unixlib API by multiplying entry points, although they are only going to work when called in a specific sequence.

My proposal as given (and wg_parser as it exists, for that matter) is not great because get_buffer() kind of implies that it consumes a buffer, which it doesn't. But if you rename the first API to something like "wg_transform_get_output_format()" it becomes clearer. The point is that it doesn't actually affect the state and hence is well-defined at any point.

> To get zero-copy you'll have to add even more entry points, to provide the output buffer beforehand, and release it after the processing.

I'm confused about this, because this doesn't match my (somewhat dim) awareness of what would be necessary for zero-copy. So maybe it would help to hash that out now completely. My understanding is:

* in some cases we can and should transform data in-place (perhaps because the PE side expects it, not that I'm aware of any such cases). In that case we should feed a writable GstBuffer into the pipeline, and if the output buffer was not the same as the input buffer, blit the latter into the former.

* if we cannot transform data in-place (either because GStreamer doesn't support it or because it'd confuse the PE side, not that I'm aware of any such cases there either), we should try to let GStreamer elements fill buffers allocated by the PE side. In order to achieve this we should set up a GstBufferPool ahead of time containing a fixed number of buffers. (I think we can guarantee we won't run out?) If the output buffer comes from this pool, we can retrieve the corresponding input buffer (the original intent of struct wg_sample) and return that pointer to user space. If not, we should call gst_buffer_pool_acquire_buffer() ourselves and blit the output buffer into it.

In both cases, though, it doesn't make sense to force a specific output buffer unless it's the same as the input buffer, which means there's no need for an extra API. Is there some extra consideration I'm missing here?

The other consequence that falls out from this is that the act of pushing data will inherently cause data to be written; i.e. there is no concept of "peek" quite like pipes. On the other hand, we can still "peek" at the last output buffer (and, more importantly, its associated metadata) without actually dequeuing it (and hence without changing the wg_transform state at all), which is kind of what I was trying to propose in the first place. It's not obvious to me why this is problematic; am I missing something here as well?

> 3) This specific call sequence (although some steps such as format change checks are not always useful, but will probably still be needed every time) will have to be duplicated in each transform when it can be factored in a single place.

Sure. On the other hand, we already have (somewhat misleadingly named) helpers that do most of the sample post-processing already. Maybe I'm missing a reason that they couldn't be extended to cover the entirety of ProcessOutput()...?
> On 5/17/22 13:47, Rémi Bernon (@rbernon) wrote:
>> Well it's actually pretty clear that it's not going to be possible to implement it unless you move most of wg_transform_read_data logic out of the unix side, to the PE-side, exposing additional entry points for each of the steps it goes through (which is what you are doing there).
>>
>> I don't think it's a good idea, because:
>>
>> - The entry points can actually be reduced to push_data / read_data operations, which intuitively maps to the MF transforms or the DMO ProcessInput / ProcessOutput methods.
>
> But it doesn't map nicely to DirectShow, and it only works because ProcessOutput() is kind of badly designed.
>
> To be clear, I don't object in principle to having an internal API that closely matches a badly designed external one, but that's not quite what we have here. Some aspects of the Media Foundation API are implemented on the PE side, and the way in which they're split up feels awkward and less than fully intuitive to me.
DMOs have the same ProcessInput / ProcessOutput methods, and I expect them to work the same way as the MF transforms.

You say that it's badly designed, but you don't say how. I don't think it's really that bad: it does the job, and it is well designed for user-controlled buffer allocation.
>> - It complicates the unixlib API by multiplying entry points, although they are only going to work when called in a specific sequence.
>
> My proposal as given (and wg_parser as it exists, for that matter) is not great because get_buffer() kind of implies that it consumes a buffer, which it doesn't. But if you rename the first API to something like "wg_transform_get_output_format()" it becomes clearer. The point is that it doesn't actually affect the state and hence is well-defined at any point.
Whatever that entry point is named, it will affect the state, because it has to actually push the input buffers that were queued; otherwise you will never see any output or any new format.
>> To get zero-copy you'll have to add even more entry points, to provide the output buffer beforehand, and release it after the processing.
>
> I'm confused about this, because this doesn't match my (somewhat dim) awareness of what would be necessary for zero-copy. So maybe it would help to hash that out now completely.
>
> My understanding is:
>
> - in some cases we can and should transform data in-place (perhaps because the PE side expects it, not that I'm aware of any such cases). In that case we should feed a writable GstBuffer into the pipeline, and if the output buffer was not the same as the input buffer, blit the latter into the former.
>
> - if we cannot transform data in-place (either because GStreamer doesn't support it or because it'd confuse the PE side, not that I'm aware of any such cases there either), we should try to let GStreamer elements fill buffers allocated by the PE side. In order to achieve this we should set up a GstBufferPool ahead of time containing a fixed number of buffers. (I think we can guarantee we won't run out?) If the output buffer comes from this pool, we can retrieve the corresponding input buffer (the original intent of struct wg_sample) and return that pointer to user space. If not, we should call gst_buffer_pool_acquire_buffer() ourselves and blit the output buffer into it.
>
> In both cases, though, it doesn't make sense to force a specific output buffer unless it's the same as the input buffer, which means there's no need for an extra API. Is there some extra consideration I'm missing here?
>
> The other consequence that falls out from this is that the act of pushing data will inherently cause data to be written; i.e. there is no concept of "peek" quite like pipes. On the other hand, we can still "peek" at the last output buffer (and, more importantly, its associated metadata) without actually dequeuing it (and hence without changing the wg_transform state at all), which is kind of what I was trying to propose in the first place. It's not obvious to me why this is problematic; am I missing something here as well?
There's no such thing as transforming in-place. Input buffers (as provided by the app to ProcessInput) and output buffers (as provided by the app to ProcessOutput) will never be the same. In general they cannot be, if only because their sizes differ, but also because that's not how it's supposed to work: input buffers are kept referenced until they have been fully processed, and until one or more output buffers have been filled from them.
What zero-copy means, however, is avoiding any unnecessary copy when passing data back and forth between GStreamer and the unix side in general.
The only way this can be done for ProcessOutput, as I described previously, is by:

1) providing the application-provided output buffer to GStreamer *before* processing any input, so that any future buffer allocation can use it to write the decoded data;

2) processing the input buffers that were queued, checking for a format change event, or whether the output buffer has been used;

3) blitting the output data to the output buffer if it wasn't picked up;

4) removing the output buffer from any GStreamer buffer that may still be referencing it. (This last step may also involve, though hopefully doesn't, copying the data back from the output buffer to GStreamer memory when the data isn't supposed to be discarded; this can happen in some cases, not only when a format change event is returned and we must not lose the data.)