On 5/24/22 14:14, RĂ©mi Bernon (@rbernon) wrote:
I'm looking at MFT_INPUT_STREAM_PROCESSES_IN_PLACE. I haven't checked whether anything interesting exposes it, though; maybe nothing does?
(I don't think any of the GStreamer elements we care about are capable of in-place transform either, but theoretically some of them could be.)
As far as the tests go, no transform currently exposes it (and we're starting to have quite a few transforms in the tests).
There are also the MFT_OUTPUT_STREAM_PROVIDES_SAMPLES / MFT_OUTPUT_STREAM_CAN_PROVIDE_SAMPLES flags, which could have been useful, but those don't seem to be used by native transforms either, and I don't think games expect to see them.
So the implementation has all been designed around the worst case, which is when the output buffer is app-provided and cannot be swapped behind the scenes.
This is ugly to deal with. I guess you can provide a specific buffer by (ab)using GstBufferPool's acquire_buffer() method to block until ProcessOutput() is called and we have a buffer; I don't know if you had a better plan than that.
A bit like this, but it's actually more tricky because some decoders aren't playing ball.
They are happy to allocate buffers from a downstream pool, but they won't release those buffers so easily. As I understand it, they keep the buffers as reference frames for the decoding process, and those need to outlive the app-provided output buffer.
Because they hold a ref on the buffer, it isn't writable, and we cannot detach its memory either.
On the other hand, it's not clear to me how we're going to deal with MF_E_TRANSFORM_STREAM_CHANGE or MFT_OUTPUT_STATUS_SAMPLE_READY while maintaining zero-copy. What is your plan for those?
So my plan, which will also cover the case above, is to instead use a custom allocator (generally useful, imho, if we want zero-copy in wg_parser too), which will allocate memory normally, with an optional app-provided memory overlay.
When allocation has to happen without an app-provided buffer, the allocator will behave like the default allocator: it will allocate Unix-side memory and return a pointer to it on map. This will happen, for instance, if more than one buffer gets decoded; in that case, on the next ProcessOutput call, the data will have to be copied.
When we do have an app-provided buffer, in the good cases, the allocator would instead overlay the Unix memory with it and return the app memory on map, so the decoder writes directly into it. When the app buffer needs to be released, we remove the overlay, and the Unix memory will later be used if the buffer gets re-used.
In the bad cases, when either the output buffer is still referenced by the decoder, or we actually need to keep the output sample pending, and we have incorrectly overlaid the app buffer, we will copy the data back to the Unix memory before detaching the overlay. Either way we still pay only one buffer copy, so it's no worse than the normal copying path.
In practice the bad cases shouldn't happen too often: when the decoder holds on to the buffers, we cannot remove their memory. That also means the pool will not discard the buffers on release, so after a few allocations the decoder will have enough buffers to cycle through, and the allocator will stop receiving allocation requests. We will have to copy the data anyway, but in the right direction this time.
In the good cases, releasing the buffer memory will taint the buffers, so the pool will free them on release, and the allocator will receive a request every time a buffer is needed.
Thank you for the context; the initial design decisions behind this patch make a lot more sense now.
Once you take it as a given that we have to explicitly specify the output buffer per sample even in the zero-copy case, it indeed doesn't seem plausible to try to peek without dequeuing. In that case I'm inclined to agree that the best solution is a monolithic wg_transform_read_data() that returns MF_E_FORMAT_CHANGED_OR_WHATEVER.
Based on the IRC conversation, I get the impression we can plausibly track it all on the Unix side after all, possibly without even needing an extra API to invalidate the format? That would feel the least awkward to me. We'd still return the new format from the Unix side as part of wg_transform_read_data(), but it'd be output-only.
(I'm starting to suspect we should reinstate the wg_stream_event union... I'll have to examine that possibility after this is upstream.)
Sorry for all the runaround (and the slowness of review). This is an ugly API mismatch we have to deal with and I have found it quite difficult to grasp all of the many considerations.