On Wed Apr 5 18:25:33 2023 +0000, Paul Gofman wrote:
I considered something along those lines, but there are many problems with it (because of which I decided it was not doable):
- we don't want to transfer from CPU to GPU if wg_transform didn't
really return any samples (which is the case in at least half of the invocations, when it needs more input data). That is, if the sample is not going to be returned we don't want to lock the dxgi buffer at all, as there is no way to tell it not to perform a texture update. But we still need to pass some buffer, as we don't know upfront whether the data will be used or not;
- I don't know if anything really depends on that, but as far as my
additional testing went, native doesn't allocate a sample if it is not going to return one (so it does not try to allocate a sample, which may fail if the app is referencing all the samples). The way I am testing that is by trying to get more samples while not freeing any (native will hang in ProcessOutput once out of free samples, while we will return an error);
- the buffer returned by Lock2DSize is not suitable to pass to
wg_transform, as it may have a different stride (stipulated by the 3D implementation). It may even match now by chance, but that is not guaranteed (e.g., for the test example the pitch for NV12 currently matches on Wine but is 256 on Windows, so that may change). So there should be a separate buffer anyway (that's what the general MF buffer Lock does by allocating a separate "linear" buffer on map). Yes, that can be addressed somehow on the gstreamer side, but in view of the above we probably need a memory buffer anyway. So the only technically straightforward approach that I see here is to make the Unix part of wg_transform handle that somehow. But that is also not quite straightforward in practice, as in principle it requires some callbacks from the Unix part to deal with mapping the sample once needed (which is apparently very inconvenient, we try to avoid callbacks from the Unix side). Do you see any better way?
In other words, from the above it looks like we need a temporary memory buffer anyway. And then we need to transfer that to the dxgi sample only once needed. Maybe I should just move the sample copy helper out of the h264 decoder to wg_sample.c?
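
To illustrate the stride point, here is a rough sketch of the kind of copy such a helper has to do once a sample is actually produced. It assumes NV12 with even width and height, a tightly packed temporary buffer, and a destination whose chroma plane starts at pitch * height. The function name and parameters are hypothetical and are not the existing helper in the h264 decoder:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not an existing Wine function. Copies an NV12 frame
 * from a tightly packed ("linear") buffer, where each row is exactly `width`
 * bytes, into a destination whose rows are `dst_pitch` bytes apart. */
static void copy_nv12_linear_to_pitched(uint8_t *dst, size_t dst_pitch,
        const uint8_t *src, size_t width, size_t height)
{
    /* NV12 is `height` rows of Y followed by height / 2 rows of interleaved
     * UV, each row `width` bytes wide (assumes even width and height, and
     * that the destination UV plane starts at dst_pitch * height). */
    size_t rows = height + height / 2;
    size_t y;

    /* The destination rows must be at least as wide as the source rows. */
    assert(dst_pitch >= width);

    for (y = 0; y < rows; ++y)
        memcpy(dst + y * dst_pitch, src + y * width, width);
}
```

The idea would be to call something like this only after wg_transform has reported an output sample, so the dxgi buffer is never locked on the "need more input" path.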