On 3/11/21 3:33 PM, Derek Lesho wrote:
On 3/11/21 4:06 PM, Zebediah Figura (she/her) wrote:
The really correct solution to that deadlock is to create our own element (or, actually, use appsrc—now that winegstreamer is split into PE and Unix parts
Yeah, I still believe GstAppSrc is the cleaner solution, as the interface is quite simple and well documented. But functionally, the manual pad mode should work just as well; it's up to you.
I don't remember what discussion we had about appsrc in the first place (if we had any?), but I think my argument was that it wasn't really a *better* fit than using getrange callbacks directly, i.e. it wasn't saving us much work (at best it would have saved us the need to separately handle push mode, but it was better to simply not implement push mode in the media source).
But that's less true now, given the way read requests are handled at the Unix library boundary and the combined backend interface for DirectShow and Media Foundation, and I think appsrc may be a better fit at this point. The "need-data"/"push-buffer" model might allow us to avoid the current awkward callback marshalling, and we'd be able to get rid of push-mode handling completely.
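Roughly, I'm imagining something like the following; this is just a sketch, and the callback names and read_from_pe_stream() are placeholders for however the request actually reaches the PE side, not existing wg_parser code. Only the GstAppSrc/GLib calls are real API. A real version would probably also set the stream size with gst_app_src_set_size().

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Placeholder for however the PE side actually services the read. */
static void read_from_pe_stream(guint64 offset, guint8 *data, guint size);

static guint64 next_offset;

/* In random-access mode appsrc tells us the offset via "seek-data". */
static gboolean seek_data_cb(GstAppSrc *appsrc, guint64 offset, gpointer user)
{
    next_offset = offset;
    return TRUE;
}

static void need_data_cb(GstAppSrc *appsrc, guint length, gpointer user)
{
    GstBuffer *buffer = gst_buffer_new_and_alloc(length);
    GstMapInfo map;

    gst_buffer_map(buffer, &map, GST_MAP_WRITE);
    read_from_pe_stream(next_offset, map.data, length);
    gst_buffer_unmap(buffer, &map);
    next_offset += length;

    /* appsrc takes ownership of the buffer. */
    gst_app_src_push_buffer(appsrc, buffer);
}

static GstElement *create_src(void)
{
    GstElement *src = gst_element_factory_make("appsrc", NULL);

    gst_app_src_set_stream_type(GST_APP_SRC(src), GST_APP_STREAM_TYPE_RANDOM_ACCESS);
    g_signal_connect(src, "need-data", G_CALLBACK(need_data_cb), NULL);
    g_signal_connect(src, "seek-data", G_CALLBACK(seek_data_cb), NULL);
    return src;
}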
, that option is starting to look more attractive, especially if we want to make winegstreamer compatible with WoW64 emulation.)
I'm not sure I see how this is relevant to WoW64 emulation; if you don't mind, could you explain?
One reason that appsrc is undesirable (which I probably didn't mention earlier) is that it always takes buffers allocated by the "app", as it were, rather than by the element calling gst_pad_get_range(). That's fine if the downstream element didn't allocate its own buffer, but if it did (and oggdemux is one major example of an element that does) appsrc will do an extra blit.
That should be avoided if possible. I'm not particularly concerned about latency—I don't think it should be a bottleneck, as long as the machine has enough cores—but I am more concerned about CPU usage, and cases where we might be CPU limited and *don't* have enough cores. Yes, it's probably premature optimization, but I'm a lot warier about removing an optimization already present than I am about introducing a new one.
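The optimization already present, for reference, is that a getrange callback can fill the downstream element's buffer in place when one is provided. Roughly (sketch only; read_from_pe_stream() again stands in for however the request actually reaches the PE side):

/* Placeholder for however the read request is serviced. */
static void read_from_pe_stream(guint64 offset, guint8 *data, guint size);

static GstFlowReturn src_getrange(GstPad *pad, GstObject *parent,
        guint64 offset, guint length, GstBuffer **buffer)
{
    GstBuffer *buf = *buffer;
    GstMapInfo map;

    /* Downstream (oggdemux, for instance) may hand us its own buffer;
     * in that case we fill it in place and no copy happens. */
    if (!buf)
        buf = gst_buffer_new_and_alloc(length);

    gst_buffer_map(buf, &map, GST_MAP_WRITE);
    read_from_pe_stream(offset, map.data, length);
    gst_buffer_unmap(buf, &map);

    *buffer = buf;
    return GST_FLOW_OK;
}

/* Installed on the manual src pad with
 * gst_pad_set_getrange_function(pad, src_getrange). */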
Currently we fill read buffers allocated by the Unix side—either already allocated by a GStreamer element, or allocated by us using gst_buffer_new_and_alloc(). That's not compatible with WoW64 emulation (aka 32-on-64-bit support), as it requires the 32-bit PE side to write into a 64-bit pointer. Instead we would need to write into a buffer allocated from the 32-bit side, and pass that to the Unix side. But this also means that we have to do a blit.
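Concretely, the Unix side would then end up doing something like this for each read; complete_read_request() is a made-up name, only gst_buffer_extract() is real API:

/* The 32-bit PE caller allocates "dest"; the Unix side has to copy the
 * data out of the GstBuffer into it.  This copy is the extra blit. */
static gsize complete_read_request(GstBuffer *buffer, void *dest, gsize size)
{
    return gst_buffer_extract(buffer, 0, dest, size);
}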
[Note that we can in theory avoid the blit by wrapping the caller-allocated buffer in a custom GstBuffer class and passing that down to the Unix side. We still have to do a blit if reading into a downstream-allocated GstBuffer, but we can at least avoid it otherwise.]
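[The simplest form of that wrapping, if we went that way, might be gst_buffer_new_wrapped_full() rather than a full custom GstBuffer/GstMemory implementation; something like the following sketch, with made-up names:

/* Wrap the PE-allocated memory so getrange can return it without copying.
 * No destroy notify is passed because the PE caller keeps ownership, so
 * the wrapped buffer must not outlive the request. */
static GstBuffer *wrap_pe_buffer(void *data, gsize size)
{
    return gst_buffer_new_wrapped_full(0, data, size, 0, size, NULL, NULL);
}
]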
Thanks for the comprehensive explanation. For what it's worth, the code I intended to submit as part of getting the decoder MFT to work contains some questionable code to preserve the current API, where the PE side gets the buffer from the Unix side, with the pushing thread growing its buffer allocations to match the size of the buffer being pushed from the PE side. It also contains a fairly large amount of code handling flushes and EOS, which is most likely more error-prone than the GstAppSrc equivalent.
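(For comparison, the GstAppSrc equivalents are pretty small; this is only a sketch, and exactly where these calls live depends on how wg_parser ends up holding the element:

/* End of stream: appsrc queues an EOS event downstream for us. */
gst_app_src_end_of_stream(appsrc);

/* Flush: send the standard flush events to the source and let them
 * propagate through the pipeline. */
gst_element_send_event(GST_ELEMENT(appsrc), gst_event_new_flush_start());
gst_element_send_event(GST_ELEMENT(appsrc), gst_event_new_flush_stop(TRUE));
)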
If we intend to move to GstAppSrc, then there's no point in me trying to upstream this questionable code whose only reason for existing is to preserve a model which can't last anyway. Would you like me to take a shot at moving wg_parser to GstAppSrc?
I think that new wg_* objects should definitely use appsrc.
I'm mostly inclined to say that the parser should as well; I'm still a little concerned about potential performance regression (and I'm not sure to what degree WoW64 emulation will end up actually being implemented in upstream Wine) but I'm leaning towards it being a good idea.
That shouldn't actually block work on the decoder, but it may help give a more coherent picture of the unixlib interface. In any case I would definitely appreciate your work on that conversion.
I think we should probably leave off the aforementioned buffer wrapping optimization, at least for an initial pass, and then evaluate the actual cost of blitting. It may turn out to be negligible.