This is the first batch of a series implementing faster media source resolution required to workaround an Unreal Engine race condition present in some games, and deterministic stream ordering that decodebin / parsebin cannot provide, which is required to expose the streams in native order, for compatibility in several other applications.
I pushed the full series as a branch here: https://gitlab.winehq.org/rbernon/wine/-/commits/mr/wg-source-part-one
Note that this full series is also a first step in the direction of having a simpler demuxer interface, which will be required in the future for compatibility with applications that build MF or DirectShow pipelines directly and expect the relevant components to behave as a demuxer and expose compressed media types. For now it only delays the use of wg_parser to whenever the media source is started, and matches the non-ordered streams using their media types and tags. This is a best effort solution but I don't think we can do much better for the moment.
From: Rémi Bernon rbernon@codeweavers.com
--- MAINTAINERS | 1 + dlls/winegstreamer/Makefile.in | 1 + dlls/winegstreamer/gst_private.h | 3 + dlls/winegstreamer/main.c | 25 ++++++++ dlls/winegstreamer/media_source.c | 15 ++++- dlls/winegstreamer/unix_private.h | 5 ++ dlls/winegstreamer/unixlib.c | 1 + dlls/winegstreamer/unixlib.h | 9 +++ dlls/winegstreamer/wg_parser.c | 6 ++ dlls/winegstreamer/wg_source.c | 94 +++++++++++++++++++++++++++++++ 10 files changed, 159 insertions(+), 1 deletion(-) create mode 100644 dlls/winegstreamer/wg_source.c
diff --git a/MAINTAINERS b/MAINTAINERS index dc5c85cc4ec..2fd7340efb0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -223,6 +223,7 @@ F: dlls/winegstreamer/mfplat.c F: dlls/winegstreamer/resampler.c F: dlls/winegstreamer/video_decoder.c F: dlls/winegstreamer/video_processor.c +F: dlls/winegstreamer/wg_source.c F: dlls/winegstreamer/wg_sample.c F: dlls/winegstreamer/wg_transform.c F: dlls/winegstreamer/wma_decoder.c diff --git a/dlls/winegstreamer/Makefile.in b/dlls/winegstreamer/Makefile.in index 78bdd6c0ef1..dbf0be035f1 100644 --- a/dlls/winegstreamer/Makefile.in +++ b/dlls/winegstreamer/Makefile.in @@ -24,6 +24,7 @@ C_SRCS = \ wg_format.c \ wg_parser.c \ wg_sample.c \ + wg_source.c \ wg_transform.c \ wm_reader.c \ wma_decoder.c \ diff --git a/dlls/winegstreamer/gst_private.h b/dlls/winegstreamer/gst_private.h index ed867f741d9..fd33ac42c19 100644 --- a/dlls/winegstreamer/gst_private.h +++ b/dlls/winegstreamer/gst_private.h @@ -110,6 +110,9 @@ bool wg_transform_get_status(wg_transform_t transform, bool *accepts_input); HRESULT wg_transform_drain(wg_transform_t transform); HRESULT wg_transform_flush(wg_transform_t transform);
+HRESULT wg_source_create(wg_source_t *out); +void wg_source_destroy(wg_source_t source); + unsigned int wg_format_get_max_size(const struct wg_format *format);
HRESULT avi_splitter_create(IUnknown *outer, IUnknown **out); diff --git a/dlls/winegstreamer/main.c b/dlls/winegstreamer/main.c index 9624c469314..8237936891c 100644 --- a/dlls/winegstreamer/main.c +++ b/dlls/winegstreamer/main.c @@ -456,6 +456,31 @@ HRESULT wg_transform_flush(wg_transform_t transform) return S_OK; }
+HRESULT wg_source_create(wg_source_t *out) +{ + struct wg_source_create_params params = {0}; + NTSTATUS status; + + TRACE("out %p\n", out); + + if ((status = WINE_UNIX_CALL(unix_wg_source_create, ¶ms))) + WARN("wg_source_create returned status %#lx\n", status); + else + { + TRACE("Returning source %#I64x.\n", params.source); + *out = params.source; + } + + return HRESULT_FROM_NT(status); +} + +void wg_source_destroy(wg_source_t source) +{ + TRACE("source %#I64x.\n", source); + + WINE_UNIX_CALL(unix_wg_source_destroy, &source); +} + #define ALIGN(n, alignment) (((n) + (alignment) - 1) & ~((alignment) - 1))
unsigned int wg_format_get_stride(const struct wg_format *format) diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index 8b9d42ea3f0..c6403661c0f 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -36,6 +36,8 @@ struct object_context IMFByteStream *stream; UINT64 file_size; WCHAR *url; + + wg_source_t wg_source; };
static struct object_context *impl_from_IUnknown(IUnknown *iface) @@ -78,6 +80,8 @@ static ULONG WINAPI object_context_Release(IUnknown *iface)
if (!refcount) { + if (context->wg_source) + wg_source_destroy(context->wg_source); IMFAsyncResult_Release(context->result); IMFByteStream_Release(context->stream); free(context->url); @@ -182,6 +186,7 @@ struct media_source
CRITICAL_SECTION cs;
+ wg_source_t wg_source; UINT64 file_size; wg_parser_t wg_parser; UINT64 duration; @@ -1359,6 +1364,7 @@ static ULONG WINAPI media_source_Release(IMFMediaSource *iface) IMFMediaSource_Shutdown(iface); IMFMediaEventQueue_Release(source->event_queue); IMFByteStream_Release(source->byte_stream); + wg_source_destroy(source->wg_source); wg_parser_destroy(source->wg_parser); source->cs.DebugInfo->Spare[0] = 0; DeleteCriticalSection(&source->cs); @@ -1637,6 +1643,9 @@ static HRESULT media_source_create(struct object_context *context, IMFMediaSourc InitializeCriticalSection(&object->cs); object->cs.DebugInfo->Spare[0] = (DWORD_PTR)(__FILE__ ": cs");
+ object->wg_source = context->wg_source; + context->wg_source = 0; + if (FAILED(hr = MFCreateEventQueue(&object->event_queue))) goto fail;
@@ -1716,6 +1725,8 @@ fail: WaitForSingleObject(object->read_thread, INFINITE); CloseHandle(object->read_thread); } + if (object->wg_source) + wg_source_destroy(object->wg_source); if (object->wg_parser) wg_parser_destroy(object->wg_parser); if (object->async_commands_queue) @@ -1998,7 +2009,9 @@ static HRESULT WINAPI stream_handler_callback_Invoke(IMFAsyncCallback *iface, IM if (!state || !(context = impl_from_IUnknown(state))) return E_INVALIDARG;
- if (FAILED(hr = media_source_create(context, (IMFMediaSource **)&object))) + if (FAILED(hr = wg_source_create(&context->wg_source))) + WARN("Failed to create wg_source, hr %#lx\n", hr); + else if (FAILED(hr = media_source_create(context, (IMFMediaSource **)&object))) WARN("Failed to create media source, hr %#lx\n", hr); else { diff --git a/dlls/winegstreamer/unix_private.h b/dlls/winegstreamer/unix_private.h index db6bf3ebb2e..8574cac797f 100644 --- a/dlls/winegstreamer/unix_private.h +++ b/dlls/winegstreamer/unix_private.h @@ -57,6 +57,11 @@ extern NTSTATUS wg_transform_get_status(void *args) DECLSPEC_HIDDEN; extern NTSTATUS wg_transform_drain(void *args) DECLSPEC_HIDDEN; extern NTSTATUS wg_transform_flush(void *args) DECLSPEC_HIDDEN;
+/* wg_source.c */ + +extern NTSTATUS wg_source_create(void *args) DECLSPEC_HIDDEN; +extern NTSTATUS wg_source_destroy(void *args) DECLSPEC_HIDDEN; + /* wg_allocator.c */
static inline BYTE *wg_sample_data(struct wg_sample *sample) diff --git a/dlls/winegstreamer/unixlib.c b/dlls/winegstreamer/unixlib.c index 6ed38260536..cb28b76b65e 100644 --- a/dlls/winegstreamer/unixlib.c +++ b/dlls/winegstreamer/unixlib.c @@ -31,6 +31,7 @@ #include <gst/gst.h> #include <gst/video/video.h> #include <gst/audio/audio.h> +#include <gst/base/base.h> #include <gst/tag/tag.h>
#include "ntstatus.h" diff --git a/dlls/winegstreamer/unixlib.h b/dlls/winegstreamer/unixlib.h index 702bd7aa69b..7fb177c4ed5 100644 --- a/dlls/winegstreamer/unixlib.h +++ b/dlls/winegstreamer/unixlib.h @@ -207,6 +207,7 @@ enum wg_parser_type typedef UINT64 wg_parser_t; typedef UINT64 wg_parser_stream_t; typedef UINT64 wg_transform_t; +typedef UINT64 wg_source_t;
struct wg_parser_create_params { @@ -363,6 +364,11 @@ struct wg_transform_get_status_params UINT32 accepts_input; };
+struct wg_source_create_params +{ + wg_source_t source; +}; + enum unix_funcs { unix_wg_init_gstreamer, @@ -402,6 +408,9 @@ enum unix_funcs unix_wg_transform_get_status, unix_wg_transform_drain, unix_wg_transform_flush, + + unix_wg_source_create, + unix_wg_source_destroy, };
#endif /* __WINE_WINEGSTREAMER_UNIXLIB_H */ diff --git a/dlls/winegstreamer/wg_parser.c b/dlls/winegstreamer/wg_parser.c index 298c05c6b88..387871d116a 100644 --- a/dlls/winegstreamer/wg_parser.c +++ b/dlls/winegstreamer/wg_parser.c @@ -1945,6 +1945,9 @@ const unixlib_entry_t __wine_unix_call_funcs[] = X(wg_transform_get_status), X(wg_transform_drain), X(wg_transform_flush), + + X(wg_source_create), + X(wg_source_destroy), };
#ifdef _WIN64 @@ -2192,6 +2195,9 @@ const unixlib_entry_t __wine_unix_call_wow64_funcs[] = X(wg_transform_get_status), X(wg_transform_drain), X(wg_transform_flush), + + X(wg_source_create), + X(wg_source_destroy), };
#endif /* _WIN64 */ diff --git a/dlls/winegstreamer/wg_source.c b/dlls/winegstreamer/wg_source.c new file mode 100644 index 00000000000..52a54196ea9 --- /dev/null +++ b/dlls/winegstreamer/wg_source.c @@ -0,0 +1,94 @@ +/* + * Copyright 2023 Rémi Bernon for CodeWeavers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +#if 0 +#pragma makedep unix +#endif + +#include "config.h" + +#include <assert.h> +#include <stdarg.h> +#include <stdio.h> + +#include <gst/gst.h> +#include <gst/video/video.h> +#include <gst/audio/audio.h> +#include <gst/base/base.h> +#include <gst/tag/tag.h> + +#include "ntstatus.h" +#define WIN32_NO_STATUS +#include "winternl.h" +#include "mferror.h" + +#include "unix_private.h" + +struct wg_source +{ + GstElement *container; +}; + +static struct wg_source *get_source(wg_source_t source) +{ + return (struct wg_source *)(ULONG_PTR)source; +} + +NTSTATUS wg_source_create(void *args) +{ + struct wg_source_create_params *params = args; + struct wg_source *source; + + if (!(source = calloc(1, sizeof(*source)))) + return STATUS_UNSUCCESSFUL; + + if (!(source->container = gst_bin_new("wg_source"))) + goto error; + + gst_element_set_state(source->container, GST_STATE_PAUSED); + if (!gst_element_get_state(source->container, NULL, NULL, -1)) + goto error; + + params->source = (wg_source_t)(ULONG_PTR)source; + GST_INFO("Created winegstreamer source %p.", source); + return STATUS_SUCCESS; + +error: + if (source->container) + { + gst_element_set_state(source->container, GST_STATE_NULL); + gst_object_unref(source->container); + } + free(source); + + GST_ERROR("Failed to create winegstreamer source."); + return STATUS_UNSUCCESSFUL; +} + +NTSTATUS wg_source_destroy(void *args) +{ + struct wg_source *source = get_source(*(wg_source_t *)args); + + GST_TRACE("source %p", source); + + gst_element_set_state(source->container, GST_STATE_NULL); + gst_object_unref(source->container); + free(source); + + return STATUS_SUCCESS; +}
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/winegstreamer/media_source.c | 35 ++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index c6403661c0f..58ea8ed0c2b 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -27,6 +27,8 @@
WINE_DEFAULT_DEBUG_CHANNEL(mfplat);
+#define SOURCE_BUFFER_SIZE (64 * 1024) + struct object_context { IUnknown IUnknown_iface; @@ -37,6 +39,7 @@ struct object_context UINT64 file_size; WCHAR *url;
+ BYTE *buffer; wg_source_t wg_source; };
@@ -84,6 +87,7 @@ static ULONG WINAPI object_context_Release(IUnknown *iface) wg_source_destroy(context->wg_source); IMFAsyncResult_Release(context->result); IMFByteStream_Release(context->stream); + free(context->buffer); free(context->url); free(context); } @@ -99,14 +103,16 @@ static const IUnknownVtbl object_context_vtbl = };
static HRESULT object_context_create(DWORD flags, IMFByteStream *stream, const WCHAR *url, - QWORD file_size, IMFAsyncResult *result, IUnknown **out) + QWORD file_size, IMFAsyncResult *result, IUnknown **out, BYTE **out_buf) { WCHAR *tmp_url = url ? wcsdup(url) : NULL; struct object_context *context;
- if (!(context = calloc(1, sizeof(*context)))) + if (!(context = calloc(1, sizeof(*context))) + || !(context->buffer = malloc(SOURCE_BUFFER_SIZE))) { free(tmp_url); + free(context); return E_OUTOFMEMORY; }
@@ -120,6 +126,7 @@ static HRESULT object_context_create(DWORD flags, IMFByteStream *stream, const W IMFAsyncResult_AddRef(context->result);
*out = &context->IUnknown_iface; + *out_buf = context->buffer; return S_OK; }
@@ -1861,7 +1868,8 @@ static HRESULT WINAPI stream_handler_BeginCreateObject(IMFByteStreamHandler *ifa struct stream_handler *handler = impl_from_IMFByteStreamHandler(iface); IMFAsyncResult *result; IUnknown *context; - QWORD file_size; + QWORD file_size, position; + BYTE *buffer; HRESULT hr; DWORD caps;
@@ -1890,13 +1898,16 @@ static HRESULT WINAPI stream_handler_BeginCreateObject(IMFByteStreamHandler *ifa
if (FAILED(hr = MFCreateAsyncResult(NULL, callback, state, &result))) return hr; - if (FAILED(hr = object_context_create(flags, stream, url, file_size, result, &context))) - { - IMFAsyncResult_Release(result); - return hr; - } + if (FAILED(hr = object_context_create(flags, stream, url, file_size, result, &context, &buffer))) + goto done;
- hr = MFPutWorkItem(MFASYNC_CALLBACK_QUEUE_IO, &handler->IMFAsyncCallback_iface, context); + if (FAILED(hr = IMFByteStream_GetCurrentPosition(stream, &position))) + WARN("Failed to get byte stream position, hr %#lx\n", hr); + else if (position != 0 && FAILED(hr = IMFByteStream_SetCurrentPosition(stream, 0))) + WARN("Failed to set byte stream position, hr %#lx\n", hr); + else if (FAILED(hr = IMFByteStream_BeginRead(stream, buffer, min(SOURCE_BUFFER_SIZE, file_size), + &handler->IMFAsyncCallback_iface, context))) + WARN("Failed to queue byte stream async read, hr %#lx\n", hr); IUnknown_Release(context);
if (SUCCEEDED(hr) && cancel_cookie) @@ -1905,6 +1916,7 @@ static HRESULT WINAPI stream_handler_BeginCreateObject(IMFByteStreamHandler *ifa IUnknown_AddRef(*cancel_cookie); }
+done: IMFAsyncResult_Release(result);
return hr; @@ -2004,12 +2016,15 @@ static HRESULT WINAPI stream_handler_callback_Invoke(IMFAsyncCallback *iface, IM IUnknown *object, *state = IMFAsyncResult_GetStateNoAddRef(result); struct object_context *context; struct result_entry *entry; + DWORD size = 0; HRESULT hr;
if (!state || !(context = impl_from_IUnknown(state))) return E_INVALIDARG;
- if (FAILED(hr = wg_source_create(&context->wg_source))) + if (FAILED(hr = IMFByteStream_EndRead(context->stream, result, &size))) + WARN("Failed to complete stream read, hr %#lx\n", hr); + else if (FAILED(hr = wg_source_create(&context->wg_source))) WARN("Failed to create wg_source, hr %#lx\n", hr); else if (FAILED(hr = media_source_create(context, (IMFMediaSource **)&object))) WARN("Failed to create media source, hr %#lx\n", hr);
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/winegstreamer/gst_private.h | 2 +- dlls/winegstreamer/main.c | 15 ++++++++++++--- dlls/winegstreamer/media_source.c | 2 +- dlls/winegstreamer/unix_private.h | 1 + dlls/winegstreamer/unixlib.c | 26 ++++++++++++++++++++++++++ dlls/winegstreamer/unixlib.h | 3 +++ dlls/winegstreamer/wg_parser.c | 24 +++++++++++++++++++++++- dlls/winegstreamer/wg_source.c | 10 ++++++++++ 8 files changed, 77 insertions(+), 6 deletions(-)
diff --git a/dlls/winegstreamer/gst_private.h b/dlls/winegstreamer/gst_private.h index fd33ac42c19..b08e3566de4 100644 --- a/dlls/winegstreamer/gst_private.h +++ b/dlls/winegstreamer/gst_private.h @@ -110,7 +110,7 @@ bool wg_transform_get_status(wg_transform_t transform, bool *accepts_input); HRESULT wg_transform_drain(wg_transform_t transform); HRESULT wg_transform_flush(wg_transform_t transform);
-HRESULT wg_source_create(wg_source_t *out); +HRESULT wg_source_create(const WCHAR *url, const void *data, uint32_t size, wg_source_t *out); void wg_source_destroy(wg_source_t source);
unsigned int wg_format_get_max_size(const struct wg_format *format); diff --git a/dlls/winegstreamer/main.c b/dlls/winegstreamer/main.c index 8237936891c..30cfe079686 100644 --- a/dlls/winegstreamer/main.c +++ b/dlls/winegstreamer/main.c @@ -456,12 +456,20 @@ HRESULT wg_transform_flush(wg_transform_t transform) return S_OK; }
-HRESULT wg_source_create(wg_source_t *out) +HRESULT wg_source_create(const WCHAR *url, const void *data, uint32_t size, wg_source_t *out) { - struct wg_source_create_params params = {0}; + struct wg_source_create_params params = + { + .data = data, .size = size, + }; + UINT len = url ? WideCharToMultiByte(CP_ACP, 0, url, -1, NULL, 0, NULL, NULL) : 0; + char *tmp = url ? malloc(len) : NULL; NTSTATUS status;
- TRACE("out %p\n", out); + TRACE("url %s, data %p, size %#x\n", debugstr_w(url), data, size); + + if ((params.url = tmp)) + WideCharToMultiByte(CP_ACP, 0, url, -1, tmp, len, NULL, NULL);
if ((status = WINE_UNIX_CALL(unix_wg_source_create, ¶ms))) WARN("wg_source_create returned status %#lx\n", status); @@ -471,6 +479,7 @@ HRESULT wg_source_create(wg_source_t *out) *out = params.source; }
+ free(tmp); return HRESULT_FROM_NT(status); }
diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index 58ea8ed0c2b..950c1506152 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -2024,7 +2024,7 @@ static HRESULT WINAPI stream_handler_callback_Invoke(IMFAsyncCallback *iface, IM
if (FAILED(hr = IMFByteStream_EndRead(context->stream, result, &size))) WARN("Failed to complete stream read, hr %#lx\n", hr); - else if (FAILED(hr = wg_source_create(&context->wg_source))) + else if (FAILED(hr = wg_source_create(context->url, context->buffer, size, &context->wg_source))) WARN("Failed to create wg_source, hr %#lx\n", hr); else if (FAILED(hr = media_source_create(context, (IMFMediaSource **)&object))) WARN("Failed to create media source, hr %#lx\n", hr); diff --git a/dlls/winegstreamer/unix_private.h b/dlls/winegstreamer/unix_private.h index 8574cac797f..5e96d8a144a 100644 --- a/dlls/winegstreamer/unix_private.h +++ b/dlls/winegstreamer/unix_private.h @@ -39,6 +39,7 @@ extern GstElement *find_element(GstElementFactoryListType type, GstCaps *src_cap extern bool append_element(GstElement *container, GstElement *element, GstElement **first, GstElement **last) DECLSPEC_HIDDEN; extern bool link_src_to_element(GstPad *src_pad, GstElement *element) DECLSPEC_HIDDEN; extern bool link_element_to_sink(GstElement *element, GstPad *sink_pad) DECLSPEC_HIDDEN; +extern GstCaps *detect_caps_from_data(const char *url, const void *data, guint size) DECLSPEC_HIDDEN;
/* wg_format.c */
diff --git a/dlls/winegstreamer/unixlib.c b/dlls/winegstreamer/unixlib.c index cb28b76b65e..b926ee1dac3 100644 --- a/dlls/winegstreamer/unixlib.c +++ b/dlls/winegstreamer/unixlib.c @@ -203,6 +203,32 @@ bool link_element_to_sink(GstElement *element, GstPad *sink_pad) return !ret; }
+GstCaps *detect_caps_from_data(const char *url, const void *data, guint size) +{ + const char *extension = url ? strrchr(url, '.') : NULL; + GstTypeFindProbability probability; + GstCaps *caps; + gchar *str; + + if (!(caps = gst_type_find_helper_for_data_with_extension(NULL, data, size, + extension ? extension + 1 : NULL, &probability))) + { + GST_ERROR("Failed to detect caps for url %s, data %p, size %u", url, data, size); + return NULL; + } + + str = gst_caps_to_string(caps); + if (probability > GST_TYPE_FIND_POSSIBLE) + GST_INFO("Detected caps %s with probability %u for url %s, data %p, size %u", + str, probability, url, data, size); + else + GST_FIXME("Detected caps %s with probability %u for url %s, data %p, size %u", + str, probability, url, data, size); + g_free(str); + + return caps; +} + NTSTATUS wg_init_gstreamer(void *arg) { char arg0[] = "wine"; diff --git a/dlls/winegstreamer/unixlib.h b/dlls/winegstreamer/unixlib.h index 7fb177c4ed5..f44774fc06f 100644 --- a/dlls/winegstreamer/unixlib.h +++ b/dlls/winegstreamer/unixlib.h @@ -366,6 +366,9 @@ struct wg_transform_get_status_params
struct wg_source_create_params { + const char *url; + const void *data; + UINT32 size; wg_source_t source; };
diff --git a/dlls/winegstreamer/wg_parser.c b/dlls/winegstreamer/wg_parser.c index 387871d116a..d03900c8afc 100644 --- a/dlls/winegstreamer/wg_parser.c +++ b/dlls/winegstreamer/wg_parser.c @@ -2155,6 +2155,28 @@ NTSTATUS wow64_wg_transform_read_data(void *args) return ret; }
+NTSTATUS wow64_wg_source_create(void *args) +{ + struct + { + PTR32 url; + PTR32 data; + UINT32 size; + wg_source_t source; + } *params32 = args; + struct wg_source_create_params params = + { + .url = ULongToPtr(params32->url), + .data = ULongToPtr(params32->data), + .size = params32->size, + }; + NTSTATUS ret; + + ret = wg_source_create(¶ms); + params32->source = params.source; + return ret; +} + const unixlib_entry_t __wine_unix_call_wow64_funcs[] = { #define X64(name) [unix_ ## name] = wow64_ ## name @@ -2196,7 +2218,7 @@ const unixlib_entry_t __wine_unix_call_wow64_funcs[] = X(wg_transform_drain), X(wg_transform_flush),
- X(wg_source_create), + X64(wg_source_create), X(wg_source_destroy), };
diff --git a/dlls/winegstreamer/wg_source.c b/dlls/winegstreamer/wg_source.c index 52a54196ea9..0cdb3f3fdb7 100644 --- a/dlls/winegstreamer/wg_source.c +++ b/dlls/winegstreamer/wg_source.c @@ -53,9 +53,15 @@ NTSTATUS wg_source_create(void *args) { struct wg_source_create_params *params = args; struct wg_source *source; + GstCaps *src_caps;
+ if (!(src_caps = detect_caps_from_data(params->url, params->data, params->size))) + return STATUS_UNSUCCESSFUL; if (!(source = calloc(1, sizeof(*source)))) + { + gst_caps_unref(src_caps); return STATUS_UNSUCCESSFUL; + }
if (!(source->container = gst_bin_new("wg_source"))) goto error; @@ -64,6 +70,8 @@ NTSTATUS wg_source_create(void *args) if (!gst_element_get_state(source->container, NULL, NULL, -1)) goto error;
+ gst_caps_unref(src_caps); + params->source = (wg_source_t)(ULONG_PTR)source; GST_INFO("Created winegstreamer source %p.", source); return STATUS_SUCCESS; @@ -76,6 +84,8 @@ error: } free(source);
+ gst_caps_unref(src_caps); + GST_ERROR("Failed to create winegstreamer source."); return STATUS_UNSUCCESSFUL; }
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/winegstreamer/media_source.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index 950c1506152..4f02f089490 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -102,10 +102,30 @@ static const IUnknownVtbl object_context_vtbl = object_context_Release, };
+static WCHAR *byte_stream_get_url(IMFByteStream *stream, const WCHAR *url) +{ + IMFAttributes *attributes; + WCHAR buffer[MAX_PATH]; + UINT32 size; + HRESULT hr; + + if (SUCCEEDED(hr = IMFByteStream_QueryInterface(stream, &IID_IMFAttributes, (void **)&attributes))) + { + if (FAILED(hr = IMFAttributes_GetString(attributes, &MF_BYTESTREAM_ORIGIN_NAME, + buffer, ARRAY_SIZE(buffer), &size))) + WARN("Failed to get MF_BYTESTREAM_ORIGIN_NAME got size %#x, hr %#lx\n", size, hr); + else + url = buffer; + IMFAttributes_Release(attributes); + } + + return url ? wcsdup(url) : NULL; +} + static HRESULT object_context_create(DWORD flags, IMFByteStream *stream, const WCHAR *url, QWORD file_size, IMFAsyncResult *result, IUnknown **out, BYTE **out_buf) { - WCHAR *tmp_url = url ? wcsdup(url) : NULL; + WCHAR *tmp_url = byte_stream_get_url(stream, url); struct object_context *context;
if (!(context = calloc(1, sizeof(*context)))
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/winegstreamer/unix_private.h | 1 + dlls/winegstreamer/unixlib.c | 15 +++++++++++++++ dlls/winegstreamer/wg_parser.c | 4 +--- dlls/winegstreamer/wg_source.c | 7 +++++++ dlls/winegstreamer/wg_transform.c | 13 ++----------- 5 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/dlls/winegstreamer/unix_private.h b/dlls/winegstreamer/unix_private.h index 5e96d8a144a..a3cd1438cc6 100644 --- a/dlls/winegstreamer/unix_private.h +++ b/dlls/winegstreamer/unix_private.h @@ -40,6 +40,7 @@ extern bool append_element(GstElement *container, GstElement *element, GstElemen extern bool link_src_to_element(GstPad *src_pad, GstElement *element) DECLSPEC_HIDDEN; extern bool link_element_to_sink(GstElement *element, GstPad *sink_pad) DECLSPEC_HIDDEN; extern GstCaps *detect_caps_from_data(const char *url, const void *data, guint size) DECLSPEC_HIDDEN; +extern GstPad *create_pad_with_caps(GstPadDirection direction, GstCaps *caps) DECLSPEC_HIDDEN;
/* wg_format.c */
diff --git a/dlls/winegstreamer/unixlib.c b/dlls/winegstreamer/unixlib.c index b926ee1dac3..e573f36b597 100644 --- a/dlls/winegstreamer/unixlib.c +++ b/dlls/winegstreamer/unixlib.c @@ -229,6 +229,21 @@ GstCaps *detect_caps_from_data(const char *url, const void *data, guint size) return caps; }
+GstPad *create_pad_with_caps(GstPadDirection direction, GstCaps *caps) +{ + GstCaps *pad_caps = caps ? gst_caps_ref(caps) : gst_caps_new_any(); + const char *name = direction == GST_PAD_SRC ? "src" : "sink"; + GstPadTemplate *template; + GstPad *pad; + + if (!pad_caps || !(template = gst_pad_template_new(name, direction, GST_PAD_ALWAYS, pad_caps))) + return NULL; + pad = gst_pad_new_from_template(template, "src"); + g_object_unref(template); + gst_caps_unref(pad_caps); + return pad; +} + NTSTATUS wg_init_gstreamer(void *arg) { char arg0[] = "wine"; diff --git a/dlls/winegstreamer/wg_parser.c b/dlls/winegstreamer/wg_parser.c index d03900c8afc..2ebbce21e58 100644 --- a/dlls/winegstreamer/wg_parser.c +++ b/dlls/winegstreamer/wg_parser.c @@ -1544,8 +1544,6 @@ static void query_tags(struct wg_parser_stream *stream)
static NTSTATUS wg_parser_connect(void *args) { - GstStaticPadTemplate src_template = GST_STATIC_PAD_TEMPLATE("quartz_src", - GST_PAD_SRC, GST_PAD_ALWAYS, GST_STATIC_CAPS_ANY); const struct wg_parser_connect_params *params = args; struct wg_parser *parser = get_parser(params->parser); unsigned int i; @@ -1563,7 +1561,7 @@ static NTSTATUS wg_parser_connect(void *args) parser->container = gst_bin_new(NULL); gst_element_set_bus(parser->container, parser->bus);
- parser->my_src = gst_pad_new_from_static_template(&src_template, "quartz-src"); + parser->my_src = create_pad_with_caps(GST_PAD_SRC, NULL); gst_pad_set_getrange_function(parser->my_src, src_getrange_cb); gst_pad_set_query_function(parser->my_src, src_query_cb); gst_pad_set_activatemode_function(parser->my_src, src_activate_mode_cb); diff --git a/dlls/winegstreamer/wg_source.c b/dlls/winegstreamer/wg_source.c index 0cdb3f3fdb7..68bf7da045d 100644 --- a/dlls/winegstreamer/wg_source.c +++ b/dlls/winegstreamer/wg_source.c @@ -41,6 +41,7 @@
struct wg_source { + GstPad *src_pad; GstElement *container; };
@@ -65,6 +66,9 @@ NTSTATUS wg_source_create(void *args)
if (!(source->container = gst_bin_new("wg_source"))) goto error; + if (!(source->src_pad = create_pad_with_caps(GST_PAD_SRC, src_caps))) + goto error; + gst_pad_set_element_private(source->src_pad, source);
gst_element_set_state(source->container, GST_STATE_PAUSED); if (!gst_element_get_state(source->container, NULL, NULL, -1)) @@ -82,6 +86,8 @@ error: gst_element_set_state(source->container, GST_STATE_NULL); gst_object_unref(source->container); } + if (source->src_pad) + gst_object_unref(source->src_pad); free(source);
gst_caps_unref(src_caps); @@ -98,6 +104,7 @@ NTSTATUS wg_source_destroy(void *args)
gst_element_set_state(source->container, GST_STATE_NULL); gst_object_unref(source->container); + gst_object_unref(source->src_pad); free(source);
return STATUS_SUCCESS; diff --git a/dlls/winegstreamer/wg_transform.c b/dlls/winegstreamer/wg_transform.c index 775fed7a46c..fd9e19970d2 100644 --- a/dlls/winegstreamer/wg_transform.c +++ b/dlls/winegstreamer/wg_transform.c @@ -298,7 +298,6 @@ NTSTATUS wg_transform_create(void *args) GstElement *first = NULL, *last = NULL, *element; GstCaps *raw_caps = NULL, *src_caps = NULL; NTSTATUS status = STATUS_UNSUCCESSFUL; - GstPadTemplate *template = NULL; struct wg_transform *transform; const gchar *media_type; GstEvent *event; @@ -320,11 +319,7 @@ NTSTATUS wg_transform_create(void *args)
if (!(src_caps = wg_format_to_caps(&input_format))) goto out; - if (!(template = gst_pad_template_new("src", GST_PAD_SRC, GST_PAD_ALWAYS, src_caps))) - goto out; - transform->my_src = gst_pad_new_from_template(template, "src"); - g_object_unref(template); - if (!transform->my_src) + if (!(transform->my_src = create_pad_with_caps(GST_PAD_SRC, src_caps))) goto out;
gst_pad_set_element_private(transform->my_src, transform); @@ -332,11 +327,7 @@ NTSTATUS wg_transform_create(void *args)
if (!(transform->output_caps = wg_format_to_caps(&output_format))) goto out; - if (!(template = gst_pad_template_new("sink", GST_PAD_SINK, GST_PAD_ALWAYS, transform->output_caps))) - goto out; - transform->my_sink = gst_pad_new_from_template(template, "sink"); - g_object_unref(template); - if (!transform->my_sink) + if (!(transform->my_sink = create_pad_with_caps(GST_PAD_SINK, transform->output_caps))) goto out;
gst_pad_set_element_private(transform->my_sink, transform);
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/winegstreamer/wg_source.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/dlls/winegstreamer/wg_source.c b/dlls/winegstreamer/wg_source.c index 68bf7da045d..62436c26726 100644 --- a/dlls/winegstreamer/wg_source.c +++ b/dlls/winegstreamer/wg_source.c @@ -53,8 +53,9 @@ static struct wg_source *get_source(wg_source_t source) NTSTATUS wg_source_create(void *args) { struct wg_source_create_params *params = args; + GstElement *first = NULL, *last = NULL, *element; + GstCaps *src_caps, *any_caps; struct wg_source *source; - GstCaps *src_caps;
if (!(src_caps = detect_caps_from_data(params->url, params->data, params->size))) return STATUS_UNSUCCESSFUL; @@ -70,6 +71,21 @@ NTSTATUS wg_source_create(void *args) goto error; gst_pad_set_element_private(source->src_pad, source);
+ if (!(any_caps = gst_caps_new_any())) + goto error; + if (!(element = find_element(GST_ELEMENT_FACTORY_TYPE_DECODABLE, src_caps, any_caps)) + || !append_element(source->container, element, &first, &last)) + { + gst_caps_unref(any_caps); + goto error; + } + gst_caps_unref(any_caps); + + if (!link_src_to_element(source->src_pad, first)) + goto error; + if (!gst_pad_set_active(source->src_pad, true)) + goto error; + gst_element_set_state(source->container, GST_STATE_PAUSED); if (!gst_element_get_state(source->container, NULL, NULL, -1)) goto error;
From: Rémi Bernon rbernon@codeweavers.com
--- dlls/winegstreamer/gst_private.h | 1 + dlls/winegstreamer/main.c | 14 +++++++++++ dlls/winegstreamer/media_source.c | 2 ++ dlls/winegstreamer/unix_private.h | 1 + dlls/winegstreamer/unixlib.h | 8 ++++++ dlls/winegstreamer/wg_parser.c | 20 +++++++++++++++ dlls/winegstreamer/wg_source.c | 42 +++++++++++++++++++++++++++++++ 7 files changed, 88 insertions(+)
diff --git a/dlls/winegstreamer/gst_private.h b/dlls/winegstreamer/gst_private.h index b08e3566de4..a9bed2ff556 100644 --- a/dlls/winegstreamer/gst_private.h +++ b/dlls/winegstreamer/gst_private.h @@ -112,6 +112,7 @@ HRESULT wg_transform_flush(wg_transform_t transform);
HRESULT wg_source_create(const WCHAR *url, const void *data, uint32_t size, wg_source_t *out); void wg_source_destroy(wg_source_t source); +HRESULT wg_source_push_data(wg_source_t source, const void *data, uint32_t size);
unsigned int wg_format_get_max_size(const struct wg_format *format);
diff --git a/dlls/winegstreamer/main.c b/dlls/winegstreamer/main.c index 30cfe079686..b003b6fde23 100644 --- a/dlls/winegstreamer/main.c +++ b/dlls/winegstreamer/main.c @@ -490,6 +490,20 @@ void wg_source_destroy(wg_source_t source) WINE_UNIX_CALL(unix_wg_source_destroy, &source); }
+HRESULT wg_source_push_data(wg_source_t source, const void *data, uint32_t size) +{ + struct wg_source_push_data_params params = + { + .source = source, + .data = data, + .size = size, + }; + + TRACE("source %#I64x, data %p, size %#x\n", source, data, size); + + return HRESULT_FROM_NT(WINE_UNIX_CALL(unix_wg_source_push_data, ¶ms)); +} + #define ALIGN(n, alignment) (((n) + (alignment) - 1) & ~((alignment) - 1))
unsigned int wg_format_get_stride(const struct wg_format *format) diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index 4f02f089490..30204150eb3 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -2046,6 +2046,8 @@ static HRESULT WINAPI stream_handler_callback_Invoke(IMFAsyncCallback *iface, IM WARN("Failed to complete stream read, hr %#lx\n", hr); else if (FAILED(hr = wg_source_create(context->url, context->buffer, size, &context->wg_source))) WARN("Failed to create wg_source, hr %#lx\n", hr); + else if (FAILED(hr = wg_source_push_data(context->wg_source, context->buffer, size))) + WARN("Failed to push initial data, hr %#lx\n", hr); else if (FAILED(hr = media_source_create(context, (IMFMediaSource **)&object))) WARN("Failed to create media source, hr %#lx\n", hr); else diff --git a/dlls/winegstreamer/unix_private.h b/dlls/winegstreamer/unix_private.h index a3cd1438cc6..5fb0e85776a 100644 --- a/dlls/winegstreamer/unix_private.h +++ b/dlls/winegstreamer/unix_private.h @@ -63,6 +63,7 @@ extern NTSTATUS wg_transform_flush(void *args) DECLSPEC_HIDDEN;
extern NTSTATUS wg_source_create(void *args) DECLSPEC_HIDDEN; extern NTSTATUS wg_source_destroy(void *args) DECLSPEC_HIDDEN; +extern NTSTATUS wg_source_push_data(void *args) DECLSPEC_HIDDEN;
/* wg_allocator.c */
diff --git a/dlls/winegstreamer/unixlib.h b/dlls/winegstreamer/unixlib.h index f44774fc06f..d1e4497b196 100644 --- a/dlls/winegstreamer/unixlib.h +++ b/dlls/winegstreamer/unixlib.h @@ -372,6 +372,13 @@ struct wg_source_create_params wg_source_t source; };
+struct wg_source_push_data_params +{ + wg_source_t source; + const void *data; + UINT32 size; +}; + enum unix_funcs { unix_wg_init_gstreamer, @@ -414,6 +421,7 @@ enum unix_funcs
unix_wg_source_create, unix_wg_source_destroy, + unix_wg_source_push_data, };
#endif /* __WINE_WINEGSTREAMER_UNIXLIB_H */ diff --git a/dlls/winegstreamer/wg_parser.c b/dlls/winegstreamer/wg_parser.c index 2ebbce21e58..1e1ddf9e9e8 100644 --- a/dlls/winegstreamer/wg_parser.c +++ b/dlls/winegstreamer/wg_parser.c @@ -1946,6 +1946,7 @@ const unixlib_entry_t __wine_unix_call_funcs[] =
X(wg_source_create), X(wg_source_destroy), + X(wg_source_push_data), };
#ifdef _WIN64 @@ -2175,6 +2176,24 @@ NTSTATUS wow64_wg_source_create(void *args) return ret; }
+NTSTATUS wow64_wg_source_push_data(void *args) +{ + struct + { + wg_source_t source; + PTR32 data; + UINT32 size; + } *params32 = args; + struct wg_source_push_data_params params = + { + .source = params32->source, + .data = ULongToPtr(params32->data), + .size = params32->size, + }; + + return wg_source_push_data(¶ms); +} + const unixlib_entry_t __wine_unix_call_wow64_funcs[] = { #define X64(name) [unix_ ## name] = wow64_ ## name @@ -2218,6 +2237,7 @@ const unixlib_entry_t __wine_unix_call_wow64_funcs[] =
X64(wg_source_create), X(wg_source_destroy), + X64(wg_source_push_data), };
#endif /* _WIN64 */ diff --git a/dlls/winegstreamer/wg_source.c b/dlls/winegstreamer/wg_source.c index 62436c26726..4f6678cacfc 100644 --- a/dlls/winegstreamer/wg_source.c +++ b/dlls/winegstreamer/wg_source.c @@ -43,6 +43,8 @@ struct wg_source { GstPad *src_pad; GstElement *container; + GstSegment segment; + bool valid_segment; };
static struct wg_source *get_source(wg_source_t source) @@ -50,12 +52,29 @@ static struct wg_source *get_source(wg_source_t source) return (struct wg_source *)(ULONG_PTR)source; }
+static GstEvent *create_stream_start_event(const char *stream_id) +{ + GstStream *stream; + GstEvent *event; + + if (!(stream = gst_stream_new(stream_id, NULL, GST_STREAM_TYPE_UNKNOWN, 0))) + return NULL; + if ((event = gst_event_new_stream_start(stream_id))) + { + gst_event_set_stream(event, stream); + gst_object_unref(stream); + } + + return event; +} + NTSTATUS wg_source_create(void *args) { struct wg_source_create_params *params = args; GstElement *first = NULL, *last = NULL, *element; GstCaps *src_caps, *any_caps; struct wg_source *source; + GstEvent *event;
if (!(src_caps = detect_caps_from_data(params->url, params->data, params->size))) return STATUS_UNSUCCESSFUL; @@ -64,6 +83,7 @@ NTSTATUS wg_source_create(void *args) gst_caps_unref(src_caps); return STATUS_UNSUCCESSFUL; } + gst_segment_init(&source->segment, GST_FORMAT_BYTES);
if (!(source->container = gst_bin_new("wg_source"))) goto error; @@ -90,6 +110,9 @@ NTSTATUS wg_source_create(void *args) if (!gst_element_get_state(source->container, NULL, NULL, -1)) goto error;
+ if (!(event = create_stream_start_event("wg_source")) + || !gst_pad_push_event(source->src_pad, event)) + goto error; gst_caps_unref(src_caps);
params->source = (wg_source_t)(ULONG_PTR)source; @@ -125,3 +148,22 @@ NTSTATUS wg_source_destroy(void *args)
return STATUS_SUCCESS; } + +NTSTATUS wg_source_push_data(void *args) +{ + struct wg_source_push_data_params *params = args; + struct wg_source *source = get_source(params->source); + GstEvent *event; + + GST_TRACE("source %p, data %p, size %#x", source, params->data, params->size); + + if (!source->valid_segment) + { + if (!(event = gst_event_new_segment(&source->segment)) + || !gst_pad_push_event(source->src_pad, event)) + GST_ERROR("Failed to push new segment event"); + source->valid_segment = true; + } + + return STATUS_SUCCESS; +}
I think you seem to be under the impression that I've been saying we shouldn't output compressed streams from the MF media source, or wg_parser. I'm not, and if I've miscommunicated to the contrary, I apologize. Even back when the media source was first submitted, I recognized that we might need to do that eventually—I just said "let's start with decoding all the way to raw formats, and add specific compressed streams when we need to," and then it turned out we didn't need to for a long time.
To clarify statements I've made in the past, I have *resisted* actually doing that work, partly because it's a lot of work (not just for someone to do, but for us to have in tree), and partly because it makes our pipelines more complicated, harder to debug, and (at least in theory) less performant. In those cases where applications make assumptions I've seen it as fine to just incrementally tweak the way we output uncompressed streams, one part a time, because that usually works. I have a hard time calling these hacks, because they almost by definition don't go against the design of quartz (or mfplat), they just make our custom elements act a little more like some specific real element. I don't think "it's the way Windows does it internally" is a good enough reason to design something internally in Wine *if a program doesn't care*, and *if you have a good reason to do it differently*. Historically I think both of those have been the case.
Nevertheless, I've never tried to say we *shouldn't* expose compressed streams directly. And recently we have found applications that really do need compressed streams, in ways that are hard or impossible to avoid just by having our generic decoder act as a better simulacrum.
What I have been trying to ask is: why do we need to replace wg_parser? And, separately, why do we need to replace decodebin? And if decodebin has problems, can they be fixed on the GStreamer level instead of just giving up on those libraries?
Ziqing has a proposal for allowing decodebin, and hence wg_parser, to output compressed sample data, and I think it makes a lot of sense. However, you say that it's not going to work, or that decodebin is not a good fit for winegstreamer. I know you've done a lot of work on multimedia, so if you say that decodebin has problems, or that there are problems with Ziqing's approach, I want to take that into account.
One problem you mention with decodebin is that it exposes streams in a non-deterministic order. This makes sense, I think. However, I can't say that the idea of solving this by completely rewriting decodebin sits particularly well with me. The particular reason is that even with the new code, we're not really relying on documented, API-guaranteed behaviour. And I'm not dead-set against leaning on internals a bit, but I'm not thrilled about it either, and I don't like the idea of introducing a large amount of code just so that we can lean on *different* internals. Point is, if we need streams to be ordered in some specific way—the order in which they're enumerated in the file, for instance—can we get GStreamer to provide this to us explicitly, through some tag? But even if that's a non-starter, and we have to lean on internals, can we lean on the order in which our autoplug-continue signals are sent, maybe?
Another problem I've gathered is that either decodebin or wg_parser is too slow to initialize streams, and this trips up an Unreal Engine bug. What I haven't fully gathered, though, is what part(s) of the process are slow. Is it decodebin autoplugging? Our prerolling? Something else? If the problem is decodebin autoplugging, can we resolve that by returning as soon as we have all of the uncompressed streams from autoplug-continue? If it's our prerolling, can we try to return earlier from wg_parser_connect(), and defer the rest of the prerolling process to when we really need it?
And, if in the end it turns out that we really can't use decodebin, why do we need to replace wg_parser? Why can't we just replace the innards of wg_parser, and leave the interface and frontends untouched?
On the very conceptual level, `decodebin` is a black box meant to create a decoding pipeline. Trying to make it not decode should be a sign that it's not fit. I'm not going to detail the problems once again, you summarized some of them, and I have explained things in length in all the previous discussions.
Replacing wg_parser makes things easier, because wg_parser is designed around decodebin, and we need a different model. It also allows me to change only one frontend at a time, while keeping it working, as like said, it's going to be a lot of work to make the rest of the playback code work well with encoded buffers, as that use case was never considered before.
As a last comment, in general I think we should stop relying on the higher level GStreamer elements. To the contrary to Wine, we don't own its code, and anything complex is likely going to be bogus. Fixing bug in this case is more difficult as it would need to be upstreamed first and then we'd have to wait for it to trickle down. For instance we've discovered lately that `videoflip` is bogus and that it does an extra frame copy by default even though it should be pass-through. This has a huge cost when frames are large and it has a performance impact. There's only really one thing that we want from GStreamer: decoder/encoder/demuxer/muxer components, and the unified API to work with them is a nice thing to have. Everything else should be avoided.
On the very conceptual level, `decodebin` is a black box meant to create a decoding pipeline. Trying to make it not decode should be a sign that it's not fit. I'm not going to detail the problems once again, you summarized some of them, and I have explained things in length in all the previous discussions.
But we are decoding. We're just decoding to specific formats. Which is exactly what decodebin does; it just decodes to raw formats by default, and we want to not do that, sometimes. It's easy to change the set of formats that decodebin stops at, and Ziqing's patches do exactly that.
Replacing wg_parser makes things easier, because wg_parser is designed around decodebin, and we need a different model. It also allows me to change only one frontend at a time, while keeping it working, as like said, it's going to be a lot of work to make the rest of the playback code work well with encoded buffers, as that use case was never considered before.
I don't understand this at all. wg_parser isn't designed around decodebin, and it never has been. It's designed around a parser, i.e. a demuxer. In fact, from the very beginning it *was* written to support elements other than decodebin, and specifically elements that output uncompressed formats. That's carried over from quartz.
And the support isn't theoretical. There are working applications out there that instantiate a wg_parser containing avidemux, that outputs Cinepak compressed video. Or a wg_parser containing mpegaudioparser, that outputs MPEG-1 compressed audio.
Nor do I see any way in which the wg_parser interface, or even its implementation, is incapable of doing these things. In fact, having written the support to actually *do* it, the only thing it has required is adding another format to wg_format, and filling everything that switches based on format. And the only awkward thing about that has been trying to determine a maximum sample size.
We could even take this approach for new formats—that is, add new members to enum wg_parser_type. I'm not convinced we want to, because there are some drawbacks to it (which I can go into if it'd help), and I think Ziqing's approach is generally better. But it should go to show that it's not obvious that there are any problems with the wg_parser API. If you see any problems with wg_parser, can you please share them?
The objections you've shared have been predominantly along the lines of "X is designed to do Y; we want to do Z instead". And I don't think these objections are specious in principle, largely because I've had similar objections myself; I have often looked at code and found it ugly simply based on high-level design and intention. If this is your only problem with wg_parser and decodebin, I hope I can explain to you my perspective, and why I think that outputting compressed samples, or speeding up instantiation, or providing a specific stream order, is actually perfectly within the design of these APIs.
As an extra note, changing one frontend at a time is a nice bonus, *if* we're going to introduce a new API. But it's not clear at all to me why we do want to introduce a new API.
As a last comment, in general I think we should stop relying on the higher level GStreamer elements. To the contrary to Wine, we don't own its code, and anything complex is likely going to be bogus. Fixing bug in this case is more difficult as it would need to be upstreamed first and then we'd have to wait for it to trickle down.
I understand being frustrated with upstream libraries, and I gather you've had some poor experiences with libraries, but I don't think that this is the right answer. Without getting too philosophical, working in an ecosystem with libraries is something that benefits the entire ecosystem, not just one project. Giving up on libraries, and implementing everything oneself, is bad for Free Software, and quite often results in unmaintainable code. Slower publication of features or bug fixes is a cost of this, but it's a cost well worth paying.
From a less philosophical viewpoint, there may be questions of scope when it comes to specific features. The one that's been mentioned here is the ability to get a file stream order from demuxer elements. I think it's not obvious that this is outside of GStreamer's scope—not least because there's no way for us to get that information without leaning on internals—but discussion with GStreamer developers may lead to the conclusion that it is outside their scope, and if so, we'll do what we can.
But we are decoding. We're just decoding to specific formats.
We don't want to decode. There's a similar black box element for demuxing only, `parsebin`, which is another hint that `decodebin` should not be used if you don't want to decode. Though `parsebin` suffers from the same issues than `decodebin` and isn't usable for our specific use case.
I don't understand this at all. wg_parser isn't designed around decodebin, and it never has been.
Sure, there are indeed other uses of wg_parser with simple demuxers elements. However, the complexity of the `wg_parser` code is only there because of `decodebin`, because it is best used in pull mode and because it is inherently multi-threaded. It could be much simpler if `decodebin` was not supported.
If you see any problems with wg_parser, can you please share them?
It is (IMHO) unnecessarily complex for a simple demuxing task, requires a separate read thread (which is the main reason it is so complex and prone to concurrency issues), and requires to use pthread condition variables which are broken in Wine.
It is also unable to do typefinding only (which this MR does, though arguably this could be added), to later decide what kind of demuxing element needs to be created.
I understand being frustrated with upstream libraries, and I gather you've had some poor experiences with libraries, but I don't think that this is the right answer.
GStreamer is not meant to implement Windows behavior, Wine is. If GStreamer can be used to implement Windows behavior, great, and fact is, it probably can. It provides enough flexibility to let us decide which pieces to use and we should use the smaller ones as they are more easily combined, not the bigger piece that we have to shoehorn in place.
But we are decoding. We're just decoding to specific formats.
We don't want to decode. There's a similar black box element for demuxing only, `parsebin`, which is another hint that `decodebin` should not be used if you don't want to decode. Though `parsebin` suffers from the same issues than `decodebin` and isn't usable for our specific use case.
I think you're taking "decode" in the wrong sense. The way I'm using the term, and I think the use in the name "decodebin", is not meant to be opposed with "demux", but rather with "encode". (I also don't think contrasting decodebin with parsebin in that way makes much sense, since decodebin does everything that parsebin does.)
Perhaps more importantly, I've been trying to communicate why decodebin's design makes sense for what we want to do in Wine. The name is only one part of that. The fact that the API for decodebin explicitly has support for stopping at a compressed format suggests to me that outputting compressed formats is completely in line with what decodebin is designed to do.
I don't understand this at all. wg_parser isn't designed around decodebin, and it never has been.
Sure, there are indeed other uses of wg_parser with simple demuxers elements. However, the complexity of the `wg_parser` code is only there because of `decodebin`, because it is best used in pull mode and because it is inherently multi-threaded. It could be much simpler if `decodebin` was not supported.
If you see any problems with wg_parser, can you please share them?
It is (IMHO) unnecessarily complex for a simple demuxing task, requires a separate read thread (which is the main reason it is so complex and prone to concurrency issues),
I've heard mention several times of "threading issues" or "concurrency issues", but I haven't yet heard an explanation of what these issues are. Can you please explain what threading problems decodebin has?
Perhaps more importantly, though, I'm afraid that getting rid of the read thread isn't as simple as avoiding decodebin. In fact, there is no code in decodebin that cares at all about scheduling modes. Rather, that's something that individual demuxers and parsers care about.
And unfortunately, we can't just get rid of pull mode. While many demuxers and parsers for common formats shipped with GStreamer support push mode, as of 1.20 some do not (sfdec, musepackdec); some support push mode but cannot seek (rmdemux, midiparse, modplug, and any libav-based demuxer), and some support push mode but at the expense of some features (mxfdemux at least, and possibly others). And that's just elements shipped with GStreamer. A notable feature of the library is its extensibility (which is not just theoretical!), and removing pull support could cripple any number of unknown elements.
And, even if they are functional, I think few if any demuxers are going to be efficient, and likely cannot seek very accurately. [1]
That aside, while our need to support pull mode unquestionably makes the code more complex, I'm not aware of any actual bugs that result from it, or design problems. Can you please clarify what problem you're encountering with pull mode? I'd also like to know how you intend to deal with seeking in a way that avoids this problem, whatever it is; if the problem inheres to having an extra thread (as you seem to suggest) then I believe that the need for seeking will mean that that thread can't just be removed even if we *were* to stop supporting pull mode.
[1] Some formats can also put vital metadata at the end of the file. I am aware of one such format that I have seen this happen in practice with, namely QuickTime / MPEG-4. I don't know how GStreamer's qtdemux (or any other elements) deal with this. Do they buffer the entire file? Seek? Neither one is ideal.
and requires to use pthread condition variables which are broken in Wine.
Can you please clarify this statement? In what way are condition variables broken in Wine?
I'm aware that if a thread is terminated while waiting on a condition variable, it will fail quite noisily [2]. I assume this isn't what you mean, though, since a terminated thread doesn't have particularly defined behaviour anyway, and simply printing a few more messages to the terminal before exiting does not seem to be causing any real harm—hence there is not really anything "broken" about condition variables. Perhaps I am wrong about this? Or do you indeed mean some other problem?
[2] https://bugs.winehq.org/show_bug.cgi?id=52213
It is also unable to do typefinding only (which this MR does, though arguably this could be added), to later decide what kind of demuxing element needs to be created.
I understand being frustrated with upstream libraries, and I gather you've had some poor experiences with libraries, but I don't think that this is the right answer.
GStreamer is not meant to implement Windows behavior, Wine is. If GStreamer can be used to implement Windows behavior, great, and fact is, it probably can. It provides enough flexibility to let us decide which pieces to use and we should use the smaller ones as they are more easily combined, not the bigger piece that we have to shoehorn in place.
Sure, and I'm not by any means arguing that any given feature *should* be implemented in GStreamer just because it can. Nor am I taking it as a given that the feature that has been mentioned (file stream order) will be considered by the GStreamer developers to be within its scope. But, like I was saying, I think it very well may be within GStreamer's scope, and if it is, then we should delegate it to GStreamer.
Fundamentally, if I encounter a piece of code that requires us to completely rewrite the way we do something in Wine, but would require a lot less from GStreamer, I'm going to naturally ask myself if that something should be GStreamer's responsibility. Cases like this seem like the opposite of shoehorning.
By the way, I asked earlier in the thread, but I don't think I got an answer: what part(s) of the current code cause wg_parser initialization to be too slow? What are the bottlenecks, so to speak?
I think you're taking "decode" in the wrong sense. The way I'm using the term, and I think the use in the name "decodebin", is not meant to be opposed with "demux", but rather with "encode". (I also don't think contrasting decodebin with parsebin in that way makes much sense, since decodebin does everything that parsebin does.)
I'm taking it in the GStreamer sense. "parsebin unpacks the contents of the input stream to the level of parsed elementary streams, **but unlike decodebin it doesn't connect decoder elements**."
If decodebin was meant to be used like that there would be no need for them to offer a separate parsebin.
By the way, I asked earlier in the thread, but I don't think I got an answer: what part(s) of the current code cause wg_parser initialization to be too slow? What are the bottlenecks, so to speak?
I already detailed this previously, but basically everything about it. Its more complicated initialization, pull mode which triggers a very different typefind pattern with a lot of unnecessary reads, auto-plugging which needs to lookup decoders, prerolling.
I'd also like to know how you intend to deal with seeking in a way that avoids this problem, whatever it is; if the problem inheres to having an extra thread (as you seem to suggest) then I believe that the need for seeking will mean that that thread can't just be removed even if we *were* to stop supporting pull mode.
Nothing in seeking support mandates an extra thread. The demuxer sends seek events to the source pad when pushing buffers, and expects the next buffer to be read from the desired offset.
And unfortunately, we can't just get rid of pull mode. While many demuxers and parsers for common formats shipped with GStreamer support push mode, as of 1.20 some do not (sfdec, musepackdec); some support push mode but cannot seek (rmdemux, midiparse, modplug, and any libav-based demuxer), and some support push mode but at the expense of some features (mxfdemux at least, and possibly others).
I suggest we start by implementing the natively supported formats, which applications are using, the right way before worrying about more exotic formats. If these demuxers don't support push mode I'd say it's a bug on their side, and I would agree that *this* is worth trying to fix upstream. Most of these are in the Bad or Ugly plugins, which probably means they aren't very good quality.
I think you're taking "decode" in the wrong sense. The way I'm using the term, and I think the use in the name "decodebin", is not meant to be opposed with "demux", but rather with "encode". (I also don't think contrasting decodebin with parsebin in that way makes much sense, since decodebin does everything that parsebin does.)
I'm taking it in the GStreamer sense. "parsebin unpacks the contents of the input stream to the level of parsed elementary streams, **but unlike decodebin it doesn't connect decoder elements**."
If decodebin was meant to be used like that there would be no need for them to offer a separate parsebin.
Certainly "decode" has a specific, limited meaning in the world of multimedia, one that is mutually exclusive with "demux". However, "decoding" in that limited meaning is not what decodebin does; it demuxes *and* decodes, and it is specifically designed to do both of those things. "decode" is also often used in the world of multimedia to refer to that process. And despite its ambiguity, I can't think of a better word for it.
That said, though, I'd like to again reiterate that I think getting too hung up on the naming is a mistake. Even if there were a better name for decodebin, GStreamer is stuck with it now. If you're concerned about whether decodebin is a good fit for our purposes, I would advise rather to consider how the API is designed. decodebin is specifically built to be able to stop decoding anywhere, not just at raw formats, which is why I don't see any conceptual mismatch with our goals.
parsebin provides a very simple way to stop right after the demuxer. If that was our goal with the media source [and parsebin was introduced earlier than a couple of versions ago], then I would certainly advocate for using parsebin over decodebin, on the grounds that it's a better fit.
However, I don't think that is our goal. As I describe below, I think that we need to put a strong weight on preserving decoding support for *all* containers and codecs, not just the ones Windows supports, which means that the media source cannot always output a compressed format. But also, I think it would be a mistake to output *every* uncompressed format all at once. I think we should rather progressively build up a list of formats we know can be decoded, one at a time, both to avoid regressions and to make them more bisectable. decodebin is *specifically* built to do this; this is what the autoplug-continue signal does, and why it provides the stream caps.
By the way, I asked earlier in the thread, but I don't think I got an answer: what part(s) of the current code cause wg_parser initialization to be too slow? What are the bottlenecks, so to speak?
I already detailed this previously, but basically everything about it. Its more complicated initialization, pull mode which triggers a very different typefind pattern with a lot of unnecessary reads, auto-plugging which needs to lookup decoders, prerolling.
I take it that you haven't done any measurements of the specific parts, then? Or have you, and I simply missed it—could you please point me to where that was done in that case?
I ask because, while I can see how some of those parts of the initialization process may be slow, I can also imagine that some parts may not be. And while it's not clear to me why any of these can't be solved within the design of decodebin and wg_parser, it also seems that some parts are easier to solve than others, so it's useful to have that context when trying to evaluate the assertion that we need to throw away decodebin and wg_parser.
The statement about pull mode triggering unnecessary reads is one I don't recall being made before, and I find it very surprising. I would expect that pull mode categorically requires *less* data to be read, not more. The only reason I can imagine for pull mode being slower is that it'd emit more small read calls (and these would be slow due to I/O overhead and/or crossing the PE/Unix boundary), but we have a caching mechanism that's specifically meant to solve that problem (e6e7c7916d53). Can you please describe, in detail, what problem you're encountering with pull mode?
I also don't know what "more complicated initialization" is supposed to mean; could you please clarify?
I'd also like to know how you intend to deal with seeking in a way that avoids this problem, whatever it is; if the problem inheres to having an extra thread (as you seem to suggest) then I believe that the need for seeking will mean that that thread can't just be removed even if we *were* to stop supporting pull mode.
Nothing in seeking support mandates an extra thread. The demuxer sends seek events to the source pad when pushing buffers, and expects the next buffer to be read from the desired offset.
I'm sorry, I realize now I was assuming a certain design. Would you mind providing a brief overview of how you are pushing data?
And unfortunately, we can't just get rid of pull mode. While many demuxers and parsers for common formats shipped with GStreamer support push mode, as of 1.20 some do not (sfdec, musepackdec); some support push mode but cannot seek (rmdemux, midiparse, modplug, and any libav-based demuxer), and some support push mode but at the expense of some features (mxfdemux at least, and possibly others).
I suggest we start by implementing the natively supported formats, which applications are using, the right way before worrying about more exotic formats.
In a vacuum I would support this suggestion, but as things stand there are three problems.
The first problem is that we *already* support any format that the host GStreamer supports. Removing that support would be a regression. That doesn't mean that it's absolutely off the table, but it does mean that it needs a very strong argument.
The second is that, well, if we can support all formats easily then I think we should. As far as I can tell that's the case—pull mode may make things more complicated, but it doesn't make them intractable. Now, as I've been saying, I could be wrong about this; you seem to have found some intractable problem with pull mode, but I need to understand what that problem is.
The third problem is that getting rid of pull mode doesn't just affect those demuxers that don't support push mode; it affects demuxers that *do* as well. Pull mode is inherently more efficient, in terms of CPU time and memory usage. Some demuxers take a heavy penalty to one or both if they can't operate in pull mode.
If these demuxers don't support push mode I'd say it's a bug on their side, and I would agree that *this* is worth trying to fix upstream. Most of these are in the Bad or Ugly plugins, which probably means they aren't very good quality.
But... it's not. There's nothing in the design or specification of GStreamer that requires every element to support push mode. There's *certainly* nothing that requires every element to support push mode as efficiently and featurefully as pull mode. And in practice any parser is going to expect to be plugged into a file source, or something that acts like a file source, and that means it's wholly reasonable for them to expect pull mode to work.
The fundamental fact is that not all formats, and especially containers, are designed for the push mode model. Some require data at the end of the file to be read before doing anything with data at the beginning. [3] Some store each stream as a single contiguous string of bytes rather than interleaving them, which means that seeking to a time can't exactly be meaningfully translated into seeking to a byte offset. These formats may be able to support push mode with enough hammering, and that's been done for some builtin GStreamer elements, but that doesn't mean it's going to work well.
[3] This has obvious disadvantages—not just in terms of playback, but also in terms of file stability. I've had to deal with MPEG-4 files that got accidentally truncated and were as a result unusable. But there are also good reasons for this: if you're recording a long stream, you want to be able to stream audio/video chunks directly to disk, and then build things like indexes once you're done streaming. That means you have to put those indexes at the end.
To be clear, I am not, and at no point have been, trying to say "I have heard and understood your concerns and rejected them, I think we should use decodebin and wg_parser". I am not saying this specifically because I have *not* heard and understood your concerns. I have only heard generic statements like "this doesn't work" or "this causes problems". I really need to understand what the specific problems are.
To that end, I'd like to repeat some questions I posed, that I don't think were answered:
* What problems result from having a read thread, or from pull mode? (I am not sure if you are referring to the same problems, but I assume so.)
* What problems are there with condition variables? (From your earlier replies I understand that condition variables are not the only problem with pull mode, but perhaps I misread?)
* Pull mode aside (see the question I posed above), are any of the slow parts of parser initialization impossible to solve with decodebin? I don't see any way in which they are; in fact, I don't see any way in which they'd even require changes to the wg_parser API. But I must be missing something.
* What conceptual problems are there with wg_parser? If it's just the read thread (but I feel like you've asserted there are other problems), can we replace that part of wg_parser instead of rewriting it from scratch? If it's a matter of the conceptual design behind wg_parser, have I adequately addressed your concern by explaining the idea behind it, or do you still have strong objections?
I take it that you haven't done any measurements of the specific parts, then? Or have you, and I simply missed it—could you please point me to where that was done in that case?
I did, although it was a long time ago. Doing it again, when native takes 0-3ms to resolve a media source entirely, `wg_parser_connect` takes ~80ms on average, on the same beefy machine though Windows is running inside a VM, with high variations.
* `gst_element_set_state` / `gst_element_get_state` alone take 70ms on average, which high variation.
* Waiting for no-more-pads adds 5ms or more to that.
The wg_source implemented in the branch mentioned above does the same thing (resolving a mp4 video streams with all the streams enumerated) in ~1ms.
The statement about pull mode triggering unnecessary reads is one I don't recall being made before, and I find it very surprising. I would expect that pull mode categorically requires *less* data to be read, not more. The only reason I can imagine for pull mode being slower is that it'd emit more small read calls (and these would be slow due to I/O overhead and/or crossing the PE/Unix boundary), but we have a caching mechanism that's specifically meant to solve that problem (e6e7c7916d53). Can you please describe, in detail, what problem you're encountering with pull mode?
Pull mode typefind triggers plenty of 4096 read requests, only shifted by a small number of bytes. This is not specific to decodebin and I encountered the same behavior when trying to use typefind plugin in the wg_source, in pull mode. The cache helps a bit, but it still takes ~5ms to satisfy all the requests, especially as it also reads elsewhere in the stream.
I don't know the exact reason and I don't think I should spend much more time on this, as using `gst_type_find_helper_for_data_with_extension` instead is enough to figure the mime type for all the formats I tested with, and is happy with the first chunk of data.
I ask because, while I can see how some of those parts of the initialization process may be slow, I can also imagine that some parts may not be. And while it's not clear to me why any of these can't be solved within the design of decodebin and wg_parser, it also seems that some parts are easier to solve than others, so it's useful to have that context when trying to evaluate the assertion that we need to throw away decodebin and wg_parser.
After typefinding there's then the autoplugging, which creates decoder elements (be it the first decodebin, or the second decodebin we may add), and request more data for their initialization. Reading data takes some times, as well as having the decoders initialize with the first buffers.
What problems result from having a read thread, or from pull mode? (I am not sure if you are referring to the same problems, but I assume so.)
I think there's now about a dozen of people who've been fighting or are still fighting with deadlocks in the media source. I'm not saying that race conditions are easy to avoid, but it seems to me that a thread isn't really necessary here, and that it would avoid that class of problems.
I'm sorry, I realize now I was assuming a certain design. Would you mind providing a brief overview of how you are pushing data?
It's in the branch mentioned above, basically call `wg_source_push_data` / `wg_source_get_position` in a loop. Pushing data (or later, seeking) may cause the demuxer to send a seek event to our source pad, which we'll receive synchronously before returning, and the next `wg_source_get_position` will tell where we need to read the next chunk of data from. https://gitlab.winehq.org/rbernon/wine/-/commit/fe1ed0e07cf8b2b88380813d3818...
What problems are there with condition variables? (From your earlier replies I understand that condition variables are not the only problem with pull mode, but perhaps I misread?)
This has been replied and further discussed on https://gitlab.winehq.org/wine/wine/-/merge_requests/3737#note_44725. I understand that it's a radical solution but I think we should avoid using pthread condition variables entirely, until we have a fix. I've made several proposal to that end, in https://gitlab.winehq.org/wine/wine/-/merge_requests/1088, but none have been approved (and I find resorting to SIGKILL much uglier).
Pull mode aside (see the question I posed above), are any of the slow parts of parser initialization impossible to solve with decodebin? I don't see any way in which they are; in fact, I don't see any way in which they'd even require changes to the wg_parser API. But I must be missing something.
Auto-plugging is time consuming, not needed and not desired: we don't want decoders to fully initialize and start decoding buffers during media source resolution. MF has its own pipeline resolution done elsewhere, which we should use instead.
What conceptual problems are there with wg_parser? If it's just the read thread (but I feel like you've asserted there are other problems), can we replace that part of wg_parser instead of rewriting it from scratch? If it's a matter of the conceptual design behind wg_parser, have I adequately addressed your concern by explaining the idea behind it, or do you still have strong objections?
Like I previous said, it would be much simpler to work on one API at a time. I don't see why all the decoding API *have* to use the same internal interface forever. Having MF start using a different approach will let us improve it, figure the problems and requirements inherent to having compressed buffers passing through, progressively, and without any risk of breaking DirectShow and WMReader at the same time.
It will also avoid adding more complexity to the wg_parser, which I would have otherwise to, if I were to introduce a separate mean of pushing data than the read thread. I find it already enough complicated, with features that are unused but which we have to work with (push mode thread for instance), and I would prefer to work with something simpler from the beginning, doing only the demuxing task with no added feature.
Last, I don't think that adding two decodebin one after another, and later hooking them with probes to extract buffers from the pipeline is the right way of doing this. Yes, it is possible and GStreamer has APIs to do this kind of things, but there's a much simpler way.
I take it that you haven't done any measurements of the specific parts, then? Or have you, and I simply missed it—could you please point me to where that was done in that case?
I did, although it was a long time ago. Doing it again, when native takes 0-3ms to resolve a media source entirely, `wg_parser_connect` takes ~80ms on average, on the same beefy machine though Windows is running inside a VM, with high variations.
- `gst_element_set_state` / `gst_element_get_state` alone take 70ms on average, which high variation.
- Waiting for no-more-pads adds 5ms or more to that.
The wg_source implemented in the branch mentioned above does the same thing (resolving a mp4 video streams with all the streams enumerated) in ~1ms.
Ah! Thanks, that's interesting. So it looks like autoplugging is the big problem. I'm a bit confused about "no-more-pads", since at least here, decodebin will actually emit no-more-pads before resolving the state change, but maybe that's different in different versions? Or did you mean prerolling?
The statement about pull mode triggering unnecessary reads is one I don't recall being made before, and I find it very surprising. I would expect that pull mode categorically requires *less* data to be read, not more. The only reason I can imagine for pull mode being slower is that it'd emit more small read calls (and these would be slow due to I/O overhead and/or crossing the PE/Unix boundary), but we have a caching mechanism that's specifically meant to solve that problem (e6e7c7916d53). Can you please describe, in detail, what problem you're encountering with pull mode?
Pull mode typefind triggers plenty of 4096 read requests, only shifted by a small number of bytes. This is not specific to decodebin and I encountered the same behavior when trying to use typefind plugin in the wg_source, in pull mode. The cache helps a bit, but it still takes ~5ms to satisfy all the requests, especially as it also reads elsewhere in the stream.
I don't know the exact reason and I don't think I should spend much more time on this, as using `gst_type_find_helper_for_data_with_extension` instead is enough to figure the mime type for all the formats I tested with, and is happy with the first chunk of data.
Ah! That's a nice find.
I was curious enough myself to look more into this. Most typefinding elements, although not all, are indeed happy with the first page. Providing the extension also speeds up the process dramatically, because typefind will then try the callback associated with that extension first, and that will pretty much always succeed immediately.
The upshot is that we *can* in fact take advantage of this within the GStreamer pipeline, by responding to the "uri" query.
What problems result from having a read thread, or from pull mode? (I am not sure if you are referring to the same problems, but I assume so.)
I think there's now about a dozen of people who've been fighting or are still fighting with deadlocks in the media source. I'm not saying that race conditions are easy to avoid, but it seems to me that a thread isn't really necessary here, and that it would avoid that class of problems.
Yeah, threading is hard. (Although I suspect that this is broadly the fault of horribly underspecified and overcomplicated APIs like DirectShow and Media Foundation. wg_parser is supposed to have a simple threading API; this was an explicit design goal I had for it.)
Make no mistake, if we could get rid of the read thread, I'd be happy, that would allow us to remove a fair amount of code. But I don't see a way to avoid it as long as we're supporting pull mode, and I don't think cutting support for pull mode is a reasonable option at this point.
I'm sorry, I realize now I was assuming a certain design. Would you mind providing a brief overview of how you are pushing data?
It's in the branch mentioned above, basically call `wg_source_push_data` / `wg_source_get_position` in a loop. Pushing data (or later, seeking) may cause the demuxer to send a seek event to our source pad, which we'll receive synchronously before returning, and the next `wg_source_get_position` will tell where we need to read the next chunk of data from. https://gitlab.winehq.org/rbernon/wine/-/commit/fe1ed0e07cf8b2b88380813d3818...
How do you push data after initialization, i.e. while streaming?
What problems are there with condition variables? (From your earlier replies I understand that condition variables are not the only problem with pull mode, but perhaps I misread?)
This has been replied and further discussed on https://gitlab.winehq.org/wine/wine/-/merge_requests/3737#note_44725. I understand that it's a radical solution but I think we should avoid using pthread condition variables entirely, until we have a fix. I've made several proposal to that end, in https://gitlab.winehq.org/wine/wine/-/merge_requests/1088, but none have been approved (and I find resorting to SIGKILL much uglier).
Yeah, I see where you're coming from, but I don't think that "don't use condition variables" is the right answer. [And really, even if it was, condition variables don't inhere to wg_parser. We could do synchronization with NT APIs instead if we really wanted to.]
The way I read Alexandre's answer to 1088, it hasn't been accepted because avoiding the crashes just isn't worth the extra complication. I'm inclined to agree with that assessment, because the crashes don't do any more harm than printing errors to the console. Along similar lines, I think that avoiding the crashes isn't worth the extra complication that results from not using libc functions.
SIGKILL is ugly, sure, but it's ugly because we're killing processes. It's fundamentally no uglier than TerminateProcess().
Pull mode aside (see the question I posed above), are any of the slow parts of parser initialization impossible to solve with decodebin? I don't see any way in which they are; in fact, I don't see any way in which they'd even require changes to the wg_parser API. But I must be missing something.
Auto-plugging is time consuming, not needed and not desired: we don't want decoders to fully initialize and start decoding buffers during media source resolution.
Ah, sure, but we also don't *need* to autoplug immediately. I don't know if I described this idea before, but what we can do is defer stream autoplugging to later. This should be quite simple, just a matter of moving some code from pad_added_cb() to a function lazily called by wg_parser_stream_enable() or wg_parser_stream_get_preferred_format() [I think those are the only places that matter].
More importantly, though, if we're going to be outputting compressed formats for those videos that need fast media source initialization, then we don't even need to worry about autoplugging overhead, because we won't need to autoplug. Currently we do anyway, in order to support toggling at runtime between encoded and unencoded video, but I have my doubts about the utility of that feature.
MF has its own pipeline resolution done elsewhere, which we should use instead.
Right, but that's a separate issue. If we're outputting compressed formats then we won't be autoplugging anyway.
What conceptual problems are there with wg_parser? If it's just the read thread (but I feel like you've asserted there are other problems), can we replace that part of wg_parser instead of rewriting it from scratch? If it's a matter of the conceptual design behind wg_parser, have I adequately addressed your concern by explaining the idea behind it, or do you still have strong objections?
Like I previous said, it would be much simpler to work on one API at a time. I don't see why all the decoding API *have* to use the same internal interface forever. Having MF start using a different approach will let us improve it, figure the problems and requirements inherent to having compressed buffers passing through, progressively, and without any risk of breaking DirectShow and WMReader at the same time.
Sure, that's understandable. But the right thing to do isn't always the least risky thing to do. I think in this case the right thing to do is keep the wg_parser interface, making incremental changes to it and its backends if necessary.
I'm not saying that the wg_parser interface is perfect and doesn't need changes, far from it. There are things I don't like about it currently, though I haven't yet come up with a satisfactory way to improve them. But those changes can be done without rewriting the whole thing.
Last, I don't think that adding two decodebin one after another, and later hooking them with probes to extract buffers from the pipeline is the right way of doing this. Yes, it is possible and GStreamer has APIs to do this kind of things, but there's a much simpler way.
I don't much like the probes either, although when consulting myself I determined I don't have a better objection than "well, it's not the way a normal pipeline is supposed to work". But it works, it's the intended purpose of probes, and it doesn't cause any problems.
That said, the probes are only there to support toggling between compressed and uncompressed streams at runtime, which has always felt like a feature of questionable utility, and I wouldn't object if it was just removed.
The way I read Alexandre's answer to 1088, it hasn't been accepted because avoiding the crashes just isn't worth the extra complication. I'm inclined to agree with that assessment, because the crashes don't do any more harm than printing errors to the console. Along similar lines, I think that avoiding the crashes isn't worth the extra complication that results from not using libc functions.
Actually I very much think we should avoid the crashes, but I'd rather do that by not using troublesome Unix functions. In particular, creating threads on the Unix side is ugly and should be avoided at all costs IMNSHO.
SIGKILL is ugly, sure, but it's ugly because we're killing processes. It's fundamentally no uglier than TerminateProcess().
We can't use SIGKILL because it doesn't allow setting the exit code.
I understand that, and the latest version of !1088 doesn't create a new thread but instead parks aborted thread around (using an indefinite and uninterruptible `select( 0, 0, 0, 0, 0 );`) until the process fully exits.
Do you think this is still ugly, or should I pursue this lead?
I understand that, and the latest version of !1088 doesn't create a new thread but instead parks aborted thread around (using an indefinite and uninterruptible `select( 0, 0, 0, 0, 0 );`) until the process fully exits.
Do you think this is still ugly, or should I pursue this lead?
My remark was actually aimed at winegstreamer and other libraries that create Unix threads. We should avoid that as far as possible.
As far as avoiding the crashes, it seems to me that now that the syscall interface is in place it should be possible to properly unwind the kernel stack.
My remark was actually aimed at winegstreamer and other libraries that create Unix threads. We should avoid that as far as possible.
Is there a reason for this besides killing threads not (currently) working?
As far as avoiding the crashes, it seems to me that now that the syscall interface is in place it should be possible to properly unwind the kernel stack.
We can't avoid GStreamer creating threads.
I don't think that GStreamer is the problem behind bug 52213, though. The crashes I've seen in practice have all been from winealsa (or other sound libraries?), and I have some patches that should let those threads exit cleanly. I haven't sent them yet since the mmdevapi rework is still in progress.
Is there a reason for this besides killing threads not (currently) working?
Unix threads are a PITA. They can interfere with the rest of the app, but they can't be debugged, they can't be suspended or killed, they can't handle signals, they can't use debug traces, etc.
Ah! Thanks, that's interesting. So it looks like autoplugging is the big problem. I'm a bit confused about "no-more-pads", since at least here, decodebin will actually emit no-more-pads before resolving the state change, but maybe that's different in different versions? Or did you mean prerolling?
I actually meant the part after the state change, where we wait for no-more-pads and for the stream duration. So that probably involves some prerolling yes.
How do you push data after initialization, i.e. while streaming?
The same way. Get the read position, read and push data until we get the samples we want. We don't need a thread here, especially not in MF case where everything is or can be done asynchronously already, including the reads.
Ah, sure, but we also don't *need* to autoplug immediately. I don't know if I described this idea before, but what we can do is defer stream autoplugging to later. This should be quite simple, just a matter of moving some code from pad_added_cb() to a function lazily called by wg_parser_stream_enable() or wg_parser_stream_get_preferred_format() [I think those are the only places that matter].
More importantly, though, if we're going to be outputting compressed formats for those videos that need fast media source initialization, then we don't even need to worry about autoplugging overhead, because we won't need to autoplug. Currently we do anyway, in order to support toggling at runtime between encoded and unencoded video, but I have my doubts about the utility of that feature.
We still need it because we put everything in the same pipeline. Building the pipeline lazily by adding some decoding elements later will require it to be stopped and started again. This adds even more complexity to wg_parser only for the purpose of making it optionally do something simpler. Instead, we should do always something simple, by decoupling the demuxing from the decoding, like it's also supposed to be.
Sure, that's understandable. But the right thing to do isn't always the least risky thing to do. I think in this case the right thing to do is keep the wg_parser interface, making incremental changes to it and its backends if necessary.
I'm not saying that the wg_parser interface is perfect and doesn't need changes, far from it. There are things I don't like about it currently, though I haven't yet come up with a satisfactory way to improve them. But those changes can be done without rewriting the whole thing.
Maybe it's not satisfactory because it tries to do everything at the same time. I don't think that's a good idea, and instead we should compose simple elements together. For MF, the composition is already done on the PE side because we needed it, so now it's just a matter of implementing the simple demuxer element.
How do you push data after initialization, i.e. while streaming?
The same way. Get the read position, read and push data until we get the samples we want. We don't need a thread here, especially not in MF case where everything is or can be done asynchronously already, including the reads.
Ah, so you do it from RequestSample() [well, wait_on_sample() really, but anyway.]
So it's worth noting that if that approach works, there's actually nothing preventing you from doing the same thing for pull mode. It would require API change, because now you need to wait for *either* a getrange *or* a sample, but it's possible. It would probably mean a return to something closer to the old winegstreamer design, which was a more generic form of RPC that marshalled all manner of internal calls onto the main thread.
Do we want this change? Well... maybe? I'm not sure that just getting rid of a thread is going to reduce complexity, but I'd have to see the code.
One problem, though, is that now you can't buffer as easily. I think this is less of a problem for mfplat, which has a weird API that results in always buffering anyway, but it's definitely a problem for quartz (and wmvcore for that matter, but see below). If all the streaming threads are currently in Receive() downstream, nothing will be processing getrange requests upstream, which means that as soon as we have a sample available we may not be buffering any more. This may or may not be a problem in practice, it depends on a lot of unknowns, but it at least makes me nervous.
Another thing worth mentioning is that native quartz doesn't work that way. I haven't checked all parsers, but mpegsplit at least creates a read thread internally. We could change native quartz to avoid the read thread anyway, but... I hesitate to move away from native behaviour for what right now seems like a questionable change in code simplicity.
(On the other hand, native wmvcore *does* work that way—it doesn't seem to have a separate read thread. So that's something to consider if anything changes.)
So while getting rid of the read thread would make things simpler in some ways, it seems a questionable benefit to me if we can't get rid of it everywhere. It's still a problem that has to be solved, and once it's solved in one frontend the solution can be (and is!) copy-pasted to the others.
Actually, along those lines, one thing I've considered, but never gotten around to doing, is writing some common PE-side code that would take care of the read thread abstraction, including abstracting create/connect and disconnect/destroy as a single function (since right now those only exist in two pieces for the sake of the read thread).
Ah, sure, but we also don't *need* to autoplug immediately. I don't know if I described this idea before, but what we can do is defer stream autoplugging to later. This should be quite simple, just a matter of moving some code from pad_added_cb() to a function lazily called by wg_parser_stream_enable() or wg_parser_stream_get_preferred_format() [I think those are the only places that matter].
More importantly, though, if we're going to be outputting compressed formats for those videos that need fast media source initialization, then we don't even need to worry about autoplugging overhead, because we won't need to autoplug. Currently we do anyway, in order to support toggling at runtime between encoded and unencoded video, but I have my doubts about the utility of that feature.
We still need it because we put everything in the same pipeline. Building the pipeline lazily by adding some decoding elements later will require it to be stopped and started again. This adds even more complexity to wg_parser only for the purpose of making it optionally do something simpler. Instead, we should do always something simple, by decoupling the demuxing from the decoding, like it's also supposed to be.
Indeed you're right, I was under the mistaken impression that you could hotplug in GStreamer. While gst_pad_unlink() allows it (unlike the equivalent function in DirectShow), the parser will subsequently get GST_FLOW_RETURN and yield an error. Even if things work anyway after that, that's not something we should depend on.
We can still just skip demuxing for those formats, though, and I imagine that'll be enough. We needed to do that for other reasons, and that happily should solve the application bug for free.
Sure, that's understandable. But the right thing to do isn't always the least risky thing to do. I think in this case the right thing to do is keep the wg_parser interface, making incremental changes to it and its backends if necessary.
I'm not saying that the wg_parser interface is perfect and doesn't need changes, far from it. There are things I don't like about it currently, though I haven't yet come up with a satisfactory way to improve them. But those changes can be done without rewriting the whole thing.
Maybe it's not satisfactory because it tries to do everything at the same time. I don't think that's a good idea, and instead we should compose simple elements together. For MF, the composition is already done on the PE side because we needed it, so now it's just a matter of implementing the simple demuxer element.
I completely understand why it feels like wg_parser is trying to do everything at once—after all, it wraps a whole GstPipeline, it autoplugs decoders and demuxers, and in general it's not a simple piece of code.
The problem is that you can't really just break up wg_parser. Almost all of the complexity of wg_parser (including pull mode) really inheres to the *parser* side of things. The only real difference between a parser and a complete pipeline is the specific formats they output. You can really tell this by comparing avi_parser_init_gst() to decodebin_parser_init_gst().
If we were, hypothetically, to remove decodebin support from wg_parser, the only thing we'd be able to simplify is that we'd be able to remove videoconvert and audioconvert, and get rid of the logic that allows us to pick the output format from sink_query_cb(). That's all. [And that's assuming that demuxers never output multiple formats, which... I assume is the case, but I don't know for sure.]
There is one piece of wg_parser that can be broken off in some sense, which is the stream decodebin. And, as I've mentioned, my inclination is that we should indeed break this off. I don't think it's solving a problem that matters, and frankly, even if it was, I'd prefer a less iffy way to do it.
The problem is that you can't really just break up wg_parser.
Exactly so let's not, and let's do this using this new and separate interface that will not break every other API at the same time.
I'm in favor of going in the direction of this MR for the following reasons:
- We have quite a few bugs in the MF frontend, which directly emerge from us sending decoded samples into it. For example, some games expect the existence of a H264 decoder, in the frontend pipeline, which normally gets inserted during topology (pipeline) resolving, but in Wine, we just skip this and short wire the source to the sink, because why would you want to decode decoded samples again, right? - I think the Gstreamer pipeline should resemble the frontend pipeline closer, even be mostly constructed by us, not by some code which need many workarounds because it really seems like its purpose is for higher level apps, which we aren't. This not only helps us remove more possible failure points but also, makes it much easier to follow what's happening to our data. The frontend already has a pipeline, so why not use it properly?
However, I don't think that is our goal. As I describe below, I think that we need to put a strong weight on preserving decoding support for *all* containers and codecs, not just the ones Windows supports
Why though? Wine is meant to, well, emulate Windows behavior, having additional features is nice, like you've stated in the talk on the Gstreamer conference, but having a feature, which we actually don't need to support and which blocks actual problems, makes me wonder if it's really worth preserving.
The first problem is that we *already* support any format that the host GStreamer supports. Removing that support would be a regression.
Well, it's not, it just removes a feature which shouldn't be there in the first place. If programs need support for exotic formats, they ship decoding DLLs or programs.
So while getting rid of the read thread would make things simpler in some ways, it seems a questionable benefit to me if we can't get rid of it everywhere.
How about handling this on the quartz PE side?
I think wg_transform is a good example of how most of winegstreamer should look like, concise, and only binding to a simple middle- or low-level GST element. IMHO, we also shouldn't do anything apart from video processing in it. (For example networking for Livestream sources, which will be limited by the fact that Windows supports certain features, which GST simply doesn't have or implements completely different and would just be easier done using winhttp/urlmon - plus, it avoids Unix code)
I'm in favor of going in the direction of this MR for the following reasons:
We have quite a few bugs in the MF frontend, which directly emerge from us sending decoded samples into it. For example, some games expect the existence of a H264 decoder, in the frontend pipeline, which normally gets inserted during topology (pipeline) resolving, but in Wine, we just skip this and short wire the source to the sink, because why would you want to decode decoded samples again, right?
I think the Gstreamer pipeline should resemble the frontend pipeline closer, even be mostly constructed by us, not by some code which need many workarounds because it really seems like its purpose is for higher level apps, which we aren't. This not only helps us remove more possible failure points but also, makes it much easier to follow what's happening to our data. The frontend already has a pipeline, so why not use it properly?
As I've tried to explain multiple times (e.g. the first three paragraphs of my first response to this patch series), I am not arguing against decoded sample output. I simply don't think it's necessary to rewrite all (or even any) of the winegstreamer glue to do it, especially not from scratch.
In fact, we already have an internal API for compressed sample output, and we use it in the quartz and wmvcore frontends. It should be quite easy to add that to mfplat as well.
However, I don't think that is our goal. As I describe below, I think that we need to put a strong weight on preserving decoding support for *all* containers and codecs, not just the ones Windows supports
Why though? Wine is meant to, well, emulate Windows behavior, having additional features is nice, like you've stated in the talk on the Gstreamer conference, but having a feature, which we actually don't need to support and which blocks actual problems, makes me wonder if it's really worth preserving.
It's not blocking anything. The design that I laid out and which is basically implemented now allows us to output known compressed formats while letting unknown formats still get decompressed to raw. This means that applications which care will get their expected compressed formats, whereas applications which accept arbitrary files can still interpret them.
So while getting rid of the read thread would make things simpler in some ways, it seems a questionable benefit to me if we can't get rid of it everywhere.
How about handling this on the quartz PE side?
Well, as I said, I'm not rejecting that design with prejudice, but I'm not convinced it'll be an improvement either.
I think wg_transform is a good example of how most of winegstreamer should look like, concise, and only binding to a simple middle- or low-level GST element. IMHO, we also shouldn't do anything apart from video processing in it. (For example networking for Livestream sources, which will be limited by the fact that Windows supports certain features, which GST simply doesn't have or implements completely different and would just be easier done using winhttp/urlmon - plus, it avoids Unix code)
This is the same high-level perception that's been repeated multiple times over the course of this conversation. As I've said, I understand how wg_parser seems like a large and complex piece of code, and I understand how it seems like it's doing more than it needs to. However, the only thing that's been given that it "doesn't need to" be doing is decoding (viz. in addition to demuxing), and even setting aside whether it's *actually* unnecessary, removing that feature from wg_parser improves very little.
As I've stated, almost all of the complexity of wg_parser (including pull mode) really inheres to the *parser* side of things, and it needs to be there. The logic to handle specific output formats is quite possibly the smallest part of it. It may seem that the existence of this patch series contradicts this statement, but this patch series is cutting out (by rewriting) parts of wg_parser that are there for a reason.
However, the only thing that's been given that it "doesn't need to" be doing is decoding
Functionally, sure, but my impression is that by looping in so many high level g-streamer components, (GstTypeFind, auto-plugging, decodebin in general), we're giving up control on what should be a tightly implemented de-muxing object. So exactly that seems to be the problem: for little funtional benefit (sometimes decoding, no realworld usecase afaik), we have to deal with the slow unwieldy decodebin object. Sure, it could be possible to trim-down the functionality of decodebin to be a demuxer + gst_type_find_helper_for_data_with_extension wrapper, but why, especially when it adds additonal complexity.
The design that I laid out and which is basically implemented now
Sorry if I missed this, but do you mean upstreamed, or is there another MR with this coming? As far as I looked the current implementation still waits for no-more-pads in parser_connect and has to go through the auto-plugging process. If you mean the possibility for returning compressed types is already there, that still leaves a major benefit of this MR out, which is the speed. Is it still our understanding that in order to accomplish the same speed of this lean wg_source interface, we'd have to introduce a mechanism to restart the pipeline creation process in case an uncompressed type is selected?
However, the only thing that's been given that it "doesn't need to" be doing is decoding
Functionally, sure, but my impression is that by looping in so many high level g-streamer components, (GstTypeFind, auto-plugging, decodebin in general), we're giving up control on what should be a tightly implemented de-muxing object. So exactly that seems to be the problem: for little funtional benefit (sometimes decoding, no realworld usecase afaik), we have to deal with the slow unwieldy decodebin object. Sure, it could be possible to trim-down the functionality of decodebin to be a demuxer + gst_type_find_helper_for_data_with_extension wrapper, but why, especially when it adds additonal complexity.
I'm starting to feel like I'm repeating myself and not being heard here. I don't know how else to say this.
While I understand and sympathize with this general impression, it's not actually going to help anything. *The complicated parts of wg_parser are not related to decodebin's autoplugging.* The complicated parts are all there for a reason, and that reason does not go away when you replace decodebin with a different element.
The design that I laid out and which is basically implemented now
Sorry if I missed this, but do you mean upstreamed, or is there another MR with this coming?
4592662d4 is the implementation, and 5c24b7e56b5 uses it in DirectShow. Using it in Media Foundation should be similarly trivial.
As far as I looked the current implementation still waits for no-more-pads in parser_connect and has to go through the auto-plugging process.
We will always need to wait for no-more-pads; that's just a consequence of using demuxers that support an arbitrary number of streams.
The current design for outputting compressed samples basically shortcuts the autoplugging process, stopping when we see a known compressed format. Cf. autoplug_continue_cb(). If we're *not* outputting compressed samples, we'll then go ahead and decode that compressed format the rest of the way; cf. stream_decodebin_create() as called from pad_added_cb().
If you mean the possibility for returning compressed types is already there, that still leaves a major benefit of this MR out, which is the speed. Is it still our understanding that in order to accomplish the same speed of this lean wg_source interface, we'd have to introduce a mechanism to restart the pipeline creation process in case an uncompressed type is selected?
Well, most of the slowness is due to autoplugging. By shortcutting the process we'll avoid that. Speeding up typefinding of the source format, which we sometimes already know or have clues towards, will also help. If there's any remaining meaningful difference after that (and I don't remember if there was right now) we should see if that's something we can address in GStreamer before trying to throw anything else out.
This merge request was closed by Rémi Bernon.