## Context
This is a problem discovered when running "Max: The Curse of Brotherhood" under Proton. This game writes 4MB buffers into `wg_transform`, each containing about 4 seconds of compressed video. `wg_transform` calls `gst_pad_push` in `transform_ProcessOutput`, which takes ~300ms (~20 frames at 60fps) to decode such a buffer. The end result is that the game hitches every 4 seconds while playing cutscene videos.
Proton currently has a special case for this particular game that breaks the buffer up into small chunks to avoid long blocking pushes. This MR adopts that approach and applies it generally.
One concern raised by @redmcg is:
> The only issue I can think of is if there are any decoders which don't have a parser; then they might not know how to deal with an arbitrary 4096 byte buffer (the parser is generally responsible for taking arbitrary buffers and producing something the decoder can work with, for example: a full frame).
So this MR only enables this strategy when there is a parser element.
```diff
From: Yuxuan Shui <yshui@codeweavers.com>

---
 dlls/winegstreamer/wg_transform.c | 44 ++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/dlls/winegstreamer/wg_transform.c b/dlls/winegstreamer/wg_transform.c
index 558f6572a1b..50c1958013b 100644
--- a/dlls/winegstreamer/wg_transform.c
+++ b/dlls/winegstreamer/wg_transform.c
@@ -53,16 +53,18 @@ struct wg_transform
     GstQuery *drain_query;
 
     GstAtomicQueue *input_queue;
+    GstBuffer *current_buffer;
     MFVideoInfo input_info;
     MFVideoInfo output_info;
 
     GstAtomicQueue *output_queue;
     GstSample *output_sample;
-    bool output_caps_changed;
     GstCaps *desired_caps;
     GstCaps *output_caps;
     GstCaps *input_caps;
+    bool output_caps_changed;
+    bool has_parser;
     bool draining;
 };
@@ -492,8 +494,12 @@ static bool transform_create_decoder_elements(struct wg_transform *transform,
     if ((element = find_element(GST_ELEMENT_FACTORY_TYPE_PARSER, transform->input_caps, parsed_caps))
             && !append_element(transform->container, element, first, last))
+    {
+        transform->has_parser = true;
         goto done;
-    else if (!element)
+    }
+
+    if (!element)
     {
         gst_caps_unref(parsed_caps);
         parsed_caps = gst_caps_ref(transform->input_caps);
@@ -1035,18 +1041,42 @@ error:
 static bool get_transform_output(struct wg_transform *transform, struct wg_sample *sample)
 {
-    GstBuffer *input_buffer;
     GstFlowReturn ret;
 
     wg_allocator_provide_sample(transform->allocator, sample);
 
-    while (!(transform->output_sample = gst_atomic_queue_pop(transform->output_queue))
-            && (input_buffer = gst_atomic_queue_pop(transform->input_queue)))
+    while (!(transform->output_sample = gst_atomic_queue_pop(transform->output_queue)))
     {
-        if ((ret = gst_pad_push(transform->my_src, input_buffer)))
+        /* A big buffer could take the transform a long time (longer than a vblank) to
+         * process, so if we write the entire input buffer in one go, we could block and
+         * cause the application to hitch. So we break the buffer up, write just enough
+         * to get something back, and stop. However, doing so is only safe if we are
+         * writing into a parser, as it knows how to put things back together properly. */
+
+        gsize target_push_size = 4096, size, maxsize, offset;
+        GstBuffer *tmp_buffer = NULL;
+        if (!transform->current_buffer)
+            transform->current_buffer = gst_atomic_queue_pop(transform->input_queue);
+        if (!transform->current_buffer)
+            break;
+
+        size = gst_buffer_get_sizes(transform->current_buffer, &offset, &maxsize);
+        if (size < target_push_size || !transform->has_parser)
+            target_push_size = size;
+
+        tmp_buffer = gst_buffer_copy_region(transform->current_buffer,
+                GST_BUFFER_COPY_METADATA | GST_BUFFER_COPY_MEMORY, 0, target_push_size);
+        if ((ret = gst_pad_push(transform->my_src, tmp_buffer)))
             GST_WARNING("Failed to push transform input, error %d", ret);
-        complete_drain(transform);
+        if (target_push_size < size)
+            gst_buffer_resize(transform->current_buffer, target_push_size, -1);
+        else
+        {
+            gst_buffer_unref(transform->current_buffer);
+            transform->current_buffer = NULL;
+            complete_drain(transform);
+        }
     }
 
     /* Remove the sample so the allocator cannot use it */
```
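For readers unfamiliar with the GStreamer buffer API, here is the slicing pattern from the hunk above as a standalone sketch. The helper name and the fixed chunk size are illustrative only; the MR itself only slices when a parser is present:

```c
/* Illustrative sketch (not part of the MR): peel fixed-size pieces off the
 * front of a GstBuffer. gst_buffer_copy_region with GST_BUFFER_COPY_MEMORY
 * shares the underlying GstMemory, so the payload is not copied. */
#include <gst/gst.h>

#define CHUNK_SIZE 4096

static void push_in_chunks(GstPad *src, GstBuffer *buffer)
{
    gsize offset, maxsize;
    gsize size = gst_buffer_get_sizes(buffer, &offset, &maxsize);

    while (size)
    {
        gsize push_size = MIN(size, (gsize)CHUNK_SIZE);
        GstBuffer *piece = gst_buffer_copy_region(buffer,
                GST_BUFFER_COPY_METADATA | GST_BUFFER_COPY_MEMORY, 0, push_size);

        /* gst_pad_push takes ownership of the piece. */
        if (gst_pad_push(src, piece) != GST_FLOW_OK)
            break;

        /* Drop the bytes just pushed from the front of the remaining buffer. */
        gst_buffer_resize(buffer, push_size, -1);
        size -= push_size;
    }
    gst_buffer_unref(buffer);
}
```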
What does native do?
I'm not sure we can tell what native does exactly here, but IMO, and especially as we're doing this only if we have a parser, this should instead use the parser in a standalone way to split the data into individual packets with the correct boundaries. This could probably be done in push_data, then every parsed buffer would be queued instead of the input data itself.
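For illustration, driving a parser standalone as suggested could look roughly like the sketch below, using `GstHarness` from gst-check; the `h264parse` element and the caps string are assumptions for the example, not something the MR does:

```c
/* Hedged sketch of the suggestion: run a parser standalone and queue each
 * parsed (correctly bounded) buffer instead of the raw input data. */
#include <gst/check/gstharness.h>

static void split_with_parser(GstAtomicQueue *queue, GstBuffer *input)
{
    GstHarness *h = gst_harness_new("h264parse"); /* illustrative parser */
    GstBuffer *parsed;

    gst_harness_set_src_caps_str(h, "video/x-h264, stream-format=byte-stream");
    gst_harness_push(h, input); /* the harness takes ownership of the input */

    /* Queue every buffer the parser has produced so far; a real version
     * would also handle data the parser is still accumulating. */
    while ((parsed = gst_harness_try_pull(h)))
        gst_atomic_queue_push(queue, parsed);

    gst_harness_teardown(h);
}
```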
On Tue Feb 11 17:12:53 2025 +0000, Rémi Bernon wrote:
> I'm not sure we can tell what native does exactly here, but IMO, and especially as we're doing this only if we have a parser, this should instead use the parser in a standalone way to split the data into individual packets with the correct boundaries. This could probably be done in push_data, then every parsed buffer would be queued instead of the input data itself.
oh, you mean we take the parser element out and drive it manually ourselves, instead of letting gstreamer do it? i can look into whether this is doable. but it feels like we'd be using gstreamer in a way it doesn't want to be used. i heard we might be moving past gstreamer eventually?
Eh, sorry, my question was based on a misreading—these aren't coming from our demuxer.
> I'm not sure we can tell what native does exactly here, but IMO, and especially as we're doing this only if we have a parser, this should instead use the parser in a standalone way to split the data into individual packets with the correct boundaries. This could probably be done in push_data, then every parsed buffer would be queued instead of the input data itself.
Why is that better?
On Wed Feb 12 00:47:29 2025 +0000, Elizabeth Figura wrote:
> Eh, sorry, my question was based on a misreading—these aren't coming from our demuxer.
>
> > I'm not sure we can tell what native does exactly here, but IMO, and especially as we're doing this only if we have a parser, this should instead use the parser in a standalone way to split the data into individual packets with the correct boundaries. This could probably be done in push_data, then every parsed buffer would be queued instead of the input data itself.
>
> Why is that better?
i think by doing that, we don't need to break the buffer up at an arbitrary size, e.g. the 4096 bytes chosen by this MR?

but i guess then there's the question of whether the parser will always break the buffer up reasonably (e.g. one chunk per frame)? what if the parser also spits out enormous buffers?
i feel the only "clean" solution would be running the decoder in a separate thread, which would allow decoding to happen asynchronously.
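For what the separate-thread idea might look like, here is a rough sketch; it reuses the `wg_transform` fields from the diff above but omits draining, shutdown, and waking the worker when the queue refills:

```c
/* Hypothetical sketch only: a worker thread feeds the pipeline so that
 * gst_pad_push never blocks the application thread. A real implementation
 * would wait on a condition variable instead of exiting when the queue is
 * empty, and would coordinate with draining and destruction. */
static gpointer transform_input_worker(gpointer data)
{
    struct wg_transform *transform = data;
    GstBuffer *buffer;

    while ((buffer = gst_atomic_queue_pop(transform->input_queue)))
        gst_pad_push(transform->my_src, buffer); /* may take ~300ms without hitching the app */

    return NULL;
}

/* started once when the transform is created:
 * g_thread_new("wg_transform_input", transform_input_worker, transform); */
```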
> but i guess then there's the question of whether the parser will always break the buffer up reasonably (e.g. one chunk per frame)? what if the parser also spits out enormous buffers?
In practice parsers should output individual framed buffers; that's part of the point of the parser.
On Wed Feb 12 00:55:33 2025 +0000, Elizabeth Figura wrote:
> > but i guess then there's the question of whether the parser will always break the buffer up reasonably (e.g. one chunk per frame)? what if the parser also spits out enormous buffers?
>
> In practice parsers should output individual framed buffers; that's part of the point of the parser.
so i have no idea how h264 works so i had to do some research. but it looks like h264parse splits things into what are called NALUs, not frames. in this case, NALUs are actually sub-frame (at most a frame), so we are good for this particular encoder.

but in general it feels dangerous to assume a parser will always break the buffer into frames or sub-frames. but maybe that's just me.
On Wed Feb 12 13:35:29 2025 +0000, Yuxuan Shui wrote:
> so i have no idea how h264 works so i had to do some research. but it looks like h264parse splits things into what are called NALUs, not frames. in this case, NALUs are actually sub-frame (at most a frame), so we are good for this particular encoder.
>
> but in general it feels dangerous to assume a parser will always break the buffer into frames or sub-frames. but maybe that's just me.
Regardless of how the parser decides to emit output buffers, it'll in the best case be smaller than this arbitrary split and in the worst case give identical results.
On Wed Feb 12 13:40:43 2025 +0000, Rémi Bernon wrote:
> Regardless of how the parser decides to emit output buffers, it'll in the best case be smaller than this arbitrary split and in the worst case give identical results.
Not necessarily: the parser can accumulate multiple buffers and only split at the boundaries it finds suitable.
On Wed Feb 12 14:06:37 2025 +0000, Yuxuan Shui wrote:
> Not necessarily: the parser can accumulate multiple buffers and only split at the boundaries it finds suitable.
Which it would also do with this arbitrary split. If this works, then parsers don't accumulate more than necessary, which is likely frame or sub-frame granularity.

Using the parser standalone for splitting is also going to help zero-copy: we can only write to the client buffers one frame at a time. Pushing data that decodes to more than one frame will require these frames to be decoded into unix-side buffers, with copies into the client buffers done later.
On Wed Feb 12 14:57:44 2025 +0000, Rémi Bernon wrote:
> Which it would also do with this arbitrary split. If this works, then parsers don't accumulate more than necessary, which is likely frame or sub-frame granularity.
>
> Using the parser standalone for splitting is also going to help zero-copy: we can only write to the client buffers one frame at a time. Pushing data that decodes to more than one frame will require these frames to be decoded into unix-side buffers, with copies into the client buffers done later.
re: how `h264parse` splits; if you run `gst-inspect-1.0 h264parse`, you'll see one of the caps is `alignment`, which has two options:

1. `au`; and
2. `nal`

So it can split the buffers in one of two ways:

1. at the `au` boundary, which stands for 'Access Unit' and represents a frame (or a field, for interlaced video); or
2. at the `nal` boundary. NALs are smaller than access units and represent a logical bundle of data (which can be metadata or image data). An AU may have several NALs, but a NAL will never overlap two AUs.
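For reference, selecting one of these alignments from code would be a matter of constraining the parser's output caps, e.g. with a capsfilter; a sketch (the pipeline wiring around it is assumed):

```c
/* Sketch: force h264parse to emit one buffer per access unit
 * (use alignment=nal for NAL-sized buffers instead). */
GstElement *filter = gst_element_factory_make("capsfilter", NULL);
GstCaps *caps = gst_caps_from_string(
        "video/x-h264, stream-format=(string)byte-stream, alignment=(string)au");

g_object_set(filter, "caps", caps, NULL);
gst_caps_unref(caps);
/* ... then link: h264parse -> filter -> decoder ... */
```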
On Wed Feb 12 21:24:38 2025 +0000, Brendan McGrath wrote:
> re: how `h264parse` splits; if you run `gst-inspect-1.0 h264parse`, you'll see one of the caps is `alignment`, which has two options:
>
> 1. `au`; and
> 2. `nal`
>
> So it can split the buffers in one of two ways:
>
> 1. at the `au` boundary, which stands for 'Access Unit' and represents a frame (or a field, for interlaced video); or
> 2. at the `nal` boundary. NALs are smaller than access units and represent a logical bundle of data (which can be metadata or image data). An AU may have several NALs, but a NAL will never overlap two AUs.
so, it turns out pushing a large buffer through just the parser is enough to cause stutters. it's way better than having it go through the whole decoding pipeline, but it's still noticeable.
On Tue Feb 25 15:42:19 2025 +0000, Yuxuan Shui wrote:
> so, it turns out pushing a large buffer through just the parser is enough to cause stutters. it's way better than having it go through the whole decoding pipeline, but it's still noticeable. (Edit: i should try again without gst logging.)
yeah, the stutter is gone after logging is disabled, but it does still make me worry, as it indicates that the headroom is pretty small