According to both MSDN and our implementation, GetBufferSize() returns the size of the buffer, but it doesn't guarantee that all of it is available. To know how much of it is available, the caller must also call GetCurrentPadding() and subtract that number from the buffer size. Failing to do so might result in GetBuffer() returning an error.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
---
This patch is unchanged. It is being resent because otherwise the second patch fails to apply.
 dlls/mf/sar.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/dlls/mf/sar.c b/dlls/mf/sar.c
index 77cbfcb21da..eba822ae0fe 100644
--- a/dlls/mf/sar.c
+++ b/dlls/mf/sar.c
@@ -1751,7 +1751,7 @@ static HRESULT WINAPI audio_renderer_render_callback_GetParameters(IMFAsyncCallb
 static void audio_renderer_render(struct audio_renderer *renderer, IMFAsyncResult *result)
 {
-    unsigned int src_frames, dst_frames, max_frames, src_len;
+    unsigned int src_frames, dst_frames, max_frames, pad_frames, src_len;
     struct queued_object *obj, *obj2;
     BOOL keep_sample = FALSE;
     IMFMediaBuffer *buffer;
@@ -1775,20 +1775,24 @@ static void audio_renderer_render(struct audio_renderer *renderer, IMFAsyncResul
                 {
                     if (SUCCEEDED(IAudioClient_GetBufferSize(renderer->audio_client, &max_frames)))
                     {
-                        src_frames -= obj->u.sample.frame_offset;
-                        dst_frames = min(src_frames, max_frames);
-
-                        if (SUCCEEDED(hr = IAudioRenderClient_GetBuffer(renderer->audio_render_client, dst_frames, &dst)))
+                        if (SUCCEEDED(IAudioClient_GetCurrentPadding(renderer->audio_client, &pad_frames)))
                         {
-                            memcpy(dst, src + obj->u.sample.frame_offset * renderer->frame_size,
-                                    dst_frames * renderer->frame_size);
+                            max_frames -= pad_frames;
+                            src_frames -= obj->u.sample.frame_offset;
+                            dst_frames = min(src_frames, max_frames);

-                            IAudioRenderClient_ReleaseBuffer(renderer->audio_render_client, dst_frames, 0);
+                            if (SUCCEEDED(hr = IAudioRenderClient_GetBuffer(renderer->audio_render_client, dst_frames, &dst)))
+                            {
+                                memcpy(dst, src + obj->u.sample.frame_offset * renderer->frame_size,
+                                        dst_frames * renderer->frame_size);

-                            obj->u.sample.frame_offset += dst_frames;
-                        }
+                                IAudioRenderClient_ReleaseBuffer(renderer->audio_render_client, dst_frames, 0);

-                        keep_sample = FAILED(hr) || src_frames > max_frames;
+                                obj->u.sample.frame_offset += dst_frames;
+                            }
+
+                            keep_sample = FAILED(hr) || src_frames > max_frames;
+                        }
                     }
                 }
                 IMFMediaBuffer_Unlock(buffer);
The IAudioClient implementation in both Windows and winepulse.drv never sets the event more than once per period, which is usually around 10 ms long. Some codecs produce audio samples shorter than 10 ms, so it is critical that the SAR is able to process more than one sample per period.
This is not currently the case: a new sample is requested only in audio_renderer_render, which is executed (at most) once per period. This results in the SAR not being able to keep up with the audio client, and eventually underrunning.
With this patch the SAR keeps a count of how many frames are currently queued, and a new sample is immediately requested if the internal queue holds fewer than two periods' worth of frames.
This patch fixes audio stuttering problems in the logo videos of Borderlands 3, Deep Rock Galactic and Mutant Year Zero.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
---
v2: Remove some changes that didn't really help the solution and update the description accordingly.
v3: Reimplement from scratch, using a different strategy and avoiding manually setting the event.
v4: Resent with no changes, because it fails to apply without the first patch.
This patch supersedes 208119.
 dlls/mf/sar.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/dlls/mf/sar.c b/dlls/mf/sar.c
index eba822ae0fe..e39dc09146a 100644
--- a/dlls/mf/sar.c
+++ b/dlls/mf/sar.c
@@ -101,6 +101,8 @@ struct audio_renderer
     HANDLE buffer_ready_event;
     MFWORKITEM_KEY buffer_ready_key;
     unsigned int frame_size;
+    unsigned int queued_frames;
+    unsigned int target_queued_frames;
     struct list queue;
     enum stream_state state;
     unsigned int flags;
@@ -1330,6 +1332,13 @@ static HRESULT WINAPI audio_renderer_stream_GetMediaTypeHandler(IMFStreamSink *i
 static HRESULT stream_queue_sample(struct audio_renderer *renderer, IMFSample *sample)
 {
     struct queued_object *object;
+    DWORD sample_len, sample_frames;
+    HRESULT hr;
+
+    if (FAILED(hr = IMFSample_GetTotalLength(sample, &sample_len)))
+        return hr;
+
+    sample_frames = sample_len / renderer->frame_size;

     if (!(object = calloc(1, sizeof(*object))))
         return E_OUTOFMEMORY;
@@ -1339,6 +1348,7 @@ static HRESULT stream_queue_sample(struct audio_renderer *renderer, IMFSample *s
     IMFSample_AddRef(object->u.sample.sample);

     list_add_tail(&renderer->queue, &object->entry);
+    renderer->queued_frames += sample_frames;

     return S_OK;
 }
@@ -1359,7 +1369,10 @@ static HRESULT WINAPI audio_renderer_stream_ProcessSample(IMFStreamSink *iface,
     EnterCriticalSection(&renderer->cs);
     if (renderer->state == STREAM_STATE_RUNNING)
         hr = stream_queue_sample(renderer, sample);
-    renderer->flags &= ~SAR_SAMPLE_REQUESTED;
+    if (renderer->queued_frames < renderer->target_queued_frames)
+        IMFMediaEventQueue_QueueEventParamVar(renderer->stream_event_queue, MEStreamSinkRequestSample, &GUID_NULL, S_OK, NULL);
+    else
+        renderer->flags &= ~SAR_SAMPLE_REQUESTED;
     LeaveCriticalSection(&renderer->cs);

     return hr;
@@ -1516,6 +1529,8 @@ static HRESULT audio_renderer_create_audio_client(struct audio_renderer *rendere
     unsigned int flags;
     WAVEFORMATEX *wfx;
     HRESULT hr;
+    REFERENCE_TIME period;
+    DWORD samples_per_second;

     audio_renderer_release_audio_client(renderer);

@@ -1543,6 +1558,7 @@ static HRESULT audio_renderer_create_audio_client(struct audio_renderer *rendere
         flags |= AUDCLNT_STREAMFLAGS_NOPERSIST;
     hr = IAudioClient_Initialize(renderer->audio_client, AUDCLNT_SHAREMODE_SHARED, flags, 1000000, 0, wfx,
             &renderer->stream_config.session_id);
+    samples_per_second = wfx->nSamplesPerSec;
     CoTaskMemFree(wfx);
     if (FAILED(hr))
     {
@@ -1576,6 +1592,13 @@ static HRESULT audio_renderer_create_audio_client(struct audio_renderer *rendere
         return hr;
     }

+    if (FAILED(hr = IAudioClient_GetDevicePeriod(renderer->audio_client, &period, NULL)))
+    {
+        WARN("Failed to retrieve device period, hr %#x.\n", hr);
+        return hr;
+    }
+    renderer->target_queued_frames = 2 * period * samples_per_second / 10000000;
+
     if (SUCCEEDED(hr = MFCreateAsyncResult(NULL, &renderer->render_callback, NULL, &result)))
     {
         if (FAILED(hr = MFPutWaitingWorkItem(renderer->buffer_ready_event, 0, result, &renderer->buffer_ready_key)))
@@ -1789,6 +1812,7 @@ static void audio_renderer_render(struct audio_renderer *renderer, IMFAsyncResul
                                 IAudioRenderClient_ReleaseBuffer(renderer->audio_render_client, dst_frames, 0);

                                 obj->u.sample.frame_offset += dst_frames;
+                                renderer->queued_frames -= dst_frames;
                             }

                             keep_sample = FAILED(hr) || src_frames > max_frames;
On 6/22/21 5:34 PM, Giovanni Mascellani wrote:
@@ -1359,7 +1369,10 @@ static HRESULT WINAPI audio_renderer_stream_ProcessSample(IMFStreamSink *iface,
     EnterCriticalSection(&renderer->cs);
     if (renderer->state == STREAM_STATE_RUNNING)
         hr = stream_queue_sample(renderer, sample);
-    renderer->flags &= ~SAR_SAMPLE_REQUESTED;
+    if (renderer->queued_frames < renderer->target_queued_frames)
+        IMFMediaEventQueue_QueueEventParamVar(renderer->stream_event_queue, MEStreamSinkRequestSample, &GUID_NULL, S_OK, NULL);
+    else
+        renderer->flags &= ~SAR_SAMPLE_REQUESTED;
     LeaveCriticalSection(&renderer->cs);
Let's request only when running.
+    if (FAILED(hr = IAudioClient_GetDevicePeriod(renderer->audio_client, &period, NULL)))
+    {
+        WARN("Failed to retrieve device period, hr %#x.\n", hr);
+        return hr;
+    }
+    renderer->target_queued_frames = 2 * period * samples_per_second / 10000000;
Could this be replaced with GetBufferSize(), which returns the size in frames?
@@ -1789,6 +1812,7 @@ static void audio_renderer_render(struct audio_renderer *renderer, IMFAsyncResul
                                 IAudioRenderClient_ReleaseBuffer(renderer->audio_render_client, dst_frames, 0);

                                 obj->u.sample.frame_offset += dst_frames;
+                                renderer->queued_frames -= dst_frames;
queued_frames should be reset every time we empty this list.
Have you tested patch v4 with the game?
Hi Nikolay,
Il 23/06/21 23:04, Nikolay Sivov ha scritto:
On 6/22/21 5:34 PM, Giovanni Mascellani wrote:
@@ -1359,7 +1369,10 @@ static HRESULT WINAPI audio_renderer_stream_ProcessSample(IMFStreamSink *iface,
     EnterCriticalSection(&renderer->cs);
     if (renderer->state == STREAM_STATE_RUNNING)
         hr = stream_queue_sample(renderer, sample);
-    renderer->flags &= ~SAR_SAMPLE_REQUESTED;
+    if (renderer->queued_frames < renderer->target_queued_frames)
+        IMFMediaEventQueue_QueueEventParamVar(renderer->stream_event_queue, MEStreamSinkRequestSample, &GUID_NULL, S_OK, NULL);
+    else
+        renderer->flags &= ~SAR_SAMPLE_REQUESTED;
     LeaveCriticalSection(&renderer->cs);
Let's request only when running.
Right, done in v5.
It is not completely clear to me what should happen when the stream finishes, but whatever happens now I don't think my patch does worse.
+    if (FAILED(hr = IAudioClient_GetDevicePeriod(renderer->audio_client, &period, NULL)))
+    {
+        WARN("Failed to retrieve device period, hr %#x.\n", hr);
+        return hr;
+    }
+    renderer->target_queued_frames = 2 * period * samples_per_second / 10000000;
Could this be replaced with GetBufferSize(), which returns the size in frames?
No, buffer and period are two different things.
The buffer length is the total amount of memory that the audio client reserves for receiving data from the caller. The period is how often the audio client processes the incoming data. Usually the buffer is about three periods long.
In my understanding the audio client runs something like this loop:
while (1)
{
    fetch_a_period_of_data_from_the_buffer();
    send_data_to_audio_card();
    SetEvent(event);
    Sleep(period);
}
So, it is important that every time the event is set, at least a period is written to the buffer. Maybe even more, if there is space, to be sure we are not going to miss the next round. This is what my patch does: if the queue has less than a period (actually two, again to be sure) of data, it loads more without waiting for the event.
In a sense, at every round we have to process at least a period of data and at most a buffer of data (because it is impossible to write more than a buffer; actually, more than buffer - padding, as my already-accepted patch fixed).
@@ -1789,6 +1812,7 @@ static void audio_renderer_render(struct audio_renderer *renderer, IMFAsyncResul
                                 IAudioRenderClient_ReleaseBuffer(renderer->audio_render_client, dst_frames, 0);

                                 obj->u.sample.frame_offset += dst_frames;
+                                renderer->queued_frames -= dst_frames;
queued_frames should be reset every time we empty this list.
Right, done in v5.
Have you tested patch v4 with the game?
Yes, and also v5. Stuttering is fixed in all the three games mentioned in the description.
Thanks, Giovanni.
On 6/24/21 10:42 AM, Giovanni Mascellani wrote:
+ renderer->target_queued_frames = 2 * period * samples_per_second / 10000000;
Could this be replaced with GetBufferSize() that returns size in frames?
No, buffer and period are two different things.
The buffer length is the total amount of memory that the audio client reserves for receiving data from the caller. The period is how often the audio client processes the incoming data. Usually the buffer is about three periods long.
So why can't we keep buffering until we have a total buffer length of frames? How much is "target_queued_frames", is that 2 periods? If it will accept 3 periods at maximum, is it worse to buffer 3 instead of 2?
In my understanding the audio client runs something like this loop:
while (1)
{
    fetch_a_period_of_data_from_the_buffer();
    send_data_to_audio_card();
    SetEvent(event);
    Sleep(period);
}
So, it is important that every time the event is set, at least a period is written to the buffer. Maybe even more, if there is space, to be sure we are not going to miss the next round. This is what my patch does: if the queue has less than a period (actually two, again to be sure) of data, it loads more without waiting for the event.
In a sense, at every round we have to process at least a period of data and at most a buffer of data (because it is impossible to write more than a buffer; actually, more than buffer - padding, as my already-accepted patch fixed).
Hi,
Il 24/06/21 09:52, Nikolay Sivov ha scritto:
So why can't we keep buffering until we have a total buffer length of frames? How much is "target_queued_frames", is that 2 periods? If it will accept 3 periods at maximum, is it worse to buffer 3 instead of 2?
We have two places here where audio data is temporarily stored, so we need two policies, one for each of them: * the IAudioClient buffer; * the SAR queue.
The IAudioClient buffer has a maximum size, which is the thing I simply called "buffer size" in my previous email, and if the source is fast enough (which it usually is) it eventually gets filled anyway. This buffer size is usually about 3 periods, I think, but that's not guaranteed.
The SAR queue is a queue, not a ring buffer, so it doesn't have any maximum size. In principle you can queue up as many frames as you wish, but that wouldn't be very sensible. The only requirement is to queue at least a period, because the IAudioClient buffer will request more data in batches of a period, and if we have at least a period we can keep it filled easily. So for the SAR queue I am aiming at two periods, so we have some margin.
This means that in total we're storing more or less five periods of audio data, two in the queue and around three in the buffer. But these two and three are in two different places, so you can't simply replace one with the other.
Unless of course we decide to ditch the SAR queue altogether and copy frame data directly into the IAudioClient buffer in ProcessSample. But I guess the queue was there for a reason, so I don't know if this is viable.
Giovanni.
On 6/24/21 11:30 AM, Giovanni Mascellani wrote:
Hi,
Il 24/06/21 09:52, Nikolay Sivov ha scritto:
So why can't we keep buffering until we have a total buffer length of frames? How much is "target_queued_frames", is that 2 periods? If it will accept 3 periods at maximum, is it worse to buffer 3 instead of 2?
We have two places here where audio data is temporarily stored, so we need two policies, one for each of them: * the IAudioClient buffer; * the SAR queue.
The IAudioClient buffer has a maximum size, which is the thing I simply called "buffer size" in my previous email, and if the source is fast enough (which it usually is) it eventually gets filled anyway. This buffer size is usually about 3 periods, I think, but that's not guaranteed.
The SAR queue is a queue, not a ring buffer, so it doesn't have any maximum size. In principle you can queue up as many frames as you wish, but that wouldn't be very sensible. The only requirement is to queue at least a period, because the IAudioClient buffer will request more data in batches of a period, and if we have at least a period we can keep it filled easily. So for the SAR queue I am aiming at two periods, so we have some margin.
This means that in total we're storing more or less five periods of audio data, two in the queue and around three in the buffer. But these two and three are in two different places, so you can't simply replace one with the other.
Unless of course we decide to ditch the SAR queue altogether and copy frame data directly into the IAudioClient buffer in ProcessSample. But I guess the queue was there for a reason, so I don't know if this is viable.
What I'm asking is, why do we need this additional estimation with period length, with an assumption of a number of periods per buffer, when we could simply aim at maximum capacity.
Giovanni.
Hi,
Il 24/06/21 10:46, Nikolay Sivov ha scritto:
What I'm asking is, why do we need this additional estimation with period length, with an assumption of a number of periods per buffer, when we could simply aim at maximum capacity.
For the IAudioClient buffer we're already aiming at maximum capacity: in audio_renderer_render() as much data as possible is copied from the queue to the buffer, which eventually gets filled, provided that the queue itself does not starve. This is already properly implemented and my patch is not touching that.
What my patch does change is making sure that the queue doesn't starve, even if the source sends very short samples. The queue has no set maximum, so it doesn't make sense to say that we aim for maximum capacity. There is a minimum we have to respect, which is a period length, because audio_renderer_render() is called once per period and it consumes a period of data from the queue. I am aiming at two just because two is larger than one and I cannot see why buffering even more would give any advantage.
This is the reason why my patch computes the period length. From the point of view of the queue there are no other relevant quantities around. The IAudioClient buffer size has in principle no meaning for the SAR queue, and the fact that the buffer is more or less 3 * period is a vague convention that may not even always hold (the SAR could request a different buffer size, and the IAudioClient is not forced to honor that request), and that I don't think we should take for granted. We basically should just ignore that fact, at least for establishing the policy for the queue.
Thanks, Giovanni.
On 6/24/21 12:01 PM, Giovanni Mascellani wrote:
Hi,
Il 24/06/21 10:46, Nikolay Sivov ha scritto:
What I'm asking is, why do we need this additional estimation with period length, with an assumption of a number of periods per buffer, when we could simply aim at maximum capacity.
For the IAudioClient buffer we're already aiming at maximum capacity: in audio_renderer_render() as much data as possible is copied from the queue to the buffer, which eventually gets filled, provided that the queue itself does not starve. This is already properly implemented and my patch is not touching that.
What my patch does change is making sure that the queue doesn't starve, even if the source sends very short samples. The queue has no set maximum, so it doesn't make sense to say that we aim for maximum capacity. There is a minimum we have to respect, which is a period length, because audio_renderer_render() is called once per period and it consumes a period of data from the queue. I am aiming at two just because two is larger than one and I cannot see why buffering even more would give any advantage.
This is the reason why my patch computes the period length. From the point of view of the queue there are no other relevant quantities around. The IAudioClient buffer size has in principle no meaning for the SAR queue, and the fact that the buffer is more or less 3 * period is a vague convention that may not even always hold (the SAR could request a different buffer size, and the IAudioClient is not forced to honor that request), and that I don't think we should take for granted. We basically should just ignore that fact, at least for establishing the policy for the queue.
Does the attached patch work? All we need is to queue some frames, not too many; if the buffer is always at least a period long, that will be okay, either 3/2 as in this example or just max_frames is fine. I'd like to keep it simple.
Thanks, Giovanni.
Hi,
Il 24/06/21 22:02, Nikolay Sivov ha scritto:
Does the attached patch work? All we need is to queue some frames, not too many; if the buffer is always at least a period long, that will be okay, either 3/2 as in this example or just max_frames is fine. I'd like to keep it simple.
Yes, it works.
Personally I don't see how it is simpler than mine (the conversion between time and number of frames is completely straightforward and standard in this kind of processing), and I would find it confusing that the buffer size is used for something that has nothing to do with the buffer.
However, I guess the decision is up to you.
Giovanni.