On Friday, June 15, 2012 3:39:33 PM Andrew Eikum wrote:
> One thing to note is that PulseAudio has an absurdly high default latency (two seconds), and due to its poor API there's no (easy) way for us to control it.
The latency is likely caused by PulseAudio setting a much larger tlength (and minreq) than what you asked for. This happens all the time for me when the server has been running.
The trick I've found to dealing with this in OpenAL Soft is to essentially ignore what PulseAudio sets and do your free space calculations so that only the size of the buffer you requested is used. Something like:
    /* buffer_size and update_size are assumed to be stashed on our device
     * data when the stream was created (hypothetical fields). */
    size_t writable_size(pulse_data *data)
    {
        uint32_t buffer_size = data->buffer_size; /* our requested buffer size in bytes */
        uint32_t update_size = data->update_size; /* our expected period size in bytes */
        /* This is signed since it could come out negative */
        ssize_t len;

        pa_threaded_mainloop_lock(data->loop);
        len = pa_stream_writable_size(data->stream) - data->attr.tlength + buffer_size;
        pa_threaded_mainloop_unlock(data->loop);

        if(len < 0)
            return 0;
        len -= len % update_size;
        return len;
    }
The update_size is only relevant if mmdevapi updates in period-sized chunks. If it doesn't, you can just remove it and the rounding. The tlength should be kept updated using pa_stream_set_buffer_attr_callback, in case the server tries to increase it more during runtime.
When creating the stream, I set minreq to the period size, and tlength to the requested buffer size. I also set the flags PA_STREAM_INTERPOLATE_TIMING, PA_STREAM_AUTO_TIMING_UPDATE, and PA_STREAM_ADJUST_LATENCY.
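For reference, a minimal sketch of that setup (requires libpulse; `buffer_bytes` and `period_bytes` are values the caller is assumed to have computed from the sample spec, and `connect_stream` is an invented helper name, not part of the Pulse API):

```c
#include <pulse/pulseaudio.h>

/* Connect a playback stream with tlength = our buffer size and
 * minreq = our period size, letting the server default the rest. */
static int connect_stream(pa_stream *stream, uint32_t buffer_bytes,
                          uint32_t period_bytes)
{
    pa_buffer_attr attr;
    attr.maxlength = (uint32_t)-1;  /* -1 = let the server pick the hard cap */
    attr.tlength   = buffer_bytes;  /* our requested buffer size */
    attr.prebuf    = (uint32_t)-1;  /* server-default start threshold */
    attr.minreq    = period_bytes;  /* ask for data in period-sized chunks */
    attr.fragsize  = (uint32_t)-1;  /* recording only; unused for playback */

    pa_stream_flags_t flags = PA_STREAM_INTERPOLATE_TIMING |
                              PA_STREAM_AUTO_TIMING_UPDATE |
                              PA_STREAM_ADJUST_LATENCY;

    return pa_stream_connect_playback(stream, NULL, &attr, flags, NULL, NULL);
}
```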
The most important part is to not rely on PA_STREAM_EARLY_REQUESTS or the write callback to tell you when some space is free for writing, since as noted above, PulseAudio can set unreasonably large values.
This doesn't completely fix the latency problem, as I'll still get better performance using ALSA's dmix, but it helps immensely.
On Fri, Jun 15, 2012 at 06:23:57PM -0700, Chris Robinson wrote:
> The update_size is only relevant if mmdevapi updates in period-sized chunks. If it doesn't, you can just remove it and the rounding. The tlength should be kept updated using pa_stream_set_buffer_attr_callback, in case the server tries to increase it more during runtime.
We chatted a little on IRC this weekend, but thanks again for the advice.
Mmdevapi sends its data to the lower systems in period-sized chunks, and the reported buffer fill level decreases in period-sized chunks. But the audio clock position is more accurate than that, returning some value between periods and taking latency into account.
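A toy model of that difference (all names and numbers are invented for illustration, not Wine's actual code):

```c
#include <stdint.h>

/* The reported fill level only moves in whole periods... */
uint64_t fill_level_decrease(uint64_t frames_consumed, uint64_t period_frames)
{
    /* Rounded down to whole periods */
    return (frames_consumed / period_frames) * period_frames;
}

/* ...while the clock position is sample-accurate and latency-adjusted. */
uint64_t clock_position(uint64_t frames_consumed, uint64_t latency_frames)
{
    return frames_consumed > latency_frames ? frames_consumed - latency_frames
                                            : 0;
}
```

So with a 480-frame period, 1234 consumed frames and 96 frames of latency, the fill level has only dropped by 960 frames while the clock reports 1138, a position between period boundaries.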
Unfortunately, PulseAudio (and, apparently, every Linux audio API) refuses to make any guarantees at all with regard to things like buffer sizes, period sizes, callback regularity, and latencies. So in Wine we have to build an emulation layer, involving a buffer in the driver itself, and lie about buffer drain behavior and latencies. I really don't understand why the audio APIs refuse to do this heavy lifting, instead pushing it off to applications. What is the API for if I have to do all the hard work anyway?
It'd be handy if there were some page explaining how PulseAudio's data model works. Stuff like where the data is stored, when and how it's transferred, when callbacks are triggered, what the members of pa_buffer_attr actually mean (their documentation is useless), how the stream flags affect stream operation and the callbacks, when the buffer attributes might change, and how applications should deal with those changes. This stuff is all unclear to me, and I think it's a big source of my frustration with the API.
> When creating the stream, I set minreq to the period size, and tlength to the requested buffer size. I also set the flags PA_STREAM_INTERPOLATE_TIMING, PA_STREAM_AUTO_TIMING_UPDATE, and PA_STREAM_ADJUST_LATENCY.
> The most important part is to not rely on PA_STREAM_EARLY_REQUESTS or the write callback to tell you when some space is free for writing, since as noted above, PulseAudio can set unreasonably large values.
Yeah, I experimented with ADJUST_LATENCY, as it seems to be the trick to getting lower latencies. Then we have to maintain our own buffer, which is why I switched to using the write callback, to have Pulse tell us when it's ready for more data from our internal buffer. But then I couldn't get the callbacks to actually trigger without supplying EARLY_REQUESTS, which invalidates our latency request, causing the high latency problem... wonderful API, isn't it?
Your mail makes me think I should go back to the pulse-independent timer setup. That is, write to Pulse during the CreateTimerQueueTimer() callback. I have a strong feeling I've been down that path before, but I could give it another shot.
Thanks again, Andrew
On Monday, June 18, 2012 9:31:04 AM Andrew Eikum wrote:
> We chatted a little on IRC this weekend, but thanks again for the advice.
No problem. :)
> Mmdevapi sends its data to the lower systems in period-sized chunks, and the reported buffer fill level decreases in period-sized chunks. But the audio clock position is more accurate than that, returning some value between periods and taking latency into account.
So the pulse driver would likely need a period-sized intermediary buffer, to store the unwritten samples that can't fill a full period (then keep it stored until enough writes are made to fill it and it's written to the stream).
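Such an intermediary buffer could be sketched like this in plain C (no Pulse calls; all names are invented, and the flush callback stands in for the actual write to the stream):

```c
#include <stdlib.h>
#include <string.h>

/* Accumulates incoming bytes until a full period is available, then hands
 * exactly one period to the flush callback. */
typedef struct {
    unsigned char *data;
    size_t period_size;
    size_t held;  /* bytes waiting; always < period_size between feeds */
} period_buf;

typedef void (*flush_cb)(const unsigned char *period, size_t len, void *user);

void period_buf_init(period_buf *pb, size_t period_size)
{
    pb->data = malloc(period_size);
    pb->period_size = period_size;
    pb->held = 0;
}

/* Append incoming bytes; invoke cb once per complete period produced. */
void period_buf_feed(period_buf *pb, const unsigned char *src, size_t len,
                     flush_cb cb, void *user)
{
    while(len > 0) {
        size_t want = pb->period_size - pb->held;
        size_t take = len < want ? len : want;
        memcpy(pb->data + pb->held, src, take);
        pb->held += take;
        src += take;
        len -= take;
        if(pb->held == pb->period_size) {
            cb(pb->data, pb->period_size, user); /* one full period ready */
            pb->held = 0;
        }
    }
}

/* Example callback that just counts completed periods. */
static size_t flushed_periods;
static void count_period(const unsigned char *p, size_t n, void *u)
{
    (void)p; (void)n; (void)u;
    flushed_periods++;
}
```

Feeding three 100-byte writes into a 128-byte period, for example, flushes two full periods and leaves 44 bytes held back for the next write.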
The PA_STREAM_INTERPOLATE_TIMING and PA_STREAM_AUTO_TIMING_UPDATE flags should provide timing with a good bit of granularity, along with regularly being resync'd to the server's clock.
> Unfortunately, PulseAudio (and, apparently, every Linux audio API) refuses to make any guarantees at all with regard to things like buffer sizes, period sizes, callback regularity, and latencies.
Right, which is why you can't rely on what PulseAudio sets for the buffer/tlength size, and you have to manage the expected size yourself, but that's easy. Luckily, PulseAudio supports buffer sizes up to about 2 seconds, I believe, which is good enough for mmdevapi. Unlike ALSA and others, Pulse has no problem handling buffers as large as what mmdevapi supports.
PulseAudio doesn't really do periods. It actually disables audio hardware interrupts when it can and uses high-resolution timers to work out where the audio pointers are at any given time. The period values can pretty much just be emulated, and neither PulseAudio nor the app should care that much.
Latency issues are what my approach tries to improve. Although ALSA's dmix still performs better for playback, PulseAudio is at least now comparable, and actually usable when it comes to responsive/interactive audio.
For the callback regularity issues, keep in mind that you'll never get 100% guarantees with protected mode code on a multi-tasking OS, so as long as apps work as expected, it's "good enough". However, you can improve over Pulse's callbacks by using a background thread that keeps an eye on the amount of data pulseaudio has chewed through, and trigger any callbacks or events yourself as needed.
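The bookkeeping such a monitor thread would do can be sketched independently of Pulse (names invented; `fire_event` is a hypothetical stand-in for signalling the app's event handle):

```c
#include <stdint.h>

/* Compare how many bytes the server has consumed against how many
 * period events we've already delivered, and fire the difference. */
typedef struct {
    uint64_t period_bytes;
    uint64_t events_fired;
} monitor_state;

typedef void (*event_fn)(void *user);

/* Called periodically from a background thread with the server's current
 * consumed-byte count; returns how many events were fired this call. */
unsigned monitor_tick(monitor_state *st, uint64_t bytes_consumed,
                      event_fn fire_event, void *user)
{
    uint64_t due = bytes_consumed / st->period_bytes;
    unsigned fired = 0;
    while(st->events_fired < due) {
        fire_event(user);
        st->events_fired++;
        fired++;
    }
    return fired;
}

/* Example no-op event, standing in for SetEvent() or similar. */
static void noop_event(void *u) { (void)u; }
```

Because the thread only ever catches up to what was actually consumed, late wakeups fire a burst of events rather than losing any, which is the "good enough" behavior described above.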
> It'd be handy if there were some page explaining how PulseAudio's data model works. Stuff like where the data is stored, when and how it's transferred, when callbacks are triggered, what the members of pa_buffer_attr actually mean (their documentation is useless), how the stream flags affect stream operation and the callbacks, when the buffer attributes might change, and how applications should deal with those changes. This stuff is all unclear to me, and I think it's a big source of my frustration with the API.
The sample data for playback streams is stored in the server, AFAIK. It's transferred when you call pa_stream_write, and it's done either synchronously or asynchronously depending on who "owns" the buffer it's given. With mmdevapi's design, there's no real problem with letting pulse maintain ownership and do the writes asynchronously.
Callbacks are triggered when the client receives the appropriate signals from the server. When exactly the server sends the signal depends on what the signal is and the various settings and attributes.
PulseAudio's documentation isn't all that bad, considering. If you want to talk about bad API docs, look at ALSA.
> Yeah, I experimented with ADJUST_LATENCY, as it seems to be the trick to getting lower latencies. Then we have to maintain our own buffer, which is why I switched to using the write callback, to have Pulse tell us when it's ready for more data from our internal buffer.
You shouldn't need a shadow buffer with ADJUST_LATENCY. Just write it to pulse as the app writes it to mmdevapi. Why delay? It can handle the buffer sizes.
This also avoids the timing problems with the write callback. The most you'll need is a temporary storage buffer for when an app writes an incomplete period.
On Mon, Jun 18, 2012 at 08:49:55AM -0700, Chris Robinson wrote:
> On Monday, June 18, 2012 9:31:04 AM Andrew Eikum wrote:
> > Yeah, I experimented with ADJUST_LATENCY, as it seems to be the trick to getting lower latencies. Then we have to maintain our own buffer, which is why I switched to using the write callback, to have Pulse tell us when it's ready for more data from our internal buffer.
> You shouldn't need a shadow buffer with ADJUST_LATENCY. Just write it to pulse as the app writes it to mmdevapi. Why delay? It can handle the buffer sizes.
Are you sure? You said above:
> Right, which is why you can't rely on what PulseAudio sets for the buffer/tlength size...
Is there any guarantee that Pulse /will/ give us a sufficient buffer?
From the API docs[1], it just "tries to assure" that we have the requested buffer size, which actually means nothing.
And even worse, Pulse's API doesn't let us set both the buffer size and the latency[2]. Since we need to set the latency, we can't request a buffer size at all. So we have to rely on Pulse's default buffer size, and I don't think we can depend on that being sufficient in every single case and configuration. So we need the local buffer.
> This also avoids the timing problems with the write callback. The most you'll need is a temporary storage buffer for when an app writes an incomplete period.
Yeah, the new design would only use the local buffer if not all of the data fed to mmdevapi will fit into the Pulse buffer. I remember trying this method and it failing for some reason, but maybe it'll work this time around.
[1] At least, I think that's what the docs say. I still have no idea what these structure members actually do. http://freedesktop.org/software/pulseaudio/doxygen/structpa__buffer__attr.html
[2] http://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/Developer/Clients/LatencyControl
On Monday, June 18, 2012 11:18:45 AM Andrew Eikum wrote:
> Is there any guarantee that Pulse /will/ give us a sufficient buffer? From the API docs[1], it just "tries to assure" that we have the requested buffer size, which actually means nothing.
It "tries to assure that at least tlength bytes are always available in the per-stream server-side playback buffer". You get the buffer size you ask for with tlength, and the server does what it can to make sure that's how many bytes are waiting in the buffer (by sending write requests at the appropriate times to keep it filled, specifying the unused portion of the buffer as writable, etc).
It doesn't mean the buffer itself will be smaller than what you asked for, just that the number of bytes waiting in it may be. The buffer size can increase up to maxlength, and that can be something like 2 or more seconds.
> And even worse, Pulse's API doesn't let us set both the buffer size and the latency[2].
The buffer size can grow as-needed, up to maxlength. The initial size won't restrict you in any way. If you set maxlength to -1, Pulse will set it to the max size supported by the server.
There should be no issue with PulseAudio supporting buffer sizes as large as mmdevapi would want.