Hi,
mmdevapi's GetPosition returns "the stream position of the sample that is currently playing through the speakers", says MSDN. This is exactly what apps that want to synchronize audio & video (lip synchronization) ought to use.
MSDN's wording about winmm's GetPosition is not exactly the same: "the current playback position of the given waveform-audio output device." However there's no other winmm function that could be used for lip-sync.
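For illustration, here is a minimal sketch of the two position queries (error handling omitted; it assumes an already initialized IAudioClient *client and an open HWAVEOUT hwo):

#define COBJMACROS
#include <windows.h>
#include <mmsystem.h>
#include <audioclient.h>

/* mmdevapi: the position of the sample currently playing through the speakers,
 * converted to seconds -- the number an A/V player would sync video against. */
double mmdevapi_seconds(IAudioClient *client)
{
    IAudioClock *clock;
    UINT64 freq, pos, qpc;
    IAudioClient_GetService(client, &IID_IAudioClock, (void**)&clock);
    IAudioClock_GetFrequency(clock, &freq);
    IAudioClock_GetPosition(clock, &pos, &qpc);
    IAudioClock_Release(clock);
    return (double)pos / freq;
}

/* winmm: "the current playback position", requested in samples. */
DWORD winmm_samples(HWAVEOUT hwo)
{
    MMTIME mmt;
    mmt.wType = TIME_SAMPLES;
    waveOutGetPosition(hwo, &mmt, sizeof(mmt));
    return mmt.u.sample; /* the device may fall back to TIME_BYTES */
}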
Tests in winmm/tests/wave.c tend to show that WHDR_DONE notifications are only received after a position corresponding to the written number of bytes has been reached -- IOW the buffer has been played.
Now look at winmm's PlaySound and mciwave code: /* make it so that 3 buffers per second are needed */ Both then proceed to play ping-pong with two 1/3 second buffers. Every time WHDR_DONE is received, one of the two buffers is empty; it is refilled and resubmitted using waveOutWrite.
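A stripped-down sketch of that ping-pong scheme (with a hypothetical fill_next_chunk() helper; no error handling or end-of-stream logic, and the real code waits on a callback rather than polling):

#include <windows.h>
#include <mmsystem.h>

void fill_next_chunk(void *buf, DWORD len); /* hypothetical source of audio data */

void pingpong(HWAVEOUT hwo, WAVEHDR hdr[2], DWORD third_of_a_second /* in bytes */)
{
    int i;
    /* prime both buffers */
    for (i = 0; i < 2; i++) {
        hdr[i].dwBufferLength = third_of_a_second;
        fill_next_chunk(hdr[i].lpData, third_of_a_second);
        waveOutPrepareHeader(hwo, &hdr[i], sizeof(WAVEHDR));
        waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));
    }
    /* every time one header comes back WHDR_DONE, refill and resubmit it */
    for (i = 0; ; i = !i) {
        while (!(hdr[i].dwFlags & WHDR_DONE))
            Sleep(10);
        fill_next_chunk(hdr[i].lpData, third_of_a_second);
        hdr[i].dwFlags &= ~WHDR_DONE;
        waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));
    }
}

Note that with two 1/3 second buffers the app never queues more than about 667ms of audio -- keep that figure in mind for the PulseAudio scenario below.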
Now consider a system such as PulseAudio that wants to buffer 2 seconds of an audio stream internally. That scheme will fail completely.
- If WHDR_DONE is based on GetPosition, notifications will only be sent after 2 seconds or an underrun. Neither PlaySound nor MCIWAVE buffers 2 seconds. They continuously hit underruns as PA signals that it drained the tiny 40ms ALSA buffer we supplied. Is that PlaySound's fault?
- If WHDR_DONE is instead based on buffer usage, it could be sent right after snd_pcm_write, letting PA buffer as much as it wants.
But then our wave tests fail if GetPosition returns something like the position currently playing through the speakers, which is 2s late. Should we have GetPosition lie? And suffer loss of lip sync?
DSound's GetCurrentPosition suffers from the exact same issue. For instance, playing an audio CD-ROM with mcicda in recent Wine causes constant underruns from DSound when PulseAudio is involved.
What's the root cause of the issue?
15 years ago, the audio HW would likely receive a pointer to the winmm header data and play that. Once played, the buffer was no longer needed and was returned to the app. Also, there was nothing sitting behind the audio HW causing significant latency (the DAC -> electronics -> speaker -> acoustics chain added latency negligible to human perception). Equating GetPosition with the sample seen by the DAC was and is reasonable -- ALSA does exactly this with hw:0.
DSound's model is that of a circular buffer from which the DAC is fed. No different.
These days with mmdevapi, w7 users report a latency of 30-40ms introduced by the native mixer. Native's winmm might account for that in its GetPosition reports. That latency is below the threshold my mmdevapi tests would notice, and well below the approximately 100ms that matter to human perception when correlating visible and audible events.
Note that native's mixer is the last in an audio graph before the HW.
What happens in Linux with Wine?
ALSA's dmix is known to introduce some latency as it operates in period-size chunks. However ALSA's periods are typically very small, e.g. 21.333ms, and dmix is typically connected to hw:0, immediately playing audio to the speakers. There's hardly a difference between the audible GetPosition and a buffer position derived from snd_pcm_write (and old Wine mixed the two in the past). The picture looks much like native's mmdevapi mixer.
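In ALSA terms, the two candidate positions look roughly like this (a minimal sketch assuming a running snd_pcm_t *pcm and a frame counter accumulated over the snd_pcm_writei calls); through dmix into hw:0 they stay within a few periods of each other, whereas behind PulseAudio the queued delay can amount to seconds:

#include <alsa/asoundlib.h>
#include <stdint.h>

static uint64_t total_written; /* frames summed over all snd_pcm_writei calls */

/* position derived from writes: how much we have handed to the device */
uint64_t written_position(void)
{
    return total_written;
}

/* audible position: subtract what is still queued between us and the DAC */
uint64_t audible_position(snd_pcm_t *pcm)
{
    snd_pcm_sframes_t delay = 0;
    if (snd_pcm_delay(pcm, &delay) < 0 || delay < 0)
        delay = 0;
    return total_written - (uint64_t)delay;
}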
PulseAudio is entirely different. With PA, the Wine mixer is no longer near the end of the audio graph; instead it sits in front of an element that introduces 2 seconds of latency. A position derived from snd_pcm_write is 2s ahead of the true speaker position.
How did we react? winealsa tried:
1. Using a small ALSA buffer so as to signal PA that large latencies are not ok. PA seems to ignore that hint. It still appears to buffer a lot somewhere internally (I've no experience with PA 1.0 or 1.1, please check).
Even worse, small ALSA buffers like 40ms increased the risk of underruns and resulted in a worse audio experience for all users of winealsa, whereas old Wine would typically use a 100ms buffer.
Small buffers are no issue for native's mixer. It uses "Pro Audio" priority, which is the highest priority on a native system AFAIK. By contrast, Wine runs at normal user priority.
2. Rate-limiting. We currently believe we are doing it because we write no more than 3 periods at a time. But we're not: nothing prevents PA or any other device from filling its 2s buffer over time. It will get there eventually, just slowly.
We can't use our own clock to limit our writes, because that would introduce clock skew issues with the audio HW clock. What if the device runs faster than the system's idea of 48000 frames per second? We can't second-guess the audio HW clock. IOW, we can't win a rate-limiting game: if the back-end signals via snd_pcm_avail that it has space, then we must feed it (see the toy feeder sketched below).
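A toy feeder loop (hypothetical code, not the actual winealsa implementation) illustrating the point: even when each wakeup writes at most 3 periods, any tick in which snd_pcm_avail reports free space triggers a write, so a back-end that keeps creating space downstream accumulates seconds of audio over time.

#include <alsa/asoundlib.h>

void feeder_tick(snd_pcm_t *pcm, snd_pcm_uframes_t period_frames,
                 const void *data /* at least period_frames worth of frames */)
{
    snd_pcm_sframes_t avail = snd_pcm_avail(pcm);
    int chunks = 0;

    /* write at most 3 periods per wakeup -- yet we never refuse to write
     * while the device reports space, because withholding data based on our
     * own clock risks an underrun if the HW clock runs a little fast */
    while (avail >= (snd_pcm_sframes_t)period_frames && chunks < 3) {
        snd_pcm_writei(pcm, data, period_frames);
        avail -= period_frames;
        chunks++;
    }
    /* nothing here caps what the back-end hoards downstream: next tick it may
     * well report free space again */
}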
Now we know the reason for our troubles. What can we do?
0. Ignore the issue. Equate GetPosition with Released_frames - GetCurrentPadding
1. Consider lip-sync important and strive to support audio & video applications.
2. mmdevapi appears to have a reasonable separation of concerns between buffering and speaker position. Leave it as is. (It may happen, though, that apps built on the assumption that the two are no more than 40ms apart break when latencies like PA's enter the loop. It would be interesting to see whether native faces a similar situation, e.g. in a remote desktop & audio environment.)
3. Protect DSound from too large a delta between padding and position. The maximum is given by the DSound primary buffer size, since DSound's circular GetCurrentPosition abstraction breaks down completely should the two be further apart.
IOW, don't rate-limit DSound writes, but lie about the position if the wineXYZ device says it's too far behind (a sketch of such clamping follows after this list of options).
Getting this right may fix some of the bugs currently in Bugzilla.
That way, we'll get lip-sync with back-ends that don't introduce an unbearable latency, and we won't run into underruns, because neither the driver nor the app ever waits too long for the position to increase.
4. Protect winmm from too large a delta between padding and position. As there's no buffer limit to guide us, we must introduce an arbitrary one, around 100-200ms. I don't know whether a dynamic limit based on the average size of supplied WHDR buffers would work.
Regarding WHDR_DONE, I don't know whether we should then relax the tests. That feels unsafe, because an app may well wait for WHDR_DONE before calling waveOutReset or Close. WHDR_DONE should not come too early, or we may lose trailing sounds.
Perhaps we should not delay the position when playing the last buffer in the list. Our winmm/tests/wave.c tests prove nothing more than that: at the end, the position corresponds to the sum of written samples. Perhaps a similar reflection on MS' side explains why in my tests native's mmdevapi GetPosition may stay 17 samples below the sum of written samples with some USB headsets (or is it just a bug in their driver?).
5. Observe the behaviour of libraries that target mmdevapi, winmm and DSound, e.g. OpenAL, XAudio2, bink, smack, SDL, FMOD, and learn from them.
Actually, thinking about what may happen to trailing samples in a system that introduces 2s of latency still gives me headaches. If we issue snd_pcm_reset prior to snd_pcm_close, the trailing sound will surely be cut off. It looks like the device should remain open for some time.
This applies to all wineXYZ devices, not just ALSA with PA. Some future OSS device may too introduce unforeseen latencies.
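For what it's worth, here is a sketch (hypothetical names, not a patch) of the clamping proposed for DSound and winmm above: start from the position the back-end reports, but never let it lag the written amount by more than a bound -- the primary buffer size for DSound, an arbitrary 100-200ms for winmm.

#include <stdint.h>

uint64_t reported_position(uint64_t written_frames,  /* frames handed to the device so far */
                           uint64_t device_position, /* what the back-end claims is audible */
                           uint64_t max_lag_frames)  /* primary buffer size, or ~100-200ms */
{
    /* Option 0 above would simply report written_frames minus the padding.
     * Here we start from the back-end's own notion of the audible position... */
    uint64_t pos = device_position;

    /* ...but if the back-end (e.g. PulseAudio) buffers so much that this lags
     * the written amount by more than the bound, lie and pull it forward, so
     * that DSound's circular-buffer arithmetic and winmm's WHDR_DONE logic
     * keep working. */
    if (written_frames - pos > max_lag_frames)
        pos = written_frames - max_lag_frames;
    return pos;
}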
Now I'll go and fix winmm as suggested above.
Thank you for reading this far,
Jörg Höhle