http://bugs.winehq.org/show_bug.cgi?id=29472
--- Comment #12 from Jörg Höhle hoehle@users.sourceforge.net 2012-01-23 08:07:30 CST ---
Update on TRUEPLAYPOSITION. The author of a program named BASS noted:
http://www.bass.radio42.com/help/html/3490e2bc-7f3a-9135-3d24-ee519029f737.h...
http://www.un4seen.com/forum/?PHPSESSID=k4r85j2pfbitrajp29996jq6f0&topic...
"... the use of DirectSound's new DSBCAPS_TRUEPLAYPOSITION option by BASS 2.4.3, [...] improves the precision of position reporting on Vista (it's only within 10ms otherwise). But after some further testing, it also appears to increase latency by around 20ms."
When I read this, I'm tempted to conclude that TRUEPLAYPOSITION uses mmdevapi's GetPosition, reporting speaker positions closer to the DAC, whereas GETCURRENTPOSITION[2] expresses speaker position as seen in DSound's buffer, completely ignoring that mmdevapi now sits beneath it, adding its 30-40ms (not 20ms AFAICT) share to latency.
IOW, old usage of DSound without that Vista-era flag does not report speaker positions on modern systems built with "advanced" audio servers. That's the legacy of its low-level HW ("Direct") approach. It has no longer been hitting the copper since Vista.
In practice, native's mmdevapi does not introduce enough latency for this to become perceptible: http://www.tesoga.com/articles/crossplat5.html "most people are not good at perceiving sound latency if it is kept down to 100-150 milliseconds."
Therefore, lip-sync based on GetCurrentPosition still appears to work. Furthermore, its 30ms is small enough to fit DSound's buffer such that the play+write pointer pair abstraction can still be made to work.
But Wine's mmdevapi is different. As you can verify by running my mmdevapi render GetPosition tests, winealsa is currently around 80ms with ALSA/dmix. Add DSound's own buffering and you get close to that 100ms limit. Add PulseAudio's typical delay and you get to values where you start wondering why bugzilla is not already flooded with complaints about late explosion noise.
Even worse, Wine usage scenarios include running with remote desktops, networked audio, PulseAudio and the like. There, the true speaker position can be so far away from the data still in DSound's typical 100ms buffer that it'll break the play+write pointer abstraction. Roughly, I'd say that if latency > half that buffer size (50ms), the abstraction will break down because there will be too little room for the app to write data or because it may not wake up often enough to write data at all.
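To make that rule of thumb concrete, here is a minimal sketch in C. The helper name and its millisecond parameters are hypothetical and not part of Wine or DSound; it only encodes the "latency > half the buffer" estimate given above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a Wine or DSound API.
 * Returns true when the latency downstream of DSound's buffer
 * (mmdevapi plus the host audio stack) exceeds half the buffer
 * duration, i.e. the point where the play+write pointer pair is
 * estimated to stop being usable. */
static bool latency_breaks_pointer_abstraction(unsigned buffer_ms,
                                               unsigned latency_ms)
{
    assert(buffer_ms > 0);
    return latency_ms * 2 > buffer_ms;
}
```

With the typical 100ms buffer, a 30ms latency passes this check while an 80ms latency does not.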
My recommendation:
0. Observe how native behaves with high-latency equipment/environments.
A. Have GETCURRENTPOSITION[2] yield DSound's buffer position, ignoring the following mmdevapi and host latency.
B. Have TRUEPLAYPOSITION (when it'll be implemented) take mmdevapi's GetPosition into account, but cap it if it's too large for DSound's buffer.
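Recommendation B could look roughly like the following C sketch. It is an illustration under assumptions, not how native or Wine computes anything: the function name, the byte-based parameters, and the idea that the caller has already converted the mmdevapi/host latency into bytes are all hypothetical, and frame (block) alignment is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not a Wine or DSound API.
 * play_pos:      DSound's own play position within its circular buffer
 * buffer_bytes:  size of that circular buffer
 * latency_bytes: audio still queued below DSound (assumed to be derived
 *                by the caller from mmdevapi's GetPosition)
 * cap_bytes:     largest lag we are willing to report
 * Returns play_pos moved backwards by the capped lag, wrapped around
 * the circular buffer. */
static uint32_t true_play_position(uint32_t play_pos, uint32_t buffer_bytes,
                                   uint32_t latency_bytes, uint32_t cap_bytes)
{
    assert(buffer_bytes > 0);
    assert(play_pos < buffer_bytes);
    assert(cap_bytes < buffer_bytes);

    uint32_t lag = latency_bytes < cap_bytes ? latency_bytes : cap_bytes;

    /* 64-bit intermediate so the sum cannot overflow. */
    return (uint32_t)(((uint64_t)play_pos + buffer_bytes - lag) % buffer_bytes);
}
```

For example, a 100ms buffer of 44.1kHz 16-bit stereo audio is 17640 bytes, and 30ms of latency is 5292 bytes. Capping at half the buffer (8820 bytes) keeps the reported position from wrapping past data the app has not yet written.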
Possible improvements:
0. Consider increasing DSound's primary buffer.
A. Add mmdevapi's GetCurrentPadding -- again, if not too large.
B. Have DSound bypass mmdevapi??
I'm unsure whether faking positions using a clock may help work around the pre-buffering that PA does, and that some native equipment does too. I've received logs where a native sound card also initially consumed a lot of frames, then stabilized. That's why padding is not a good indicator of speaker positions. A distinct clock may cause issues when running for hours.