http://bugs.winehq.org/show_bug.cgi?id=28723
--- Comment #55 from Alexey Loukianov mooroon2@mail.ru 2011-11-29 03:31:29 CST --- (In reply to comment #54)
> Here's another version, cleaned up to submission quality. It contains Alexey's work, as well as Jörg's GetPosition() reimplementation and some other miscellaneous fixes.
> Alexey: Does this patchset fix RAGE for you?
Yeah, and reducing in_alsa to 2x period instead of 3x also works flawlessly. Actually, I have switched from testing with RAGE to testing with a hackish testcase I coded last Sunday. I will attach it to the next comment.
This testcase can be safely compiled as a winelib app, and it can also be compiled on the native OS using a fresh enough mingw-w64 toolchain (one which has the required mmdev headers). It opens the default WASAPI audio render device in shared mode and plays back a 2500 Hz tone generated on the fly using event-driven buffer fill. It reports some collected stats during initialization and while playing. The requested buffer duration is the same as XA2 requests: 20ms.
In its current form it tries to play 5s of generated sine tone four times, each time changing the buffer fill-in method slightly. The first try closely mimics XA2 behavior: each time the event fires, it fills the buffer with 441-frame chunks of audio data in a loop until the buffer can't hold another 441 frames. The second try is the same, but uses 220f chunks. The third try is the same as the first one, but for each event no more than one 441f chunk is pumped out to the audio engine. The fourth try is the same as the third, but uses 220f chunks.
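To make the four fill-in methods easier to compare, here is a minimal sketch of the per-event decision. It is not taken from the testcase itself: chunks_for_event() and its parameters are hypothetical names, with buffer_frames standing for what IAudioClient::GetBufferSize returns and padding_frames for what IAudioClient::GetCurrentPadding returns at the moment the event fires.

```c
#include <assert.h>

/* Hypothetical helper, not part of the attached testcase: returns how many
 * chunks of chunk_frames would be written in response to a single event. */
static unsigned chunks_for_event(unsigned buffer_frames, unsigned padding_frames,
                                 unsigned chunk_frames, int fill_until_full)
{
    unsigned free_frames, chunks;

    assert(chunk_frames > 0);
    assert(padding_frames <= buffer_frames);

    free_frames = buffer_frames - padding_frames;
    chunks = free_frames / chunk_frames;   /* whole chunks that still fit */

    /* Tries 3 and 4: write at most one chunk per event. */
    if (!fill_until_full && chunks > 1)
        chunks = 1;

    return chunks;
}
```

Under these assumptions tries 1 and 2 correspond to fill_until_full = 1 with 441f and 220f chunks, and tries 3 and 4 to fill_until_full = 0 with the same chunk sizes.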
On a native Win7 system the first two tries are played back without noticeable underruns, while the third and fourth tries produce a lot of underruns. The fourth try is basically what Jörg has been writing about in recent comments.
What should be noted is that Win7 seems to align the allocated buffer size to 16 frames. I.e. when I request a 20ms buffer for a 44100Hz stream, I get an 896-sample buffer in Win7, which is ((20ms * nSamplesPerSec)/16 + 1)*16. At first I thought that the actual alignment scheme was 64 bytes for the buffer, but using the 32bit IEEE_FLOAT sample format resulted in the same 896f buffer and not the 888f buffer which would be expected if 64 byte alignment had been used.
With Wine I get an 882-sample buffer for the same case, as the requested 20ms at 44100Hz (exactly 882 frames) is used as is.
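The buffer size arithmetic can be checked with a short sketch. Both function names are made up for illustration, and the frame sizes used below assume a stereo stream (4 bytes per frame for 16bit PCM, 8 bytes per frame for 32bit float), which is what the 888f figure implies but is not stated explicitly.

```c
#include <assert.h>

/* Frame count for duration_ms at the given rate, rounded the way the
 * formula above does it: (frames / 16 + 1) * 16. */
static unsigned frames_aligned_16(unsigned duration_ms, unsigned rate)
{
    unsigned frames = duration_ms * rate / 1000;

    return (frames / 16 + 1) * 16;
}

/* Hypothetical alternative scheme: round the buffer up to a multiple of
 * 64 bytes, then convert back to frames. block_align is bytes per frame. */
static unsigned frames_aligned_64_bytes(unsigned duration_ms, unsigned rate,
                                        unsigned block_align)
{
    unsigned bytes;

    assert(block_align > 0);

    bytes = duration_ms * rate / 1000 * block_align;
    bytes = (bytes + 63) / 64 * 64;

    return bytes / block_align;
}
```

With 20ms at 44100Hz the first function gives 896 frames regardless of sample format. The second gives 896 frames for 4-byte frames but 888 for 8-byte frames, so the two schemes can only be told apart with the float format. Note that the formula as written adds a full 16 frames even when the frame count is already a multiple of 16; whether Win7 really does that is not something this sketch can tell.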
As for the observed underrun behavior for the testcase proposed by Jörg: I got 100Hz periodic behavior on Win7, which conforms to Jörg's prediction. I haven't had a chance to lay my hands on a laptop with Vista installed yet, so I can't tell what the behavior is under Vista. I hope to find a moment to test it in the next few days.
P.S. I haven't had a chance to look at the proposed GetPosition implementation yet, but I can tell for sure that the values reported under Wine differ from the ones reported under Win7, especially for cases when there are underruns. On Win7, for underrun-free cases, the reported position lags ~28ms behind the amount of data pumped out at the moment the event fires. As the event fires at 10ms intervals, when about half of the buffer has been played, this gives around 18ms of latency introduced by the sound engine. It is pretty much in line with the values reported by Wine with the patchset from comment #54 applied: an average lag of ~33ms, which is suspiciously low, as we should have around 30ms of data (3xPeriod) sitting in the ALSA buffer and another 10ms of data held in the mmdevapi buffer, resulting in around 40ms of lag.
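For reference, the lag arithmetic above can be written down as a tiny sketch. The function names are hypothetical, and treating the engine latency as "observed lag minus one event period" is my reading of the numbers, not something taken from the patchset.

```c
#include <assert.h>

/* Data expected to be queued ahead of playback, in ms: alsa_periods periods
 * in the ALSA buffer plus one more period in the mmdevapi buffer. */
static unsigned expected_queue_ms(unsigned period_ms, unsigned alsa_periods)
{
    assert(period_ms > 0);

    return period_ms * alsa_periods + period_ms;
}

/* Engine latency estimated as the observed lag minus one event period. */
static unsigned engine_latency_ms(unsigned observed_lag_ms, unsigned period_ms)
{
    assert(observed_lag_ms >= period_ms);

    return observed_lag_ms - period_ms;
}
```

With a 10ms period, three ALSA periods give the 40ms mentioned above and two give 30ms, while the ~28ms lag seen on Win7 leaves about 18ms for the engine.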
For cases with underruns (mostly for the fourth try of the testcase) the devposition reported by Win7 equals the amount of data pumped out to the audio engine. Thinking about it makes me believe that this is exactly the expected result: the audio engine has played back all the data it was fed, and thus devposition should be equal to samples_sent. With Wine I still get devposition lagging behind by about 5ms for the fourth try and 10ms for the third try, even for the "hit underrun and stop feeding data" case. Inspecting +mmdevapi logs makes me believe that this is caused by the following sequence of events:
1. ALSA hits an underrun.
2. The timer callback gets called; alsa_write_data recovers from the underrun and pumps data out to the ALSA buffer (no more than 3xMAX[ALSA Period, MMDEV Period]). For the fourth try of the testcase it is exactly 220f of data, for the third try it is 441f.
3. The callback proc fires the event. If the app calls GetPosition in the event handler, it finds ~220f/441f of data lying inside the ALSA buffer, resulting in a reported lag of about 5ms/10ms.
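The 5ms/10ms figures follow directly from the amount of data still sitting in the ALSA buffer when GetPosition is called. A minimal sketch of that conversion, with position_lag_ms() being a made-up helper and not code from the patchset:

```c
#include <assert.h>

/* Lag between the data handed to the engine and the reported device
 * position, in ms. Both counters are in frames. */
static double position_lag_ms(unsigned long long frames_sent,
                              unsigned long long device_position,
                              unsigned rate)
{
    assert(rate > 0);
    assert(device_position <= frames_sent);

    return (double)(frames_sent - device_position) * 1000.0 / rate;
}
```

At 44100Hz, 441 frames left in the buffer give 10ms and 220 frames give just under 5ms. On Win7 the two counters are equal after an underrun, so the same calculation yields 0.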
We should think about how to rewrite the proposed GetPosition logic to more closely match native behavior for such cases.