http://bugs.winehq.org/show_bug.cgi?id=33155
--- Comment #2 from Jörg Höhle hoehle@users.sourceforge.net 2013-03-15 19:10:41 CDT ---
A solution to this issue does not look pretty. If somebody has a better idea, please tell me:
If srcLengthUsed < packet_frames, then we have to deal with these trailing frames.
Calling the ACM again to try and convert them would be pointless, because presumably they are fewer than the codec's blocksize. For instance, with IMA_ADPCM's 256 blocksize, the typical 44100Hz 10ms 441bytes mmdevapi packet would yield a remainder of 185 bytes...
So those bytes need to be remembered (=copied) somewhere for later conversion, after the next packet is received from mmdevapi. When that packet arrives, its data must be joined with (=copied to) the previously remembered bytes before calling acmStreamConvert (on 185+441 = 626 bytes, converting 2x256 and yielding a remainder of 114 bytes this time).
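The carry-over scheme described above can be modelled in a few lines. This is only a sketch in Python, not Wine code: the `Carry` class and the `convert` callback are hypothetical names standing in for the remembered-bytes buffer and for acmStreamConvert, and the 256-byte block and 441-byte packet are the figures from the example above.

```python
BLOCK = 256  # codec block size in bytes, taken from the IMA ADPCM example


class Carry:
    """Hypothetical helper: remembers trailing bytes the codec could not consume."""

    def __init__(self, block=BLOCK):
        self.block = block
        self.pending = bytearray()

    def feed(self, packet, convert):
        # Join the remembered bytes with the new packet (this is the extra copy).
        self.pending += packet
        # Only whole codec blocks can be converted.
        usable = len(self.pending) - len(self.pending) % self.block
        out = convert(bytes(self.pending[:usable])) if usable else b""
        # Whatever is left is kept for the next packet.
        del self.pending[:usable]
        return out


# Identity function as a stand-in for the real conversion call (assumption).
carry = Carry()
first = carry.feed(bytes(441), lambda data: data)
after_first = len(carry.pending)
second = carry.feed(bytes(441), lambda data: data)
after_second = len(carry.pending)
```

With two 441-byte packets the model consumes 256 bytes and keeps 185, then consumes 512 bytes and keeps 114, matching the numbers above. Every packet passes through `pending`, which is exactly the constant copying complained about.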
As a result, winmm constantly copies audio samples from mmdevapi. That is what I called the mmdevapi-Winmm/ACM impedance mismatch in http://www.winehq.org/pipermail/wine-devel/2013-March/099179.html
So with capturing, just as we saw in the past with mmdevapi rendering, Wine's winmm over mmdevapi is less performant than the pre-mmdevapi (<= wine-1.3.24) code.
Perhaps MS' winmm is cheating. They may know that their packet buffers are adjacent (except at the end of the big mmdevapi buffer of course), observe via GetCurrentPadding that more than one packet is ready, then simply invoke acmStreamConvert once on all recorded samples instead of on packet-sized chunks.
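To get a feel for what such batching would save, here is a toy Python model (not Wine code, and not a claim about what native winmm actually does). The function `carried_bytes` is a hypothetical name; it only counts the leftover bytes that would have to be saved after each conversion call, using the 441-byte packet and 256-byte block from the earlier example.

```python
def carried_bytes(packet_size, block, packets_per_call, total_packets):
    """Count leftover bytes that must be saved across conversion calls."""
    pending = 0  # bytes left over from the previous call
    copied = 0   # running total of bytes saved for later
    for start in range(0, total_packets, packets_per_call):
        n = min(packets_per_call, total_packets - start)
        available = pending + n * packet_size
        pending = available % block  # only whole blocks are converted
        copied += pending
    return copied


# Four packets converted one at a time versus all four in a single call.
per_packet = carried_bytes(441, 256, packets_per_call=1, total_packets=4)
batched = carried_bytes(441, 256, packets_per_call=4, total_packets=4)
```

In this model the per-packet path saves 185+114+43+228 = 570 bytes over four packets, while a single call over all four saves only the final 228. The model ignores the cost of joining the saved bytes with the next packet, so it understates the per-packet overhead.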
Likewise, cheating would be possible with rendering: DSound's primary buffer could be identical to mmdevapi's buffer, with DSound:Lock's buffer splitting being based on GetCurrentPadding modulo number of submitted frames modulo mmdevapi buffer size (if they don't use a hidden API that just gives them mmdevapi's write pointer). Past tests with print("%p",IARC_GetBuffer->data) showed that native appears to use one big mmdevapi buffer and one overflow/wrap-around one, much like winealsa/wineoss, unlike winecoreaudio.
Bug #28748, comment #1 is where Andrew Eikum discovered that apps write into DSound's buffer outside of Lock/Unlock pairs. :-( I recently came across a forum post from somebody who did the same with winmm:waveOutWrite, in an attempt to reduce latency by rewriting already submitted frames! (PulseAudio does that with ALSA, I've heard.) IIRC, the guy observed that he could achieve 40ms latency on an XP system and 80ms on a winmm->mmdevapi (w7) system, which I take to mean that writing into already submitted headers less than 40-80ms ahead of GetPosition had no audible effect.
I'm waiting for the time when we might discover that apps write directly into mmdevapi's buffer, outside Get/ReleaseBuffer pairs :-(
But I digress. Luckily these days, many apps seem to use libraries like XA2 that are hopefully well behaved, instead of directly accessing the too-limited mmdevapi that only knows 48000 or 44100Hz (like ALSA's fixed-rate dmix plugin without a resampler in front of it; who would use that?).