Thanks Alexander. Thoughts below...
On Sat, May 19, 2012 at 09:09:35PM +0600, Alexander E. Patrakov wrote:
There are two ways to implement a high-performance resampler, and I have prepared (conflicting, pick no more than one) patches for both:
1 (this patch): Use a shorter FIR with the existing code. This has the advantage of higher quality (unwanted frequencies are at least attempted to be rejected) and almost no new code.
2 (the other patch): Write new code. E.g., linear interpolation. This is what Windows XP does at its lowest quality setting, and it eats less CPU than variant 1.
Do you have an opinion on which of these patches to use? The low-quality FIR has the advantage of not introducing another codepath. On the other hand, the linear resampler codepath is very simple, and even easier on the CPU.
I'm leaning towards the linear resampler for its larger CPU usage benefits.
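For readers who have not seen the second patch, here is a minimal sketch of what a linear-interpolation resampler does. It is not the code from the patch: the function name `resample_linear`, the mono float layout and the `step` parameter (input frames per output frame, i.e. input rate divided by output rate) are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not Wine's code: resample a mono float stream by
 * linear interpolation.  `step` is the number of input frames advanced per
 * output frame (in_rate / out_rate).  Returns the number of frames written. */
size_t resample_linear(const float *in, size_t in_len,
                       float *out, size_t out_cap, double step)
{
    size_t n = 0;

    if (in_len == 0 || step <= 0.0)
        return 0;

    while (n < out_cap) {
        /* Recompute the position from n to avoid accumulating rounding error. */
        double pos = (double)n * step;
        size_t i = (size_t)pos;
        double frac;
        float next;

        if (i >= in_len)
            break;

        frac = pos - (double)i;
        /* At the last input frame there is no neighbour; hold the value. */
        next = (i + 1 < in_len) ? in[i + 1] : in[i];
        out[n++] = (float)(in[i] + (next - in[i]) * frac);
    }
    return n;
}
```

Each output frame costs one multiply and two adds regardless of the rate ratio, which is where the CPU saving over any FIR comes from. The price is that nothing is filtered, so aliasing and imaging pass straight through.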
Also note that, as evidenced by the debugging patch, a Core 2 Duo E6420 @ 2.13 GHz _can_ resample more than 32 streams simultaneously from various weird rates to 48000 Hz. As GTA:SA reportedly creates only 16 secondary buffers, it _should_ have more than enough CPU time to mix them. IMHO, this makes bug #30639 look somewhat strange: on GyB's computer, GTA:SA stutters, while Darwinia (which looks more demanding about sound) doesn't. It may well be that in fact none of my patches are needed, and that the real bug is that the CPU-intensive cp_fields() function is called from the wrong thread or process. I don't have the expertise needed to debug this.
I did some research on this. Darwinia creates up to 32 buffers, like you said. GTA:SA creates and destroys buffers as needed, and I saw it go as high as 31 in a quick test. Darwinia's buffer frequencies are in the 40-90 kHz range and resample to 22050 Hz, while GTA:SA's are around 10-20 kHz and resample to 48 kHz.
So in each time step, GTA:SA requires about 1000-2000 get_current_sample() calls, but 4800 FIR convolutions per buffer.
Darwinia requires 4000-9000 get_current_sample() calls, but only about 2200 convolutions per buffer.
I suspect the convolutions are considerably more expensive than the get_current_sample() calls, so I would actually expect GTA:SA to be more taxing on the CPU. That should explain what's going on here.
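The per-buffer counts above can be reproduced with a small sketch. It assumes one get_current_sample() call per input frame and one convolution per output frame, and it assumes a time step of 100 ms, which is only inferred from the numbers quoted (it is not stated anywhere in this thread). The function names are made up for the example.

```c
/* Hypothetical helpers for the back-of-the-envelope count above.
 * Rates are in Hz, the time step is in milliseconds (assumed 100 ms). */

/* One get_current_sample() call per input frame consumed in a time step. */
long input_reads_per_step(long in_rate_hz, long step_ms)
{
    return in_rate_hz * step_ms / 1000;
}

/* One FIR convolution per output frame produced in a time step. */
long convolutions_per_step(long out_rate_hz, long step_ms)
{
    return out_rate_hz * step_ms / 1000;
}
```

With a 100 ms step, a 48000 Hz output gives 4800 convolutions and a 22050 Hz output gives 2205, which matches the rounded figures for GTA:SA and Darwinia.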
We could test this on GyB's machine by setting DefaultSampleRate=22050 and hacking <dlls/dsound/primary.c:primarybuffer_SetFormat> to return S_OK without actually changing the primary buffer's format. That should give GTA:SA similar cp_fields performance to Darwinia, and I expect it would fix the lag issue.
Andrew
2012/5/22 Andrew Eikum aeikum@codeweavers.com:
Do you have an opinion on which of these patches to use? The low-quality FIR has the advantage of not introducing another codepath. On the other hand, the linear resampler codepath is very simple, and even easier on the CPU.
Yes. And Windows has two code paths as well. OTOH the linear resampler has lower quality, and different latency from the FIR-based filter. Due to this difference in latency there may be unavoidable clicks in games (sorry, no concrete example) that frequently switch from 3 to 4 buffers and back. The FIR-based approach eliminates this effect, because there is no latency difference (or, because this is untested, better say: it's a bug in my code if there is any latency difference).
I'm leaning towards the linear resampler for its larger CPU usage benefits.
I have no real preference. If there are no other arguments (i.e. if clicks due to switching resamplers are not a valid/worthy argument), let's use the linear resampler, because I wrote it first and because GyB tested it. Anyway, it doesn't really matter, because it is possible to change this later, or even implement a 3-level quality degradation strategy (long FIR -> short FIR -> linear interpolation).
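To make the 3-level idea concrete, here is a sketch of how the choice could be made from the number of concurrently playing buffers. Nothing here is taken from either patch: the enum, the function and the second threshold `mq_max` are hypothetical, and treating `hq_max` as "at most this many buffers get the long FIR" is only an assumption about how a setting like HQBuffersMax might behave.

```c
/* Hypothetical quality levels, ordered from most to least CPU-hungry. */
enum resampler_quality {
    RESAMPLE_LONG_FIR,
    RESAMPLE_SHORT_FIR,
    RESAMPLE_LINEAR
};

/* Pick a resampler from the number of active secondary buffers.
 * hq_max: assumed upper bound on buffers served by the long FIR.
 * mq_max: hypothetical upper bound on buffers served by the short FIR. */
enum resampler_quality pick_resampler(unsigned active_buffers,
                                      unsigned hq_max, unsigned mq_max)
{
    if (active_buffers <= hq_max)
        return RESAMPLE_LONG_FIR;
    if (active_buffers <= mq_max)
        return RESAMPLE_SHORT_FIR;
    return RESAMPLE_LINEAR;
}
```

The latency concern raised earlier applies at every boundary where the two sides differ in latency: a game that oscillates around such a threshold would switch resamplers repeatedly.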
As for your performance analysis - yes, get_current_sample() is cheap, and the main cost is due to caching the FIR and calculating the convolution. As far as I understand (but I can be wrong here), it would be fair enough to count only the "sum += fir_copy[j] * cache[j];" line. Still, I don't think it explains the whole picture.
Let's say that the FIR length is X samples of the lowest of the two frequencies (X is a constant for a given FIR, and for my FIR it is 66). So, if upsampling, each output sample is affected by X input samples, and by X * freqAdjust input samples when downsampling. Since both GTA:SA and Darwinia use 32 buffers, we can consider only a single buffer and count the number of passes through the "sum" line per second.
Darwinia downsamples, has freqAdjust ~ 2..4. Thus, it executes the "sum" line 2..4 * X times per output sample, i.e. 40000..90000 * X times per second per input buffer.
GTA:SA upsamples, and thus executes the "sum" line X times per output sample, i.e. 48000 * X times per second per input buffer.
The ratio seems to be consistent with the number of convolutions per buffer per time step that you report, because the number of samples per time step is different for these two games.
So I am not convinced by your analysis - but this is based on the assumption that only the "sum += fir_copy[j] * cache[j];" line really matters and that filling in fir_cache eats time proportional to the summing (it has to go through the same number of iterations).
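The estimate above can be written out as a small sketch. It encodes only the cost model described in this mail (X taps per output sample when upsampling, X * freqAdjust when downsampling) and counts nothing else; the function name is made up, and the sample rates in the note below are picked from the ranges mentioned earlier in the thread.

```c
/* Hypothetical cost model: passes through the
 * "sum += fir_copy[j] * cache[j];" line per second for one buffer.
 * fir_len is X, the FIR length in samples of the lower of the two rates. */
double sum_line_passes_per_second(double in_rate_hz, double out_rate_hz,
                                  double fir_len)
{
    double freq_adjust = in_rate_hz / out_rate_hz;
    /* Upsampling: X input samples per output sample.
     * Downsampling: X * freqAdjust input samples per output sample. */
    double taps = fir_len * (freq_adjust > 1.0 ? freq_adjust : 1.0);

    return out_rate_hz * taps;
}
```

With X = 66, a 15000 Hz buffer upsampled to 48000 Hz gives 48000 * 66 = 3168000 passes per second, and a 44100 Hz buffer downsampled to 22050 Hz gives 22050 * 132 = 2910600. Under this model the two games end up in the same ballpark, and Darwinia's higher-rate buffers cost more than GTA:SA's.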
2012/5/23 Alexander E. Patrakov patrakov@gmail.com:
Due to this difference in latency there may be unavoidable clicks in games (sorry, no concrete example) that frequently switch from 3 to 4 buffers and back. The FIR-based approach eliminates this effect, because there is no latency difference (or, because this is untested, better say: it's a bug in my code if there is any latency difference).
If your CPU is fast enough to play GTA:SA with the full-quality resampler (I can't test because I don't have this game), you can test this clicking problem by setting the HQBuffersMax registry entry to something that is higher than the "normal" number of concurrent buffers but lower than the peak number. When the threshold is crossed, something interesting (I guess, a click) should happen with the sound, and I also want to know what exactly :)