2012/5/22 Andrew Eikum aeikum@codeweavers.com:
Thanks Alexander. Thoughts below...
On Sat, May 19, 2012 at 09:09:35PM +0600, Alexander E. Patrakov wrote:
There are two ways to implement a high-performance resampler, and I have prepared (conflicting, pick no more than one) patches for both:
1 (this patch): Use a shorter FIR with the existing code. This has the advantage of higher quality (the filter at least attempts to reject unwanted frequencies) and almost no new code.
2 (the other patch): Write new code, e.g. linear interpolation. This is what Windows XP does at its lowest quality setting, and it eats less CPU than variant 1.
Do you have an opinion on which of these patches to use? The low-quality FIR has the advantage of not introducing another codepath. On the other hand, the linear resampler codepath is very simple, and even easier on the CPU.
Yes. And Windows has two code paths as well. OTOH the linear resampler has lower quality and a different latency than the FIR-based filter. Because of this latency difference there may be unavoidable clicks in games (sorry, no concrete example) that frequently switch from 3 to 4 buffers and back. The FIR-based approach eliminates this effect, because there is no latency difference (or, since this is untested, better say: it's a bug in my code if there is any latency difference).
I'm leaning towards the linear resampler because of its larger CPU savings.
I have no real preference. If there are no other arguments (i.e. if clicks due to switching resamplers are not a valid/worthy argument), let's use the linear resampler, because I wrote it first and because GyB tested it. Anyway, it doesn't really matter, because it is possible to change this later, or even implement a 3-level quality degradation strategy (long FIR -> short FIR -> linear interpolation).
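For reference, here is a minimal sketch of what the linear-interpolation path amounts to (one channel, float samples; the function name and buffer layout are my illustration, not the actual patch):

/* Linear-interpolation resampler sketch. freqAdjust = input rate / output
 * rate; the caller is assumed to size out_len so that pos stays within
 * in_len. Not the actual patch, just an illustration of the approach. */
void resample_linear(const float *in, unsigned int in_len,
                     float *out, unsigned int out_len,
                     double freqAdjust)
{
    double pos = 0.0;
    unsigned int i;

    for (i = 0; i < out_len; i++)
    {
        unsigned int idx = (unsigned int)pos;   /* integer part */
        float frac = (float)(pos - idx);        /* fractional part */
        float s0 = in[idx];
        float s1 = (idx + 1 < in_len) ? in[idx + 1] : s0;

        /* interpolate linearly between the two neighbouring input samples */
        out[i] = s0 + frac * (s1 - s0);

        pos += freqAdjust;
    }
}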
As for your performance analysis - yes, get_current_sample() is cheap, and the main cost is due to caching the FIR and calculating the convolution. As far as I understand (but I may be wrong here), it is fair enough to count only the "sum += fir_copy[j] * cache[j];" line. Still, I don't think it explains the whole picture.
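Assuming the cost-dominant part looks roughly like the following (a sketch, not the actual code; fir_copy is assumed to hold the filter taps already interpolated for the current fractional position, and cache the input samples they apply to):

/* Sketch of the per-output-sample convolution; "sum += ..." is the line
 * I count below. */
float convolve_one(const float *fir_copy, const float *cache,
                   unsigned int fir_len)
{
    float sum = 0.0f;
    unsigned int j;

    for (j = 0; j < fir_len; j++)
        sum += fir_copy[j] * cache[j];

    return sum;
}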
Let's say that the FIR length is X samples at the lower of the two sample rates (X is a constant for a given FIR, and for my FIR it is 66). So, when upsampling, each output sample is affected by X input samples, and when downsampling, by X * freqAdjust input samples. Since both GTA:SA and Darwinia use 32 buffers, we can consider only a single buffer and count the number of passes through the "sum" line per second.
Darwinia downsamples, has freqAdjust ~ 2..4. Thus, it executes the "sum" line 2..4 * X times per output sample, i.e. 40000..90000 * X times per second per input buffer.
GTA:SA upsamples, and thus executes the "sum" line X times per output sample, i.e. 48000 * X times per second per input buffer.
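To make the arithmetic concrete (with X = 66 for my FIR, and the rates assumed above, not measured):

GTA:SA:   48000 * 66             = ~3.2 million passes per second per buffer
Darwinia: (40000..90000) * 66    = ~2.6..5.9 million passes per second per buffer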
The ratio seems to be consistent with the number of convolutions per buffer per time step that you report, because the number of samples per time step is different for these two games.
So I am not convinced by your analysis - but this is based on the assumption that only the "sum += fir_copy[j] * cache[j];" line really matters and that filling in fir_cache eats time proportional to the summing (it has to go through the same number of iterations).