[PATCH v4 0/33] MR9928: Draft: dsound: Speed up resampling.
This implements a number of optimizations, in particular:

- Swapping around the resampling loops in case of downsampling, allowing the FIR step to stay fixed regardless of the resampling ratio.
- Rearranging the FIR array elements to make the access sequential.
- Adding SSE versions of the resampling functions.

Together, these amount to more than a 5x reduction of `cp_fields_resample` execution time. The quality of the resampling should be the same, or even improve slightly, due to a more precise `rem` calculation and removal of the FIR step rounding, although I haven't yet conducted any measurements.

--
v4:
  dsound: Get all channel samples in one go.
  dsound: Put all channel samples in one go.
  dsound: Get rid of get_aux and call the functions directly.
  dsound: Get rid of put_aux and call the functions directly.
  dsound: Add a 32-bit AVX+FMA3 version of downsample.
  dsound: Add a 32-bit AVX+FMA3 version of upsample.
  dsound: Add a 32-bit SSE version of downsample.
  dsound: Add a 32-bit SSE version of upsample.
  dsound: Use #define for fir.h contants.
  dsound: Use a 0.32 fixed point number to represent the resampling ratio.
  dsound: Replace multiplications by fir_step and fir_width with bit shifts.
  dsound: Premultiply the input value by firgain and the interpolation weights in downsample.
  dsound: Transpose the FIR array to make the element access sequential.
  dsound: Calculate firgain more accurately.
  dsound: Calculate required_input more accurately.
  dsound: Swap around the two nested loops in downsample.
  dsound: Don't invert the remainder twice in upsample.
  dsound: Use a fixed upsampling loop boundary.
  dsound: Don't pass dsbfirstep to upsample.
  dsound: Don't apply firgain in upsample.
  dsound: Split resample into separate downsample and upsample functions.
  dsound: Factor out resampling.
  dsound: Remove asserts from the resampling loop.
  dsound: Resample into a temporary buffer.
  dsound: Resample one channel at a time.
  dsound: Get rid of fir_copy.
  dsound: Use signed int to calculate indices during resampling.
  dsound: Multiply by dsbfirstep after calculating the modulus.
  dsound: Use the modulus operator instead of divide-multiply-subtract.
  dsound: Do the subtraction before converting to float to improve rem precision.
  dsound: Don't use double-precision arithmetic in the resampler.
  dsound: Remove the unused freqneeded field.
  dsound: Use a better FIR filter generated with Parks-McClellan algorithm.

This merge request has too many patches to be relayed via email. Please visit the URL below to see the contents of the merge request.
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928
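
The FIR array rearrangement named in the cover letter can be illustrated with a minimal sketch. This is not the code from the MR: the table layout, the function names (`transpose_fir`, `dot_strided`, `dot_sequential`) and the parameters are hypothetical. It assumes the original table stores tap `t` of phase `p` at `src[t * phases + p]`, so computing one output sample reads the table with a stride of `phases`. After transposition each phase's taps sit next to each other, which tends to be friendlier to caches and to SIMD loads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts, not taken from the MR:
 *   interleaved: src[tap * phases + phase]
 *   transposed:  dst[phase * taps + tap]
 */
static void transpose_fir(const float *src, float *dst, size_t phases, size_t taps)
{
    assert(src != dst); /* in-place transposition is not handled by this sketch */
    for (size_t p = 0; p < phases; ++p)
        for (size_t t = 0; t < taps; ++t)
            dst[p * taps + t] = src[t * phases + p];
}

/* One output sample using the interleaved table: strided reads. */
static float dot_strided(const float *fir, const float *in,
                         size_t phase, size_t phases, size_t taps)
{
    float sum = 0.0f;
    for (size_t t = 0; t < taps; ++t)
        sum += fir[t * phases + phase] * in[t];
    return sum;
}

/* The same output sample using the transposed table: sequential reads. */
static float dot_sequential(const float *fir_t, const float *in,
                            size_t phase, size_t taps)
{
    const float *row = fir_t + phase * taps;
    float sum = 0.0f;
    for (size_t t = 0; t < taps; ++t)
        sum += row[t] * in[t];
    return sum;
}
```

Both dot products visit the same coefficients in the same order, so they return the same value; only the memory access pattern differs.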
v4:
- Use a better FIR filter.
- Use fixed point math inside the resampling functions.
- Replace the generated SSE code with hand-optimized assembly.
- Add AVX+FMA3 versions of the resampling functions.
- Get and put all channel samples in one go.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_129007
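
As a rough illustration of the fixed point math mentioned above, here is a minimal sketch of stepping through the input with a 0.32 fixed point ratio. It is an assumption-laden example, not the MR's code: the helper names (`ratio_to_fixed`, `advance`, `phase_index`) are hypothetical, the ratio is assumed to be below 1.0 (the only range a 0.32 number can hold), and the FIR table is assumed to have a power-of-two number of phases so that the phase can be taken with a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Convert in_rate / out_rate to 0.32 fixed point (hypothetical helper).
 * Only ratios strictly below 1.0 are representable. */
static uint32_t ratio_to_fixed(uint32_t in_rate, uint32_t out_rate)
{
    assert(in_rate < out_rate);
    return (uint32_t)(((uint64_t)in_rate << 32) / out_rate);
}

/* Advance the fractional position by one output sample.
 * Returns 1 if the addition wrapped, i.e. one whole input sample
 * was consumed, and 0 otherwise. */
static uint32_t advance(uint32_t *frac, uint32_t step)
{
    uint32_t old = *frac;
    *frac = old + step;   /* unsigned arithmetic wraps modulo 2^32 */
    return *frac < old;   /* a wrap means a carry into the integer part */
}

/* Select a FIR phase from the top bits of the fraction.
 * Assumes the table has 2^phase_bits phases. */
static uint32_t phase_index(uint32_t frac, unsigned phase_bits)
{
    assert(phase_bits >= 1 && phase_bits <= 32);
    return frac >> (32 - phase_bits);
}
```

For example, with 22050 Hz input and 44100 Hz output the step is `0x80000000`, so `advance` reports a consumed input sample on every second output sample, and no division, modulus, or floating point conversion is needed per sample.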
I realise this is still marked as draft, but it seems to be going in roughly the right direction. Feel free to split the first six or so commits off to a separate MR so that we can start getting these in. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_129009
I did test the MR a bit with the game I wrote !9588 for. It's certainly a large improvement but I don't feel like it's rock solid in the "for sure it's not going to be an issue anymore" territory.
I've pushed a new version, which should be much faster. Maybe that's going to solve it?
I pushed them to https://gitlab.winehq.org/Mystral/audio-test-tools. They're still not especially pretty, but they should get the job done.
Thanks, these are very helpful.
As I mentioned in https://gitlab.winehq.org/wine/wine/-/merge_requests/9588#note_127395, the FIR we are currently using is very complex. I'm convinced it's too complex, in fact.
I don't think it's too complex. At a 44.1 kHz sampling rate, a 64-tap FIR is the smallest that provides good stopband attenuation while keeping the transition band above 20 kHz. It's not even considered "high quality" by modern standards.
Looking at the dsound impulse response on Win10 (e.g. by running "loopback i" and opening the capture.wav file in Audacity), you can see that 8 output samples are nonzero for each impulse, and they're shaped like the first 2 lobes of a sinc, i.e. they're very likely using a 4-tap sinc filter.
Wow, I didn't know that. It explains why there are games that require fast resampling. MS probably changed the resampling algorithm in Vista and the games were released after that without proper testing on XP. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_129022
participants (2)
- Anton Baskanov (@baskanov)
- Huw Davies (@huw)