[PATCH v5 0/33] MR9928: Draft: dsound: Speed up resampling.
This implements a number of optimizations, in particular:

- Swapping around the resampling loops in case of downsampling, allowing the FIR step to stay fixed regardless of the resampling ratio.
- Rearranging the FIR array elements to make the access sequential.
- Adding SSE versions of the resampling functions.

Together, these amount to more than a 5x reduction of `cp_fields_resample` execution time. The quality of the resampling should be the same, or even improve slightly, due to a more precise `rem` calculation and removal of the FIR step rounding, although I haven't yet conducted any measurements.

**UPDATE** Added some more optimizations:

- Using fixed point math inside the resampling functions.
- Optimizing the SSE versions by hand.
- Adding AVX+FMA3 versions of the resampling functions.
- Getting and putting all channel samples in one go.

Combined with the previous ones, these bring the total speedup to 15x for upsampling and 13x for downsampling compared to the upstream.

--
v5:
dsound: Get all channel samples in one go.
dsound: Put all channel samples in one go.
dsound: Get rid of get_aux and call the functions directly.
dsound: Get rid of put_aux and call the functions directly.
dsound: Add a 32-bit AVX+FMA3 version of downsample.
dsound: Add a 32-bit AVX+FMA3 version of upsample.
dsound: Add a 32-bit SSE version of downsample.
dsound: Add a 32-bit SSE version of upsample.

This merge request has too many patches to be relayed via email.
Please visit the URL below to see the contents of the merge request.
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928
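To make the "rearranging the FIR array elements to make the access sequential" item concrete, here is a minimal C sketch. It is not the code from the merge request: the names `PHASES`, `TAPS`, `convolve_strided`, `rearrange_fir` and `convolve_sequential` are hypothetical, and the layout assumes an oversampled filter table in which consecutive taps of one phase sit `PHASES` elements apart.

```c
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
#define PHASES 4  /* sub-sample positions stored in the filter table */
#define TAPS   8  /* coefficients used per output sample */

/* Strided layout: tap t of phase p lives at fir[t * PHASES + p],
 * so the inner loop jumps PHASES elements on every iteration. */
static float convolve_strided(const float *fir, const float *in, size_t phase)
{
    float sum = 0.0f;
    size_t t;

    for (t = 0; t < TAPS; t++)
        sum += fir[t * PHASES + phase] * in[t];
    return sum;
}

/* Copy the table so that all taps of one phase are contiguous:
 * by_phase[p * TAPS + t] == fir[t * PHASES + p]. */
static void rearrange_fir(const float *fir, float *by_phase)
{
    size_t p, t;

    for (p = 0; p < PHASES; p++)
        for (t = 0; t < TAPS; t++)
            by_phase[p * TAPS + t] = fir[t * PHASES + p];
}

/* Sequential layout: the inner loop reads adjacent coefficients,
 * which is also the access pattern that SIMD loads prefer. */
static float convolve_sequential(const float *by_phase, const float *in, size_t phase)
{
    const float *taps = by_phase + phase * TAPS;
    float sum = 0.0f;
    size_t t;

    for (t = 0; t < TAPS; t++)
        sum += taps[t] * in[t];
    return sum;
}
```

Both functions compute the same dot product; only the memory access pattern differs. In this sketch the rearrangement happens once, when the table is built, so its cost stays out of the per-sample loop.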
v5:

- Only call `__cpuid` on x86. Should fix the arm64 build.

--
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_129024
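An x86-only intrinsic called unconditionally is a typical way to break a build on another architecture. A minimal sketch of the guard pattern follows. It assumes GCC or Clang and uses `__get_cpuid` from `<cpuid.h>` rather than the `__cpuid` intrinsic the patch refers to; `HAVE_X86_CPUID` and `cpu_reports_fma3` are hypothetical names. It only reads the CPUID feature bit: a complete AVX/FMA3 check also has to confirm that the OS saves the AVX register state, which is left out here.

```c
/* Only pull in the x86 CPUID helper when targeting x86 with GCC/Clang. */
#if (defined(__i386__) || defined(__x86_64__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Returns 1 if CPUID leaf 1 reports FMA3, 0 otherwise.
 * On non-x86 targets (e.g. arm64) the CPUID path is compiled out. */
static int cpu_reports_fma3(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 12) & 1;  /* CPUID.1:ECX bit 12 = FMA */
#else
    return 0;
#endif
}
```

A dispatcher could call such a helper once at startup and fall back to the SSE or plain C path when it returns 0.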
I did test the MR a bit with the game I wrote !9588 for. It's certainly a large improvement but I don't feel like it's rock solid in the "for sure it's not going to be an issue anymore" territory.
I've pushed a new version, which should be much faster. Maybe that's going to solve it?
I'll have a proper look later but I suspect that a 64-tap FIR filter will have a hard time matching the performance of the native 4-tap filter, no matter how optimized it will be. That's not a problem though, see below.
As I mentioned in https://gitlab.winehq.org/wine/wine/-/merge_requests/9588#note_127395, the FIR we are currently using is very complex. I'm convinced it's too complex, in fact.
I don't think it's too complex. At a 44.1 kHz sampling rate, a 64-tap filter is the smallest FIR that provides good stopband attenuation while keeping the transition band above 20 kHz. It's not even considered "high quality" by modern standards.
Right, not especially complex when discussing generic audio resampling filters. For reference, https://src.infinitewave.ca/ has a lot of test results on many audio resamplers used by a plethora of software.

Things are quite special for dsound, where you want to resample tens or even hundreds of buffers in real time, often with different and changing sample rates, and use as little CPU time as possible. The current native filter is very likely a sinc filter (see e.g. [https://ccrma.stanford.edu/~jos/pasp/Implementation.html](https://ccrma.stanford.edu/~jos/pasp/Theory_Ideal_Bandlimited_Interpolation.html) for an explanation of the idea).

I'll point out that the resampler implementation for a FIR filter and for a sinc one is effectively identical, i.e. a convolver. That means that we could swap out the FIR filter for a sinc one at any point and these improvements to the resampler would still be 100% useful afterwards.
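The point that a FIR resampler and a sinc resampler share the same inner loop can be sketched as follows. This is an illustration only: `convolve`, `linear_half` and `sinc4_half` are hypothetical names, and the 4-tap table is simply sinc(x) = sin(πx)/(πx) evaluated at x = ±0.5 and ±1.5, with no window or normalization applied. It is not the native dsound filter.

```c
#include <stddef.h>

/* Generic convolver: dot product of `taps` coefficients with
 * `taps` consecutive input samples. It does not care where the
 * coefficients came from. */
static float convolve(const float *coeffs, const float *in, size_t taps)
{
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < taps; i++)
        sum += coeffs[i] * in[i];
    return sum;
}

/* 2-tap linear interpolation kernel for the half-sample position. */
static const float linear_half[2] = { 0.5f, 0.5f };

/* 4-tap truncated sinc for the half-sample position, i.e.
 * sinc(-1.5), sinc(-0.5), sinc(0.5), sinc(1.5); unwindowed and
 * unnormalized, for illustration only. */
static const float sinc4_half[4] = {
    -0.2122066f, 0.6366198f, 0.6366198f, -0.2122066f
};
```

Switching filters then means pointing the convolver at a different coefficient table and tap count; the loop itself, and any SIMD version of it, stays the same.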
Looking at the dsound impulse response on Win10 (e.g. by running "loopback i" and opening the capture.wav file in Audacity), you can see that 8 output samples are nonzero for each impulse, and they're shaped like the first 2 lobes of a sinc, i.e. they're very likely using a 4-tap sinc filter.
Wow, I didn't know that. It explains why there are games that require fast resampling. MS probably changed the resampling algorithm in Vista and the games were released after that without proper testing on XP.
FWIW the dsound.dll installed by winetricks (probably from Win98 times or so) uses linear resampling, so it's likely that games started depending on very fast performance, and accepting the consequent low quality, from the start.

At any rate, I am (already was, actually :grinning:) okay with improving the current resampler in this vein, in place of !9588. I'm going to close it and resend the first couple of patches (which are independent anyway) separately. Looking forward to the MRs!

--
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_129027
participants (2)
- Anton Baskanov (@baskanov)
- Matteo Bruni (@Mystral)