[PATCH v11 0/32] MR9928: Draft: dsound: Speed up resampling.

28 Feb 2026

This implements a number of optimizations, in particular:
- Swapping around the resampling loops in case of downsampling, allowing the FIR step to stay fixed regardless of the resampling ratio.
- Rearranging the FIR array elements to make the access sequential.
- Adding SSE versions of the resampling functions.
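
As a rough illustration of the loop swap (not the code from this MR), the sketch below computes the same downsampled output twice: once with the outer loop over outputs and once with the outer loop over inputs. The triangular `kernel`, the function names and the `ratio` parameter are all hypothetical stand-ins for the real FIR table and resampler, and gain normalization is omitted. The point is that with inputs in the outer loop, consecutive kernel arguments differ by exactly one output sample, so the step through the filter no longer depends on the ratio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical triangular kernel, one output sample wide on each side.
 * It stands in for the real FIR table; t is measured in output samples. */
static float kernel(float t)
{
    float a = t < 0.0f ? -t : t;
    return a < 1.0f ? 1.0f - a : 0.0f;
}

/* Outer loop over outputs: consecutive kernel arguments differ by
 * 1 / ratio, so the step through the filter depends on the ratio. */
static void downsample_gather(const float *in, size_t in_len,
                              float *out, size_t out_len, float ratio)
{
    size_t i, o;

    assert(ratio >= 1.0f); /* ratio = input rate / output rate */
    for (o = 0; o < out_len; o++)
    {
        float sum = 0.0f;
        for (i = 0; i < in_len; i++)
            sum += in[i] * kernel((float)i / ratio - (float)o);
        out[o] = sum;
    }
}

/* Outer loop over inputs: consecutive kernel arguments differ by exactly
 * 1, so the step through the filter is the same for every ratio. */
static void downsample_scatter(const float *in, size_t in_len,
                               float *out, size_t out_len, float ratio)
{
    size_t i, o;

    assert(ratio >= 1.0f);
    for (o = 0; o < out_len; o++)
        out[o] = 0.0f;
    for (i = 0; i < in_len; i++)
    {
        float pos = (float)i / ratio; /* input position in output samples */
        for (o = 0; o < out_len; o++)
            out[o] += in[i] * kernel(pos - (float)o);
    }
}
```

Both functions visit every input/output pair for simplicity; a real implementation would restrict the inner loop to the kernel's support.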

Together, these reduce `cp_fields_resample` execution time by more than 5x. Resampling quality should stay the same or even improve slightly, thanks to a more precise `rem` calculation and the removal of the FIR step rounding, although I haven't measured it yet.
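
To illustrate the FIR array rearrangement from the list above, here is a minimal sketch, assuming a table in which the taps of one sub-sample phase are interleaved with those of the other phases. The `SKETCH_*` constants and function names are hypothetical and much smaller than anything in `fir.h`; the actual layout used by the patches may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen small for illustration only. */
#define SKETCH_FIR_STEP  4 /* number of sub-sample phases */
#define SKETCH_FIR_WIDTH 3 /* taps used per output sample */

/* Strided layout: tap j of a phase lives at fir[j * SKETCH_FIR_STEP + phase],
 * so the inner loop jumps SKETCH_FIR_STEP elements on every iteration. */
static float dot_strided(const float *fir, const float *input, size_t phase)
{
    float sum = 0.0f;
    size_t j;

    assert(phase < SKETCH_FIR_STEP);
    for (j = 0; j < SKETCH_FIR_WIDTH; j++)
        sum += fir[j * SKETCH_FIR_STEP + phase] * input[j];
    return sum;
}

/* Transpose once, so that all taps of one phase become contiguous:
 * out[phase * SKETCH_FIR_WIDTH + j] = fir[j * SKETCH_FIR_STEP + phase]. */
static void transpose_fir(const float *fir, float *out)
{
    size_t p, j;

    for (p = 0; p < SKETCH_FIR_STEP; p++)
        for (j = 0; j < SKETCH_FIR_WIDTH; j++)
            out[p * SKETCH_FIR_WIDTH + j] = fir[j * SKETCH_FIR_STEP + p];
}

/* Sequential layout: the inner loop reads adjacent elements, which is
 * friendlier to the cache and to SIMD loads. */
static float dot_sequential(const float *fir_t, const float *input, size_t phase)
{
    const float *taps;
    float sum = 0.0f;
    size_t j;

    assert(phase < SKETCH_FIR_STEP);
    taps = fir_t + phase * SKETCH_FIR_WIDTH;
    for (j = 0; j < SKETCH_FIR_WIDTH; j++)
        sum += taps[j] * input[j];
    return sum;
}
```

Both dot products return the same value for a given phase; only the memory access pattern changes.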

**UPDATE**

Added some more optimizations:
- Using fixed point math inside the resampling functions.
- Optimizing the SSE versions by hand.
- Adding AVX+FMA3 versions of the resampling functions.
- Getting and putting all channel samples in one go.
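
The fixed-point idea can be sketched as follows, assuming an upsampling ratio below 1 stored with 32 fractional bits and a 32.32 position accumulator. All function names and the `phase_bits` parameter are hypothetical; the sketch only shows how the integer part, the fraction and a power-of-two phase index fall out of shifts rather than floating-point math, and is not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Convert in_rate / out_rate (must be below 1) to 0.32 fixed point,
 * rounding down. */
static uint32_t ratio_to_fixed_0_32(uint32_t in_rate, uint32_t out_rate)
{
    assert(in_rate < out_rate);
    return (uint32_t)(((uint64_t)in_rate << 32) / out_rate);
}

/* Input position, in 32.32 fixed point, of output sample number n. */
static uint64_t position_after(uint32_t step, uint32_t n)
{
    return (uint64_t)step * n;
}

/* Integer part: index of the input sample to start from. */
static uint32_t position_int(uint64_t pos)
{
    return (uint32_t)(pos >> 32);
}

/* Fractional part: offset between two input samples, in units of 2^-32. */
static uint32_t position_frac(uint64_t pos)
{
    return (uint32_t)pos;
}

/* Pick one of 2^phase_bits filter phases from the top bits of the
 * fraction; a shift replaces a multiplication by the phase count. */
static uint32_t phase_index(uint64_t pos, unsigned int phase_bits)
{
    assert(phase_bits > 0 && phase_bits <= 32);
    return position_frac(pos) >> (32 - phase_bits);
}
```

For example, a ratio of 1/2 becomes `0x80000000`, and after three output samples the position has integer part 1 and a fraction of exactly one half.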

Combined with the previous ones, these bring the total speedup to 15x for upsampling and 13x for downsampling compared to upstream.

--
  v11: dsound: Add a 32-bit SSE version of putsamples_mono2stereo.
       dsound: Add a 32-bit SSE version of putsamples_stereo.
       dsound: Add a 32-bit SSE2 version of getsamples16.
       dsound: Add separate functions to resample interleaved stereo.
       dsound: Don't interleave the samples in the put functions.
       dsound: Don't deinterleave the samples in the get functions.
       dsound: Perform mixing in the put functions.
       dsound: Apply volume in the put functions.
       dsound: Specialize putsamples by channel count.
       dsound: Get all channel samples in one go.
       dsound: Put all channel samples in one go.
       dsound: Get rid of get_aux and call the functions directly.
       dsound: Get rid of put_aux and call the functions directly.
       dsound: Name the parameters of bitsgetfunc, bitsputfunc and normfunc.
       dsound: Add a 32-bit AVX+FMA3 version of downsample.
       dsound: Add a 32-bit AVX+FMA3 version of upsample.
       dsound: Add a 32-bit SSE version of downsample.
       dsound: Add a 32-bit SSE version of upsample.
       dsound: Use #define for fir.h constants.
       dsound: Make rem_num signed.
       dsound: Use a 0.32 fixed point to represent the resampling ratio.
       dsound: Replace multiplications by fir_step and fir_width with bit shifts.
       dsound: Premultiply the input value by firgain and the interpolation weights in downsample.
       dsound: Transpose the FIR array to make the element access sequential.
       dsound: Calculate firgain more accurately.
       dsound: Calculate required_input more accurately.
       dsound: Swap around the two nested loops in downsample.
       dsound: Don't invert the remainder twice in upsample.
       dsound: Use a fixed upsampling loop boundary.
       dsound: Don't pass dsbfirstep to upsample.
       dsound: Don't apply firgain in upsample.
       dsound: Split resample into separate downsample and upsample functions.

This merge request has too many patches to be relayed via email.
Please visit the URL below to see the contents of the merge request.
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928

Anton Baskanov (＠baskanov)