Re: [PATCH v2 0/28] MR9928: Draft: dsound: Speed up resampling.

Jan. 24, 2026


      On Sat Jan 24 06:47:40 2026 +0000, Matteo Bruni wrote:
...
Hi Anton! I ran this MR through my tests and it looks pretty good!
I wrote a couple of tools during my dsound work: a "loopback" one to
record the output of the resampler when given an impulse or sine wave
signal in input, to be able to study the output for distortion and such,
and a "mixer" one which plays a bunch of audio buffers with different
parameters (notably frequency) at the same time, to investigate
performance - by manually looking at the CPU usage of the process from
top :sweat_smile: 
I realize just now that I should probably clean them up a bit and make
them available...
Anyway, the loopback output looks good. The distortion I mentioned
previously in !9588 is indeed gone (probably fixed by
0846c910ad4a31ad05bd5891d9c7e9ba92839241) and I don't see obvious new
artifacts. As for performance, this is what I get with the mixer test
with 128 buffers:
- this one (FIR) SSE: 42% CPU usage
- this one (FIR) non-SSE: 64%
- !9588 (cubic) SSE: 27%
- !9588 (cubic) non-SSE: 31%
The numbers fluctuate quite a bit but this should be a reasonably fair
representation of the relative performance.
Giving it a look with perf, most of the time is spent inside the
resampler's inner loop, symbol `upsample_sse.L2` (~35% of total
system-wide time, according to the tool). After it `DSOUND_MixToPrimary`
still takes about 13%, `putieee32` almost 5%, the rest (including other
parts of the resampler) below that. It looks like my "general mixer"
improvements should help a bit here as well, although proportionally
they will make significantly less of an impact. It's possible that a
handmade SSE version can squeeze a bit more performance out of it,
although it's clear that the gap is largely down to the huge complexity
difference between the two filtering algorithms.
For reference, with !9588 `DSOUND_MixToPrimary` takes 24% of the CPU
samples, `putieee32` is at 5.5% and `cubic_resample_sse2` only comes in
3rd at 5.25%. That shows that "mixing performance" in !9588 is dragged
down by things other than the resampler and suggests that we can afford
a slower resampler, up to a point.
I haven't retested the game with this MR yet (I'll do it soon and report
back) but my guess is that it's fast enough for our needs. I had only a
quick look at the actual patches but they generally look very
reasonable. 
From what I'd seen here, I don't think it's much of a big deal to avoid
64-bit integers, even on 32-bit, so maybe that part is mostly
unnecessary. Actually in one of my followup patches I start storing the
buffer "subsample" cursor position in fixed point, which allows some
simplifications throughout the mixer. See
https://gitlab.winehq.org/Mystral/wine/-/commit/aca8b39927dd75268cb18fe19307...
for the general idea.
Hi Matteo!
...
I realize just now that I should probably clean them up a bit and make them available...
That would be great.
...
From what I'd seen here, I don't think it's much of a big deal to avoid 64-bit integers, even on 32-bit, so maybe that part is mostly unnecessary.
That's mainly to simplify the assembly, which is already quite hard to follow.
...
Actually in one of my followup patches I start storing the buffer "subsample" cursor position in fixed point, which allows some simplifications throughout the mixer.
Thanks for the idea. Fixed point might help eliminate the divisions in the outer loops of `downsample` and `upsample`. Instead of 48.16, though, I'd go for 32.32, as this would make the resampling ratios more precise and also give shorter assembly code. For downsampling, I'd actually invert the fraction so that `freq_adjust_num` is fixed, since we are dividing by `freq_adjust_num` there.
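
To make the 32.32 idea concrete, here is a rough sketch, not the code from this MR or from the linked commit. All names (`fixed_32_32`, `make_step`, `cursor_index`, `cursor_frac`, `resample_linear`) are hypothetical, and linear interpolation only stands in for the real FIR filter. It shows the one property that matters: the division happens once when the ratio is set up, and the per-sample loop only adds, shifts and truncates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: read cursor with 32 integer and 32 fractional bits. */
typedef uint64_t fixed_32_32;

/* Input samples consumed per output sample, as 32.32 fixed point.
 * This is the only division; it runs once per frequency change.
 * The caller must pass a non-zero out_rate. */
static fixed_32_32 make_step(uint32_t in_rate, uint32_t out_rate)
{
    return ((uint64_t)in_rate << 32) / out_rate;
}

/* Integer part: index of the input sample at or before the cursor. */
static uint32_t cursor_index(fixed_32_32 pos)
{
    return (uint32_t)(pos >> 32);
}

/* Fractional part: position between two input samples, in units of 2^-32. */
static uint32_t cursor_frac(fixed_32_32 pos)
{
    return (uint32_t)pos;
}

/* Stand-in for the real filter: linear interpolation between neighbours.
 * Returns the number of output samples written. */
static size_t resample_linear(const float *in, size_t in_len,
                              float *out, size_t out_cap, fixed_32_32 step)
{
    fixed_32_32 pos = 0;
    size_t n = 0;

    /* Stop when the cursor no longer has a right-hand neighbour. */
    while (n < out_cap && (size_t)cursor_index(pos) + 1 < in_len)
    {
        uint32_t i = cursor_index(pos);
        float t = cursor_frac(pos) * (1.0f / 4294967296.0f); /* frac / 2^32 */

        out[n++] = in[i] + (in[i + 1] - in[i]) * t;
        pos += step; /* no division or modulo in the loop */
    }
    return n;
}
```

With 32 fractional bits the split into index and fraction is a plain high-word/low-word access on a 64-bit value, which is presumably part of why it maps to shorter assembly than a 48.16 layout, and the rounding error in the step is 2^-32 rather than 2^-16 of an input sample per output sample.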

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_127902

Anton Baskanov (＠baskanov)