On Sat Jan 24 06:47:40 2026 +0000, Matteo Bruni wrote:
> Hi Anton!
>
> I ran this MR through my tests and it looks pretty good! I wrote a couple of tools during my dsound work: a "loopback" one to record the output of the resampler when given an impulse or sine wave signal in input, to be able to study the output for distortion and such, and a "mixer" one which plays a bunch of audio buffers with different parameters (notably frequency) at the same time, to investigate performance - by manually looking at the CPU usage of the process from top :sweat_smile: I realize just now that I should probably clean them up a bit and make them available...
>
> Anyway, the loopback output looks good. The distortion I mentioned previously in !9588 is indeed gone (probably fixed by 0846c910ad4a31ad05bd5891d9c7e9ba92839241) and I don't see obvious new artifacts.
>
> As for performance, this is what I get with the mixer test with 128 buffers:
> - this one (FIR) SSE: 42% CPU usage
> - this one (FIR) non-SSE: 64%
> - !9588 (cubic) SSE: 27%
> - !9588 (cubic) non-SSE: 31%
>
> The numbers fluctuate quite a bit but this should be a reasonably fair representation of the relative performance.
>
> Giving it a look with perf, most of the time is spent inside the resampler's inner loop, symbol `upsample_sse.L2` (~35% of total system-wide time, according to the tool). After it `DSOUND_MixToPrimary` still takes about 13%, `putieee32` almost 5%, the rest (including other parts of the resampler) below that. It looks like my "general mixer" improvements should help a bit here as well, although proportionally they will make significantly less of an impact. It's possible that a handmade SSE version can squeeze a bit more performance out of it, although it's clear that this is largely up to the huge complexity difference between the two filtering algorithms.
>
> For reference, with !9588 `DSOUND_MixToPrimary` takes 24% of the CPU samples, `putieee32` is at 5.5% and `cubic_resample_sse2` only comes in 3rd at 5.25%. That shows that "mixing performance" in !9588 is dragged down by things other than the resampler and suggests that we can afford a slower resampler, up to a point. I haven't retested the game with this MR yet (I'll do it soon and report back) but my guess is that it's fast enough for our needs.
>
> I had only a quick look at the actual patches but they generally look very reasonable. From what I'd seen here, I don't think it's much of a big deal to avoid 64-bit integers, even on 32-bit, so maybe that part is mostly unnecessary. Actually in one of my followup patches I start storing the buffer "subsample" cursor position in fixed point, which allows some simplifications throughout the mixer. See https://gitlab.winehq.org/Mystral/wine/-/commit/aca8b39927dd75268cb18fe19307... for the general idea.

Hi Matteo!
> I realize just now that I should probably clean them up a bit and make them available...
That would be great.
> From what I'd seen here, I don't think it's much of a big deal to avoid 64-bit integers, even on 32-bit, so maybe that part is mostly unnecessary.
That's mainly to simplify the assembly, which is already quite hard to follow.
> Actually in one of my followup patches I start storing the buffer "subsample" cursor position in fixed point, which allows some simplifications throughout the mixer.
Thanks for the idea. Fixed point might help eliminate the divisions in the outer loops of `downsample` and `upsample`. Instead of 48.16, though, I'd go for 32.32, as this would make the resampling ratios more precise and also give shorter assembly code. And in the case of downsampling, I'd actually invert the fraction so that `freq_adjust_num` is fixed, as we are dividing by `freq_adjust_num` there.

--
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_127902
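
To illustrate the 32.32 idea discussed above, here is a minimal sketch of a fixed-point cursor where the only division happens once, when the step is computed, and the per-sample loop uses just an add, a shift and a truncation. All names (`fixed_step_32_32`, `resample_linear`) are hypothetical and not taken from the MR or from dsound, and linear interpolation is used only as a stand-in for the real FIR filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: input samples advanced per output sample,
 * as an unsigned 32.32 fixed-point value. This is the only division. */
static uint64_t fixed_step_32_32(uint32_t in_rate, uint32_t out_rate)
{
    assert(out_rate != 0);
    return ((uint64_t)in_rate << 32) / out_rate;
}

/* Hypothetical resampling loop. Linear interpolation stands in for the
 * actual filter; the point is the division-free cursor handling.
 * Returns the number of output samples written. */
static size_t resample_linear(const float *in, size_t in_count,
                              float *out, size_t out_max, uint64_t step)
{
    uint64_t pos = 0; /* 32.32 cursor into the input buffer */
    size_t n = 0;

    while (n < out_max)
    {
        uint64_t idx = pos >> 32;       /* integer part: sample index */
        uint32_t frac = (uint32_t)pos;  /* fractional part: low 32 bits */
        double t;

        if (idx + 1 >= in_count)
            break;                      /* need in[idx] and in[idx + 1] */

        t = frac / 4294967296.0;        /* fraction scaled to [0, 1) */
        out[n++] = (float)(in[idx] + (in[idx + 1] - in[idx]) * t);
        pos += step;                    /* advance without dividing */
    }
    return n;
}
```

On a 32-bit target the integer and fractional parts of a 32.32 value each fit in one register, which is presumably where the shorter assembly would come from; a 48.16 layout needs extra shifts and masks to split the two halves.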