Hi Anton!

I ran this MR through my tests and it looks pretty good!

I wrote a couple of tools during my dsound work: a "loopback" one to record the output of the resampler when given an impulse or sine wave signal in input, to be able to study the output for distortion and such, and a "mixer" one which plays a bunch of audio buffers with different parameters (notably frequency) at the same time, to investigate performance - by manually looking at the CPU usage of the process from top :sweat_smile: I realize just now that I should probably clean them up a bit and make them available...

Anyway, the loopback output looks good. The distortion I mentioned previously in !9588 is indeed gone (probably fixed by 0846c910ad4a31ad05bd5891d9c7e9ba92839241) and I don't see obvious new artifacts.

As for performance, this is what I get with the mixer test with 128 buffers:

- this one (FIR) SSE: 42% CPU usage
- this one (FIR) non-SSE: 64%
- !9588 (cubic) SSE: 27%
- !9588 (cubic) non-SSE: 31%

The numbers fluctuate quite a bit but this should be a reasonably fair representation of the relative performance.

Giving it a look with perf, most of the time is spent inside the resampler's inner loop, symbol `upsample_sse.L2` (~35% of total system-wide time, according to the tool). After it `DSOUND_MixToPrimary` still takes about 13%, `putieee32` almost 5%, the rest (including other parts of the resampler) below that. It looks like my "general mixer" improvements should help a bit here as well, although proportionally they will make significantly less of an impact. It's possible that a handmade SSE version can squeeze a bit more performance out of it, although it's clear that this is largely up to the huge complexity difference between the two filtering algorithms.

For reference, with !9588 `DSOUND_MixToPrimary` takes 24% of the CPU samples, `putieee32` is at 5.5% and `cubic_resample_sse2` only comes in 3rd at 5.25%.
That shows that "mixing performance" in !9588 is dragged down by things other than the resampler and suggests that we can afford a slower resampler, up to a point. I haven't retested the game with this MR yet (I'll do it soon and report back) but my guess is that it's fast enough for our needs.

I had only a quick look at the actual patches but they generally look very reasonable. From what I'd seen here, I don't think it's much of a big deal to avoid 64-bit integers, even on 32-bit, so maybe that part is mostly unnecessary. Actually in one of my followup patches I start storing the buffer "subsample" cursor position in fixed point, which allows some simplifications throughout the mixer. See https://gitlab.winehq.org/Mystral/wine/-/commit/aca8b39927dd75268cb18fe19307... for the general idea.

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/9928#note_127392
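
To make the fixed-point "subsample" cursor idea a bit more concrete, here is a minimal sketch in C. It is not the code from the linked commit and not what dsound does: the 32.32 layout, the names `make_step` and `resample_linear`, and the use of plain linear interpolation (rather than the FIR or cubic filters discussed above) are all assumptions chosen only to keep the illustration short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 32.32 fixed point, integer sample index in the high
 * 32 bits, subsample fraction in the low 32 bits. */
#define FRAC_BITS 32
#define FRAC_ONE  ((uint64_t)1 << FRAC_BITS)

/* Hypothetical helper: how far the source cursor advances per output sample. */
static uint64_t make_step(uint32_t src_rate, uint32_t dst_rate)
{
    assert(dst_rate != 0);
    return ((uint64_t)src_rate << FRAC_BITS) / dst_rate;
}

/* Hypothetical linear-interpolation resampler driven by a fixed-point cursor.
 * Writes at most out_cap samples, stops when the cursor runs out of input,
 * stores the updated cursor back and returns the number of samples written. */
static size_t resample_linear(const float *in, size_t in_len,
                              float *out, size_t out_cap,
                              uint64_t *cursor, uint64_t step)
{
    uint64_t pos = *cursor;
    size_t n = 0;

    while (n < out_cap)
    {
        size_t idx = (size_t)(pos >> FRAC_BITS);
        float frac;

        /* Interpolation needs in[idx] and in[idx + 1]. */
        if (idx + 1 >= in_len) break;

        /* Convert the low 32 bits to a fraction in [0, 1]. */
        frac = (float)(uint32_t)(pos & (FRAC_ONE - 1)) * (1.0f / 4294967296.0f);
        out[n++] = in[idx] + (in[idx + 1] - in[idx]) * frac;

        /* One integer add per sample, no accumulated floating-point drift. */
        pos += step;
    }

    *cursor = pos;
    return n;
}
```

With a step of `make_step(24000, 48000)` the cursor advances by exactly half a source sample per output sample, so the position stays exact across calls and the integer index and fraction fall out of a shift and a mask.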