Here are my changes: !9928
I left a large comment over there which I'm not replicating here :sweat_smile: Good job though!
I also tried to add AVX+FMA3 versions of the resampling functions, but I'm getting diminishing returns. The bottleneck might be the indirect calls to `get` and `put` or the index calculation code in the outer loop.
I haven't tried it myself, but I suspect this is where the computational complexity of the current filter comes back to bite us. However fast you make the arithmetic and data accesses, a FIR / sinc filter will always need ~60 multiplications and additions per sample, fetching data from 2 "large" arrays, versus an order of magnitude fewer arithmetic operations and a single large input data array for a cubic one. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/9588#note_127395
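
To make the per-sample cost difference concrete, here is a rough C sketch. It is not the actual resampler code from either MR: `FIR_TAPS`, `fir_sample` and `cubic_sample` are names made up for illustration, the tap count is just picked to match the "~60" figure above, and Catmull-Rom is only one possible choice of cubic.

```c
#include <stddef.h>

/* Hypothetical tap count, chosen only to match the "~60" estimate above. */
#define FIR_TAPS 60

/* One output sample of a FIR / sinc style resampler:
 * FIR_TAPS multiply-adds, reading from two arrays (the input window
 * and the coefficient row selected for the current phase). */
static float fir_sample(const float *in, const float *coeffs)
{
    float acc = 0.0f;
    for (size_t i = 0; i < FIR_TAPS; i++)
        acc += in[i] * coeffs[i];
    return acc;
}

/* One output sample of a cubic (Catmull-Rom) interpolator:
 * reads four neighbouring input samples p[0..3] and evaluates the
 * polynomial at fractional position t in [0, 1) between p[1] and p[2].
 * No coefficient table is needed, only the input array. */
static float cubic_sample(const float *p, float t)
{
    float a = -0.5f * p[0] + 1.5f * p[1] - 1.5f * p[2] + 0.5f * p[3];
    float b =         p[0] - 2.5f * p[1] + 2.0f * p[2] - 0.5f * p[3];
    float c = -0.5f * p[0]               + 0.5f * p[2];
    float d = p[1];
    return ((a * t + b) * t + c) * t + d; /* Horner evaluation */
}
```

Even with perfect vectorization the first loop does about 60 multiply-adds per output sample across two arrays, while the second does a little over a dozen on four adjacent inputs. SIMD can shrink the constant factor of the FIR loop, but it doesn't change that ratio.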