[PATCH v3 0/6] MR10716: dsound: Speed up resampling, part 6

Anton Baskanov (＠baskanov)

May 5, 2026

9:02 a.m.

-- v3:
dsound: Add an SSE version of downsample.
dsound: Add an SSE version of upsample.
makedep: Allow adding arch-specific sources.

https://gitlab.winehq.org/wine/wine/-/merge_requests/10716

Anton Baskanov

May 2026

9:02 a.m.

New subject: [PATCH v3 1/6] dsound: Use DWORD to store freqAdjustNum, freqAdjustDen and freqAccNum.

From: Anton Baskanov <baskanov@gmail.com>

---
 dlls/dsound/dsound_private.h |  4 ++--
 dlls/dsound/mixer.c          | 24 ++++++++++++++----------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/dlls/dsound/dsound_private.h b/dlls/dsound/dsound_private.h
index 75279dacf87..0e695698046 100644
--- a/dlls/dsound/dsound_private.h
+++ b/dlls/dsound/dsound_private.h
@@ -147,8 +147,8 @@ struct IDirectSoundBufferImpl
     DSBUFFERDESC dsbd;
     /* used for frequency conversion (PerfectPitch) */
     float firgain;
-    LONG64 freqAdjustNum,freqAdjustDen;
-    LONG64 freqAccNum;
+    DWORD freqAdjustNum,freqAdjustDen;
+    DWORD freqAccNum;
     /* used for mixing */
     DWORD sec_mixpos;
     /* Holds a copy of the next 'writelead' bytes, to be used for mixing. This makes it
diff --git a/dlls/dsound/mixer.c b/dlls/dsound/mixer.c
index 45e37c5bcb1..26a35722e50 100644
--- a/dlls/dsound/mixer.c
+++ b/dlls/dsound/mixer.c
@@ -106,7 +106,7 @@ void DSOUND_RecalcFormat(IDirectSoundBufferImpl *dsb)
 {
     DWORD ichannels = dsb->pwfx->nChannels;
     DWORD ochannels = dsb->device->pwfx->nChannels;
-    LONG64 oldFreqAdjustDen = dsb->freqAdjustDen;
+    DWORD oldFreqAdjustDen = dsb->freqAdjustDen;
     WAVEFORMATEXTENSIBLE *pwfxe;
     BOOL ieee = FALSE;

@@ -131,7 +131,8 @@ void DSOUND_RecalcFormat(IDirectSoundBufferImpl *dsb)
     dsb->maxwritelead = (DSBFREQUENCY_MAX / 100) * dsb->pwfx->nBlockAlign;

     if (oldFreqAdjustDen)
-        dsb->freqAccNum = (dsb->freqAccNum * dsb->freqAdjustDen + oldFreqAdjustDen / 2) / oldFreqAdjustDen;
+        dsb->freqAccNum = (dsb->freqAccNum * (LONG64)dsb->freqAdjustDen
+                + oldFreqAdjustDen / 2) / oldFreqAdjustDen;

     dsb->get_aux = ieee ? getbpp[4] : getbpp[dsb->pwfx->wBitsPerSample/8 - 1];
     dsb->put_aux = putieee32;
@@ -419,17 +420,18 @@ static void upsample(DWORD freq_adjust_num, DWORD freq_acc_start, UINT count, fl
  * Note that this function will overwrite up to fir_width - 1 frames before and
  * after output[].
  */
-static void resample(LONG64 freq_adjust_num, LONG64 freq_adjust_den, LONG64 freq_acc_start,
+static void resample(DWORD freq_adjust_num, DWORD freq_adjust_den, DWORD freq_acc_start,
         float firgain, UINT required_input, UINT count, float *input, float *output)
 {
     if (freq_adjust_num > freq_adjust_den) {
         /* Take a reciprocal of the resampling ratio and convert it to a 0.32
          * fixed point. Round down to prevent output buffer overflow. */
-        DWORD freq_adjust_fixed_den = (freq_adjust_den << FREQ_ADJUST_SHIFT) / freq_adjust_num;
+        DWORD freq_adjust_fixed_den = ((LONG64)freq_adjust_den << FREQ_ADJUST_SHIFT)
+                / freq_adjust_num;
         /* Convert the subsample position to a 0.32 fixed point. Round up to
          * prevent output buffer overflow. */
-        DWORD freq_acc_fixed_start = (freq_acc_start * freq_adjust_fixed_den + freq_adjust_den - 1)
-                / freq_adjust_den;
+        DWORD freq_acc_fixed_start = ((LONG64)freq_acc_start * freq_adjust_fixed_den
+                + freq_adjust_den - 1) / freq_adjust_den;

         memset(output, 0, count * sizeof(float));
         downsample(freq_adjust_fixed_den, freq_acc_fixed_start, firgain, required_input, input,
@@ -437,16 +439,18 @@ static void resample(LONG64 freq_adjust_num, LONG64 freq_adjust_den, LONG64 freq
     } else {
         /* Convert the resampling ratio to a 0.32 fixed point. Round down to
          * prevent input buffer overflow. */
-        DWORD freq_adjust_fixed_num = (freq_adjust_num << FREQ_ADJUST_SHIFT) / freq_adjust_den;
+        DWORD freq_adjust_fixed_num = ((LONG64)freq_adjust_num << FREQ_ADJUST_SHIFT)
+                / freq_adjust_den;
         /* Convert the subsample position to a 0.32 fixed point. Round down to
          * prevent input buffer overflow. */
-        DWORD freq_acc_fixed_start = (freq_acc_start << FREQ_ADJUST_SHIFT) / freq_adjust_den;
+        DWORD freq_acc_fixed_start = ((LONG64)freq_acc_start << FREQ_ADJUST_SHIFT)
+                / freq_adjust_den;

         upsample(freq_adjust_fixed_num, freq_acc_fixed_start, count, input, output);
     }
 }

-static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, LONG64 *freqAccNum)
+static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, DWORD *freqAccNum)
 {
     UINT i, channel;
     UINT istride = dsb->pwfx->nBlockAlign;
@@ -517,7 +521,7 @@ static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, LONG64 *
     return max_ipos;
 }

-static void cp_fields(IDirectSoundBufferImpl *dsb, UINT count, LONG64 *freqAccNum)
+static void cp_fields(IDirectSoundBufferImpl *dsb, UINT count, DWORD *freqAccNum)
 {
     DWORD ipos, adv;

--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/10716
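
For readers following the `(LONG64)` casts in patch 1/6: once the fields are 32-bit, every shift by `FREQ_ADJUST_SHIFT` and every product of two fields has to be widened first, otherwise the intermediate value no longer fits. The following is a minimal standalone sketch of that idea, not the Wine code itself. The helper names `ratio_to_fixed_0_32` and `rescale_acc` are hypothetical, it uses `uint64_t` rather than `LONG64` for the intermediate, and it assumes `num < den` so that the result fits in 32 bits.

```c
#include <stdint.h>

#define FREQ_ADJUST_SHIFT 32

/* Hypothetical helper: convert num/den (assumed num < den, den != 0) to a
 * 0.32 fixed-point fraction, rounding down. The operand is widened to
 * 64 bits before shifting, because shifting a 32-bit value by 32 is
 * undefined behaviour in C. */
static uint32_t ratio_to_fixed_0_32(uint32_t num, uint32_t den)
{
    return (uint32_t)(((uint64_t)num << FREQ_ADJUST_SHIFT) / den);
}

/* Hypothetical helper: rescale an accumulator from old_den to new_den,
 * rounding to nearest. The product acc * new_den can exceed 32 bits, so
 * it is computed in 64 bits. Assumes acc < old_den and old_den != 0. */
static uint32_t rescale_acc(uint32_t acc, uint32_t new_den, uint32_t old_den)
{
    return (uint32_t)(((uint64_t)acc * new_den + old_den / 2) / old_den);
}
```

In this sketch, `ratio_to_fixed_0_32(1, 2)` yields `0x80000000`, i.e. one half in 0.32 fixed point.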

Anton Baskanov

9:02 a.m.

New subject: [PATCH v3 2/6] dsound: Move cp_fields_noresample after cp_fields_resample.

From: Anton Baskanov <baskanov@gmail.com>

---
 dlls/dsound/mixer.c | 56 ++++++++++++++++++++++-----------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/dlls/dsound/mixer.c b/dlls/dsound/mixer.c
index 26a35722e50..d66dec2ed7e 100644
--- a/dlls/dsound/mixer.c
+++ b/dlls/dsound/mixer.c
@@ -284,34 +284,6 @@ static inline float get_current_sample(const IDirectSoundBufferImpl *dsb,
     return dsb->get(dsb, buffer + (mixpos % buflen), channel);
 }

-static UINT cp_fields_noresample(IDirectSoundBufferImpl *dsb, UINT count)
-{
-    UINT istride = dsb->pwfx->nBlockAlign;
-    UINT ostride = dsb->device->pwfx->nChannels * sizeof(float);
-    UINT committed_samples = 0;
-    DWORD channel, i;
-
-    if (!secondarybuffer_is_audible(dsb))
-        return count;
-
-    if(dsb->use_committed) {
-        committed_samples = (dsb->writelead - dsb->committed_mixpos) / istride;
-        committed_samples = committed_samples <= count ? committed_samples : count;
-    }
-
-    for (i = 0; i < committed_samples; i++)
-        for (channel = 0; channel < dsb->mix_channels; channel++)
-            dsb->put(dsb, i * ostride, channel, get_current_sample(dsb, dsb->committedbuff,
-                dsb->writelead, dsb->committed_mixpos + i * istride, channel));
-
-    for (; i < count; i++)
-        for (channel = 0; channel < dsb->mix_channels; channel++)
-            dsb->put(dsb, i * ostride, channel, get_current_sample(dsb, dsb->buffer->memory,
-                dsb->buflen, dsb->sec_mixpos + i * istride, channel));
-
-    return count;
-}
-
 /**
  * Note that this function will overwrite up to fir_width - 1 frames before and
  * after output[].
@@ -521,6 +493,34 @@ static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, DWORD *f
     return max_ipos;
 }

+static UINT cp_fields_noresample(IDirectSoundBufferImpl *dsb, UINT count)
+{
+    UINT istride = dsb->pwfx->nBlockAlign;
+    UINT ostride = dsb->device->pwfx->nChannels * sizeof(float);
+    UINT committed_samples = 0;
+    DWORD channel, i;
+
+    if (!secondarybuffer_is_audible(dsb))
+        return count;
+
+    if(dsb->use_committed) {
+        committed_samples = (dsb->writelead - dsb->committed_mixpos) / istride;
+        committed_samples = committed_samples <= count ? committed_samples : count;
+    }
+
+    for (i = 0; i < committed_samples; i++)
+        for (channel = 0; channel < dsb->mix_channels; channel++)
+            dsb->put(dsb, i * ostride, channel, get_current_sample(dsb, dsb->committedbuff,
+                dsb->writelead, dsb->committed_mixpos + i * istride, channel));
+
+    for (; i < count; i++)
+        for (channel = 0; channel < dsb->mix_channels; channel++)
+            dsb->put(dsb, i * ostride, channel, get_current_sample(dsb, dsb->buffer->memory,
+                dsb->buflen, dsb->sec_mixpos + i * istride, channel));
+
+    return count;
+}
+
 static void cp_fields(IDirectSoundBufferImpl *dsb, UINT count, DWORD *freqAccNum)
 {
     DWORD ipos, adv;
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/10716

Anton Baskanov

9:02 a.m.

New subject: [PATCH v3 3/6] dsound: Use #define for fir.h constants.

From: Anton Baskanov <baskanov@gmail.com> --- dlls/dsound/fir.h | 16 ++++++++-------- dlls/dsound/mixer.c | 42 +++++++++++++++++++++--------------------- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/dlls/dsound/fir.h b/dlls/dsound/fir.h index 45ad65d7398..76ac521e0f3 100644 --- a/dlls/dsound/fir.h +++ b/dlls/dsound/fir.h @@ -86,10 +86,10 @@ int main() fprintf(stderr, "q %f\n", (double)output.q); fprintf(stderr, "status %s\n", get_pm_status_str(output.status)); - printf("static const int fir_width_shift = %d;\n", fir_width_shift); - printf("static const int fir_width = %d;\n", fir_width); - printf("static const int fir_step_shift = %d;\n", fir_step_shift); - printf("static const int fir_step = %d;\n", fir_step); + printf("#define FIR_WIDTH_SHIFT %d\n", fir_width_shift); + printf("#define FIR_WIDTH %d\n", fir_width); + printf("#define FIR_STEP_SHIFT %d\n", fir_step_shift); + printf("#define FIR_STEP %d\n", fir_step); printf("static const float fir[] = {"); // Print the FIR array with an additional row at the end. 
This simplifies // calculation of the interpolated value by allowing the index to overflow @@ -114,10 +114,10 @@ int main() printf("};\n"); } */ -static const int fir_width_shift = 6; -static const int fir_width = 64; -static const int fir_step_shift = 7; -static const int fir_step = 128; +#define FIR_WIDTH_SHIFT 6 +#define FIR_WIDTH 64 +#define FIR_STEP_SHIFT 7 +#define FIR_STEP 128 static const float fir[] = { 0.0000000000e+00, -2.4830013102e-06, 1.9318705150e-06, 2.6614854151e-06, -1.5313785194e-05, 4.2076214553e-05, -9.1417167945e-05, 1.7455895136e-04, diff --git a/dlls/dsound/mixer.c b/dlls/dsound/mixer.c index d66dec2ed7e..1b4b1c7bd7a 100644 --- a/dlls/dsound/mixer.c +++ b/dlls/dsound/mixer.c @@ -285,7 +285,7 @@ static inline float get_current_sample(const IDirectSoundBufferImpl *dsb, } /** - * Note that this function will overwrite up to fir_width - 1 frames before and + * Note that this function will overwrite up to FIR_WIDTH - 1 frames before and * after output[]. */ static void downsample(DWORD freq_adjust_den, DWORD freq_acc_start, float firgain, @@ -309,28 +309,28 @@ static void downsample(DWORD freq_adjust_den, DWORD freq_acc_start, float firgai * Clearing the bits is safe as it has the same effect as rounding up the * resampling ratio and the subsample position and doesn't affect the * initial opos value. */ - LONG64 opos_num_mask = ~0ull << (FREQ_ADJUST_SHIFT - 23 - fir_step_shift); + LONG64 opos_num_mask = ~0ull << (FREQ_ADJUST_SHIFT - 23 - FIR_STEP_SHIFT); LONG64 opos_num = (freq_adjust_den - freq_acc_start + (1ll << FREQ_ADJUST_SHIFT) - 1) & opos_num_mask; DWORD opos_num_step = freq_adjust_den & (DWORD)opos_num_mask; /* Use XOR to invert the lower part of opos_num so that the lower bits * remain cleared. 
*/ - float rem = FIXED_0_32_TO_FLOAT(((DWORD)opos_num ^ (DWORD)opos_num_mask) << fir_step_shift); - float rem_step = FIXED_0_32_TO_FLOAT(-opos_num_step << fir_step_shift); + float rem = FIXED_0_32_TO_FLOAT(((DWORD)opos_num ^ (DWORD)opos_num_mask) << FIR_STEP_SHIFT); + float rem_step = FIXED_0_32_TO_FLOAT(-opos_num_step << FIR_STEP_SHIFT); int j; for (j = 0; j < required_input; ++j) { /* opos is in the range [-(fir_width - 1), count) */ - int opos = (int)(opos_num >> FREQ_ADJUST_SHIFT) - fir_width; - UINT idx = ~(DWORD)opos_num >> (FREQ_ADJUST_SHIFT - fir_step_shift) << fir_width_shift; + int opos = (int)(opos_num >> FREQ_ADJUST_SHIFT) - FIR_WIDTH; + UINT idx = ~(DWORD)opos_num >> (FREQ_ADJUST_SHIFT - FIR_STEP_SHIFT) << FIR_WIDTH_SHIFT; float input_value = input[j] * firgain; float input_value0 = (1.0f - rem) * input_value; float input_value1 = rem * input_value; int i; - for (i = 0; i < fir_width; ++i) - output[opos + i] += fir[idx + i] * input_value0 + fir[idx + fir_width + i] * input_value1; + for (i = 0; i < FIR_WIDTH; ++i) + output[opos + i] += fir[idx + i] * input_value0 + fir[idx + FIR_WIDTH + i] * input_value1; rem += rem_step; rem -= rem >= 1.0f ? 1.0f : 0.0f; @@ -360,25 +360,25 @@ static void upsample(DWORD freq_adjust_num, DWORD freq_acc_start, UINT count, fl * * Clearing the bits is safe as it has the same effect as rounding down the * resampling ratio and the subsample position. 
*/ - DWORD ipos_num_mask = ~0u << (FREQ_ADJUST_SHIFT - 23 - fir_step_shift); + DWORD ipos_num_mask = ~0u << (FREQ_ADJUST_SHIFT - 23 - FIR_STEP_SHIFT); LONG64 ipos_num = freq_acc_start & ipos_num_mask; DWORD ipos_num_step = freq_adjust_num & ipos_num_mask; - float rem_inv = FIXED_0_32_TO_FLOAT((DWORD)ipos_num << fir_step_shift); - float rem_inv_step = FIXED_0_32_TO_FLOAT(ipos_num_step << fir_step_shift); + float rem_inv = FIXED_0_32_TO_FLOAT((DWORD)ipos_num << FIR_STEP_SHIFT); + float rem_inv_step = FIXED_0_32_TO_FLOAT(ipos_num_step << FIR_STEP_SHIFT); UINT i; for(i = 0; i < count; ++i) { UINT ipos = ipos_num >> FREQ_ADJUST_SHIFT; - UINT idx = ~(DWORD)ipos_num >> (FREQ_ADJUST_SHIFT - fir_step_shift) << fir_width_shift; + UINT idx = ~(DWORD)ipos_num >> (FREQ_ADJUST_SHIFT - FIR_STEP_SHIFT) << FIR_WIDTH_SHIFT; float rem = 1.0f - rem_inv; int j; float sum = 0.0; float* cache = &input[ipos]; - for (j = 0; j < fir_width; j++) - sum += (fir[idx + j] * rem_inv + fir[idx + j + fir_width] * rem) * cache[j]; + for (j = 0; j < FIR_WIDTH; j++) + sum += (fir[idx + j] * rem_inv + fir[idx + j + FIR_WIDTH] * rem) * cache[j]; output[i] = sum; rem_inv += rem_inv_step; @@ -389,7 +389,7 @@ static void upsample(DWORD freq_adjust_num, DWORD freq_acc_start, UINT count, fl } /** - * Note that this function will overwrite up to fir_width - 1 frames before and + * Note that this function will overwrite up to FIR_WIDTH - 1 frames before and * after output[]. 
*/ static void resample(DWORD freq_adjust_num, DWORD freq_adjust_den, DWORD freq_acc_start, @@ -435,15 +435,15 @@ static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, DWORD *f UINT max_ipos = (freqAcc_start + count * dsb->freqAdjustNum) / dsb->freqAdjustDen; UINT required_input = max( - (freqAcc_start + (count - 1) * dsb->freqAdjustNum) / dsb->freqAdjustDen + fir_width, - (freqAcc_start + (count - 1 + fir_width) * dsb->freqAdjustNum) / dsb->freqAdjustDen); + (freqAcc_start + (count - 1) * dsb->freqAdjustNum) / dsb->freqAdjustDen + FIR_WIDTH, + (freqAcc_start + (count - 1 + FIR_WIDTH) * dsb->freqAdjustNum) / dsb->freqAdjustDen); float *intermediate, *output, *itmp; DWORD len = required_input * channels; /* Allocate an output buffer for each channel with padding on both ends as * required by the resample function. Padding at the end of one channel * buffer is reused as a start padding for the next channel buffer. */ - len += fir_width - 1 + (count + fir_width - 1) * channels; + len += FIR_WIDTH - 1 + (count + FIR_WIDTH - 1) * channels; len *= sizeof(float); *freqAccNum = freqAcc_end % dsb->freqAdjustDen; @@ -460,7 +460,7 @@ static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, DWORD *f } intermediate = dsb->device->cp_buffer; - output = intermediate + required_input * channels + fir_width - 1; + output = intermediate + required_input * channels + FIR_WIDTH - 1; if(dsb->use_committed) { committed_samples = (dsb->writelead - dsb->committed_mixpos) / istride; @@ -484,11 +484,11 @@ static UINT cp_fields_resample(IDirectSoundBufferImpl *dsb, UINT count, DWORD *f for (channel = 0; channel < channels; channel++) resample(dsb->freqAdjustNum, dsb->freqAdjustDen, freqAcc_start, dsb->firgain, required_input, count, intermediate + channel * required_input, - output + channel * (fir_width - 1 + count)); + output + channel * (FIR_WIDTH - 1 + count)); for(i = 0; i < count; ++i) for (channel = 0; channel < channels; channel++) - dsb->put(dsb, i * 
ostride, channel, output[channel * (fir_width - 1 + count) + i]); + dsb->put(dsb, i * ostride, channel, output[channel * (FIR_WIDTH - 1 + count) + i]); return max_ipos; } -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/10716
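
The `idx` expression that recurs in patch 3/6 can be read in isolation. It inverts the low 32 bits of the position, keeps the top `FIR_STEP_SHIFT` bits as a row number, and scales that by the row width. The sketch below reproduces only this index arithmetic with the constants from `fir.h`. The helper name `fir_row_offset` is hypothetical, and the row-count check relies on the `fir.h` generator comment about an additional row at the end, assuming each row is `FIR_WIDTH` entries wide.

```c
#include <stdint.h>

#define FREQ_ADJUST_SHIFT 32
#define FIR_WIDTH_SHIFT 6
#define FIR_WIDTH 64
#define FIR_STEP_SHIFT 7
#define FIR_STEP 128

/* Hypothetical helper: map a 0.32 fixed-point fractional position to the
 * offset of the first of the two adjacent FIR rows that get interpolated.
 * ~frac inverts the fraction, the right shift keeps its top FIR_STEP_SHIFT
 * bits (a row number in [0, FIR_STEP)), and the left shift multiplies the
 * row number by FIR_WIDTH. */
static uint32_t fir_row_offset(uint32_t frac)
{
    return ~frac >> (FREQ_ADJUST_SHIFT - FIR_STEP_SHIFT) << FIR_WIDTH_SHIFT;
}
```

With these constants, a fraction of zero selects row 127, so the second interpolated row is row 128. That is the extra row the generator comment describes, which is why the loops can read `fir[idx + FIR_WIDTH + i]` without a bounds check.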

Anton Baskanov

9:02 a.m.

New subject: [PATCH v3 4/6] makedep: Allow adding arch-specific sources.

From: Anton Baskanov <baskanov@gmail.com>

---
 tools/makedep.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/makedep.c b/tools/makedep.c
index 90a522640fe..2594f65a8ba 100644
--- a/tools/makedep.c
+++ b/tools/makedep.c
@@ -1723,12 +1723,13 @@ static void parse_file( struct makefile *make, struct incl_file *source, bool sr
  *
  * Add a source file to the list.
  */
-static struct incl_file *add_src_file( struct makefile *make, const char *name )
+static struct incl_file *add_src_file( struct makefile *make, const char *name, int arch )
 {
     struct incl_file *file = xmalloc( sizeof(*file) );

     memset( file, 0, sizeof(*file) );
     file->name = xstrdup(name);
+    file->arch = arch;
     file->use_msvcrt = is_using_msvcrt( make );
     list_add_tail( &make->sources, &file->entry );
     if (make == include_makefile)
@@ -3564,7 +3565,8 @@ static void output_source_one_arch( struct makefile *make, struct incl_file *sou

     if (strendswith( source->name, ".S" ) && is_subdir_other_arch( source->name, arch )) return;

-    obj_name = strmake( "%s%s.o", source->arch ? "" : arch_dirs[arch], obj );
+    obj_name = strmake( "%s%s.o",
+                        (source->file->flags & FLAG_GENERATED) && source->arch ? "" : arch_dirs[arch], obj );
     strarray_add( targets, obj_name );

     if (source->file->flags & FLAG_C_UNIX)
@@ -3654,7 +3656,8 @@ static void output_source_one_arch( struct makefile *make, struct incl_file *sou

     if (sarif_converter && make->module && !make->external)
     {
-        const char *sast_name = strmake( "%s%s.sarif", source->arch ? "" : arch_dirs[arch], obj );
+        const char *sast_name = strmake( "%s%s.sarif",
+                                         (source->file->flags & FLAG_GENERATED) && source->arch ? "" : arch_dirs[arch], obj );
         output( "%s: %s\n", obj_dir_path( make, sast_name ), source->filename );
         output( "\t%s%s -o $@ %s", cmd_prefix( "SAST" ), var_cc, source->filename );
         output_filenames( defines );
@@ -4854,8 +4857,11 @@ static void load_sources( struct makefile *make )
     list_init( &make->sources );
     list_init( &make->includes );

-    value = get_expanded_make_var_array( make, "SOURCES" );
-    STRARRAY_FOR_EACH( file, &value ) add_src_file( make, file );
+    for (arch = 0; arch < archs.count; arch++)
+    {
+        value = get_expanded_arch_var_array( make, "SOURCES", arch );
+        STRARRAY_FOR_EACH( file, &value ) add_src_file( make, file, arch );
+    }

     add_generated_sources( make );

--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/10716

Anton Baskanov

9:02 a.m.

New subject: [PATCH v3 5/6] dsound: Add an SSE version of upsample.

From: Anton Baskanov <baskanov@gmail.com> --- dlls/dsound/Makefile.in | 8 +++++ dlls/dsound/dsound_main.c | 8 +++++ dlls/dsound/dsound_private.h | 10 ++++++ dlls/dsound/fir.h | 10 ++++-- dlls/dsound/mixer.c | 10 +++++- dlls/dsound/mixer_sse.c | 65 ++++++++++++++++++++++++++++++++++++ 6 files changed, 108 insertions(+), 3 deletions(-) create mode 100644 dlls/dsound/mixer_sse.c diff --git a/dlls/dsound/Makefile.in b/dlls/dsound/Makefile.in index 1dd6dc2330c..a156d77b9a0 100644 --- a/dlls/dsound/Makefile.in +++ b/dlls/dsound/Makefile.in @@ -2,6 +2,8 @@ MODULE = dsound.dll IMPORTLIB = dsound IMPORTS = dxguid uuid winmm ole32 advapi32 user32 +mixer_sse_EXTRADEFS = -msse + VER_FILEDESCRIPTION_STR = "Wine DirectSound" VER_PRODUCTVERSION = 5,3,1,904 VER_OLESELFREGISTER = 1 @@ -18,3 +20,9 @@ SOURCES = \ primary.c \ propset.c \ sound3d.c + +i386_SOURCES = \ + mixer_sse.c + +x86_64_SOURCES = \ + mixer_sse.c diff --git a/dlls/dsound/dsound_main.c b/dlls/dsound/dsound_main.c index 8936b437ba2..c4dab2348e7 100644 --- a/dlls/dsound/dsound_main.c +++ b/dlls/dsound/dsound_main.c @@ -63,6 +63,8 @@ WINE_DEFAULT_DEBUG_CHANNEL(dsound); +BOOL sse_supported; + struct list DSOUND_renderers = LIST_INIT(DSOUND_renderers); CRITICAL_SECTION DSOUND_renderers_lock; static CRITICAL_SECTION_DEBUG DSOUND_renderers_lock_debug = @@ -82,6 +84,11 @@ GUID *DSOUND_capture_guids; /* All default settings, you most likely don't want to touch these, see wiki on UsefulRegistryKeys */ int ds_hel_buflen = 32768 * 2; +static void init_cpu_features(void) +{ + sse_supported = IsProcessorFeaturePresent(PF_XMMI_INSTRUCTIONS_AVAILABLE); +} + /* * Get a config key from either the app-specific or the default config */ @@ -787,6 +794,7 @@ BOOL WINAPI DllMain(HINSTANCE hInstDLL, DWORD fdwReason, LPVOID lpvReserved) DisableThreadLibraryCalls(hInstDLL); /* Increase refcount on dsound by 1 */ GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, (LPCWSTR)hInstDLL, &hInstDLL); + init_cpu_features(); break; case 
DLL_PROCESS_DETACH: if (lpvReserved) break; diff --git a/dlls/dsound/dsound_private.h b/dlls/dsound/dsound_private.h index 0e695698046..0ded79055e4 100644 --- a/dlls/dsound/dsound_private.h +++ b/dlls/dsound/dsound_private.h @@ -33,6 +33,7 @@ #include "wine/list.h" #define DS_MAX_CHANNELS 8 +#define FREQ_ADJUST_SHIFT 32 extern int ds_hel_buflen; @@ -251,6 +252,8 @@ HRESULT IDirectSoundCaptureImpl_Create(IUnknown *outer_unk, REFIID riid, void ** #define STATE_CAPTURING 2 #define STATE_STOPPING 3 +extern BOOL sse_supported; + extern CRITICAL_SECTION DSOUND_renderers_lock; extern struct list DSOUND_renderers; @@ -263,3 +266,10 @@ HRESULT get_mmdevice(EDataFlow flow, const GUID *tgt, IMMDevice **device); HRESULT enumerate_mmdevices(EDataFlow flow, GUID *guids, LPDSENUMCALLBACKW cb, void *user); + +/* mixer_sse.c */ + +#if defined(__i386__) || (defined(__x86_64__) && !defined(__arm64ec__)) +void upsample_sse(LONG64 ipos_num, DWORD ipos_num_step, float rem_inv_float, + float rem_inv_step_float, UINT count, float *input, float *output); +#endif diff --git a/dlls/dsound/fir.h b/dlls/dsound/fir.h index 76ac521e0f3..68fa4ecf484 100644 --- a/dlls/dsound/fir.h +++ b/dlls/dsound/fir.h @@ -90,7 +90,9 @@ int main() printf("#define FIR_WIDTH %d\n", fir_width); printf("#define FIR_STEP_SHIFT %d\n", fir_step_shift); printf("#define FIR_STEP %d\n", fir_step); - printf("static const float fir[] = {"); + printf("extern const float DECLSPEC_ALIGN(16) fir[];\n"); + printf("#ifdef FIR_IMPLEMENTATION\n"); + printf("const float DECLSPEC_ALIGN(16) fir[] = {"); // Print the FIR array with an additional row at the end. This simplifies // calculation of the interpolated value by allowing the index to overflow // into the extra row. 
It just repeats the first row, starting from its @@ -112,13 +114,16 @@ int main() printf("\n"); } printf("};\n"); + printf("#endif\n"); } */ #define FIR_WIDTH_SHIFT 6 #define FIR_WIDTH 64 #define FIR_STEP_SHIFT 7 #define FIR_STEP 128 -static const float fir[] = { +extern const float DECLSPEC_ALIGN(16) fir[]; +#ifdef FIR_IMPLEMENTATION +const float DECLSPEC_ALIGN(16) fir[] = { 0.0000000000e+00, -2.4830013102e-06, 1.9318705150e-06, 2.6614854151e-06, -1.5313785194e-05, 4.2076214553e-05, -9.1417167945e-05, 1.7455895136e-04, -3.0567859821e-04, 5.0191365396e-04, -7.8311909082e-04, 1.1713337628e-03, @@ -2312,3 +2317,4 @@ static const float fir[] = { 1.7455895136e-04, -9.1417167945e-05, 4.2076214553e-05, -1.5313785194e-05, 2.6614854151e-06, 1.9318705150e-06, -2.4830013102e-06, 0.0000000000e+00, }; +#endif diff --git a/dlls/dsound/mixer.c b/dlls/dsound/mixer.c index 1b4b1c7bd7a..7a1eddaf057 100644 --- a/dlls/dsound/mixer.c +++ b/dlls/dsound/mixer.c @@ -38,11 +38,12 @@ #include "ks.h" #include "ksmedia.h" #include "dsound_private.h" + +#define FIR_IMPLEMENTATION #include "fir.h" WINE_DEFAULT_DEBUG_CHANNEL(dsound); -#define FREQ_ADJUST_SHIFT 32 #define FIXED_0_32_TO_FLOAT(x) ((int)((x) >> 1) * (1.0f / (1ll << 31))) void DSOUND_RecalcVolPan(PDSVOLUMEPAN volpan) @@ -368,6 +369,13 @@ static void upsample(DWORD freq_adjust_num, DWORD freq_acc_start, UINT count, fl float rem_inv_step = FIXED_0_32_TO_FLOAT(ipos_num_step << FIR_STEP_SHIFT); UINT i; +#if defined(__i386__) || (defined(__x86_64__) && !defined(__arm64ec__)) + if (sse_supported) { + upsample_sse(ipos_num, ipos_num_step, rem_inv, rem_inv_step, count, input, output); + return; + } +#endif + for(i = 0; i < count; ++i) { UINT ipos = ipos_num >> FREQ_ADJUST_SHIFT; UINT idx = ~(DWORD)ipos_num >> (FREQ_ADJUST_SHIFT - FIR_STEP_SHIFT) << FIR_WIDTH_SHIFT; diff --git a/dlls/dsound/mixer_sse.c b/dlls/dsound/mixer_sse.c new file mode 100644 index 00000000000..62957233556 --- /dev/null +++ b/dlls/dsound/mixer_sse.c @@ -0,0 +1,65 @@ 
+/* SSE versions of DirectSound mixing routines + * + * Copyright 2026 Anton Baskanov + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +#include <xmmintrin.h> + +#include "windef.h" +#include "mmsystem.h" +#include "dsound.h" +#include "dsound_private.h" +#include "fir.h" + +void upsample_sse(LONG64 ipos_num, DWORD ipos_num_step, float rem_inv_float, + float rem_inv_step_float, UINT count, float *input, float *output) +{ + __m128 rem_inv = _mm_set1_ps(rem_inv_float); + __m128 rem_inv_step = _mm_set1_ps(rem_inv_step_float); + __m128 one = _mm_set1_ps(1.0f); + + UINT i; + + for(i = 0; i < count; ++i) { + UINT ipos = ipos_num >> FREQ_ADJUST_SHIFT; + UINT idx = ~(DWORD)ipos_num >> (FREQ_ADJUST_SHIFT - FIR_STEP_SHIFT) << FIR_WIDTH_SHIFT; + __m128 rem = _mm_sub_ps(one, rem_inv); + + int j; + __m128 sum = _mm_set1_ps(0.0f); + float* cache = &input[ipos]; + + for (j = 0; j < FIR_WIDTH; j += 4) { + __m128 fir_value0 = _mm_mul_ps(_mm_load_ps(&fir[idx + j]), rem_inv); + __m128 fir_value1 = _mm_mul_ps(_mm_load_ps(&fir[idx + j + FIR_WIDTH]), rem); + __m128 fir_value = _mm_add_ps(fir_value0, fir_value1); + __m128 input_value = _mm_loadu_ps(&cache[j]); + sum = _mm_add_ps(sum, _mm_mul_ps(fir_value, input_value)); + } + + /* Add the even-numbered sums to the odd-numbered ones. 
*/ + sum = _mm_add_ps(sum, _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(0, 3, 0, 1))); + /* Calculate the final sum and store it to the output array. */ + sum = _mm_add_ss(sum, _mm_movehl_ps(sum, sum)); + _mm_store_ss(&output[i], sum); + + rem_inv = _mm_add_ps(rem_inv, rem_inv_step); + rem_inv = _mm_sub_ps(rem_inv, _mm_and_ps(one, _mm_cmple_ps(one, rem_inv))); + + ipos_num += ipos_num_step; + } +} -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/10716
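
The two-step reduction at the end of `upsample_sse` (a shuffle-and-add followed by `_mm_movehl_ps` and `_mm_add_ss`) collapses the four partial sums into lane 0 using SSE1 instructions only. Below is a small standalone sketch of the same reduction, wrapped in hypothetical helpers `hsum_ps` and `dot_sse` that are not part of the patch. It assumes an x86 target with SSE enabled and, for `dot_sse`, a length that is a multiple of 4.

```c
#include <xmmintrin.h>

/* Hypothetical helper: sum the four lanes of v using SSE1 only. */
static float hsum_ps(__m128 v)
{
    /* The shuffle yields [v1, v0, v3, v0], so after the add the lanes
     * hold [v0+v1, v1+v0, v2+v3, v3+v0]; only lanes 0 and 2 matter. */
    v = _mm_add_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 3, 0, 1)));
    /* movehl copies the upper two lanes into the lower two, so lane 0
     * becomes (v0+v1) + (v2+v3). */
    v = _mm_add_ss(v, _mm_movehl_ps(v, v));
    return _mm_cvtss_f32(v);
}

/* Hypothetical helper: dot product of two float arrays, four lanes at a
 * time. n is assumed to be a multiple of 4; unaligned loads are used so
 * the inputs need no particular alignment. */
static float dot_sse(const float *a, const float *b, unsigned int n)
{
    __m128 sum = _mm_setzero_ps();
    unsigned int i;

    for (i = 0; i < n; i += 4)
        sum = _mm_add_ps(sum, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    return hsum_ps(sum);
}
```

The patch stores the result with `_mm_store_ss` instead of returning it; the sketch uses `_mm_cvtss_f32` only to make the helper easy to call.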

Anton Baskanov

9:02 a.m.

New subject: [PATCH v3 6/6] dsound: Add an SSE version of downsample.

From: Anton Baskanov <baskanov@gmail.com>

---
 dlls/dsound/dsound_private.h |  2 ++
 dlls/dsound/mixer.c          |  8 ++++++++
 dlls/dsound/mixer_sse.c      | 39 ++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/dlls/dsound/dsound_private.h b/dlls/dsound/dsound_private.h
index 0ded79055e4..408e0104fb7 100644
--- a/dlls/dsound/dsound_private.h
+++ b/dlls/dsound/dsound_private.h
@@ -270,6 +270,8 @@ HRESULT enumerate_mmdevices(EDataFlow flow, GUID *guids,
 /* mixer_sse.c */

 #if defined(__i386__) || (defined(__x86_64__) && !defined(__arm64ec__))
+void downsample_sse(LONG64 opos_num, DWORD opos_num_step, float rem_float, float rem_step_float,
+        float firgain_float, UINT required_input, float *input, float *output);
 void upsample_sse(LONG64 ipos_num, DWORD ipos_num_step, float rem_inv_float,
         float rem_inv_step_float, UINT count, float *input, float *output);
 #endif
diff --git a/dlls/dsound/mixer.c b/dlls/dsound/mixer.c
index 7a1eddaf057..a1809993d4e 100644
--- a/dlls/dsound/mixer.c
+++ b/dlls/dsound/mixer.c
@@ -320,6 +320,14 @@ static void downsample(DWORD freq_adjust_den, DWORD freq_acc_start, float firgai
     float rem_step = FIXED_0_32_TO_FLOAT(-opos_num_step << FIR_STEP_SHIFT);
     int j;

+#if defined(__i386__) || (defined(__x86_64__) && !defined(__arm64ec__))
+    if (sse_supported) {
+        downsample_sse(opos_num, opos_num_step, rem, rem_step, firgain, required_input, input,
+                output);
+        return;
+    }
+#endif
+
     for (j = 0; j < required_input; ++j) {
         /* opos is in the range [-(fir_width - 1), count) */
         int opos = (int)(opos_num >> FREQ_ADJUST_SHIFT) - FIR_WIDTH;
diff --git a/dlls/dsound/mixer_sse.c b/dlls/dsound/mixer_sse.c
index 62957233556..0885051e57e 100644
--- a/dlls/dsound/mixer_sse.c
+++ b/dlls/dsound/mixer_sse.c
@@ -25,6 +25,45 @@
 #include "dsound_private.h"
 #include "fir.h"

+/**
+ * Note that this function will overwrite up to FIR_WIDTH - 1 frames before and
+ * after output[].
+ */
+void downsample_sse(LONG64 opos_num, DWORD opos_num_step, float rem_float, float rem_step_float,
+        float firgain_float, UINT required_input, float *input, float *output)
+{
+    __m128 rem = _mm_set1_ps(rem_float);
+    __m128 rem_step = _mm_set1_ps(rem_step_float);
+    __m128 firgain = _mm_set_ss(firgain_float);
+    __m128 one = _mm_set1_ps(1.0f);
+    int j;
+
+    for (j = 0; j < required_input; ++j) {
+        /* opos is in the range [-(fir_width - 1), count) */
+        int opos = (int)(opos_num >> FREQ_ADJUST_SHIFT) - FIR_WIDTH;
+        UINT idx = ~(DWORD)opos_num >> (FREQ_ADJUST_SHIFT - FIR_STEP_SHIFT) << FIR_WIDTH_SHIFT;
+        __m128 rem_inv = _mm_sub_ps(one, rem);
+
+        __m128 input_value_ss = _mm_mul_ss(_mm_load_ss(&input[j]), firgain);
+        __m128 input_value = _mm_shuffle_ps(input_value_ss, input_value_ss, 0);
+        __m128 input_value0 = _mm_mul_ps(rem_inv, input_value);
+        __m128 input_value1 = _mm_mul_ps(rem, input_value);
+
+        int i;
+        for (i = 0; i < FIR_WIDTH; i += 4) {
+            __m128 value0 = _mm_mul_ps(_mm_load_ps(&fir[idx + i]), input_value0);
+            __m128 value1 = _mm_mul_ps(_mm_load_ps(&fir[idx + FIR_WIDTH + i]), input_value1);
+            __m128 value = _mm_add_ps(value0, value1);
+            _mm_storeu_ps(&output[opos + i], _mm_add_ps(_mm_loadu_ps(&output[opos + i]), value));
+        }
+
+        rem = _mm_add_ps(rem, rem_step);
+        rem = _mm_sub_ps(rem, _mm_and_ps(one, _mm_cmple_ps(one, rem)));
+
+        opos_num += opos_num_step;
+    }
+}
+
 void upsample_sse(LONG64 ipos_num, DWORD ipos_num_step, float rem_inv_float,
         float rem_inv_step_float, UINT count, float *input, float *output)
 {
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/10716
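
Both SSE routines replace the scalar `rem -= rem >= 1.0f ? 1.0f : 0.0f;` with a compare-and-mask sequence. A minimal standalone sketch of that branchless wrap follows. The helper name `advance_wrap` is hypothetical, and it assumes an x86 target with SSE enabled and a step below 1.0, so a single subtraction is enough to bring the value back into [0, 1).

```c
#include <xmmintrin.h>

/* Hypothetical helper: advance a fractional position by step and wrap it
 * back into [0, 1) without a branch. _mm_cmple_ps produces an all-ones
 * mask in every lane where 1.0 <= rem; ANDing the mask with 1.0 gives
 * 1.0 in those lanes and 0.0 elsewhere, which is then subtracted. */
static __m128 advance_wrap(__m128 rem, __m128 step)
{
    const __m128 one = _mm_set1_ps(1.0f);

    rem = _mm_add_ps(rem, step);
    return _mm_sub_ps(rem, _mm_and_ps(one, _mm_cmple_ps(one, rem)));
}
```

For example, advancing 0.75 by 0.5 wraps to 0.25, while advancing 0.25 by 0.5 stays at 0.75.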

Anton Baskanov (＠baskanov)

10:15 a.m.

On Tue May 5 02:41:45 2026 +0000, Anton Baskanov wrote:

...

If I understand correctly, doing it like this would mean SSE is disabled by default on 32-bit x86, and Wine would have to be recompiled with different `i386_CFLAGS` to enable it. Solved this by modifying makedep to allow adding architecture-specific sources.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/10716#note_138859

Anton Baskanov (＠baskanov)

10:15 a.m.

v3:
- Rewrite SSE functions using compiler intrinsics.
- Always define `sse_supported` regardless of the architecture.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/10716#note_138860

Matteo Bruni (＠Mystral)

10:18 a.m.

On Tue May 5 10:15:32 2026 +0000, Anton Baskanov wrote:

...

> Solved this by modifying makedep to allow adding architecture-specific sources.

Right, I think that's proper. I would imagine that a number of distros (possibly the winehq.org packages as well) in practice use a `-march=` value that implies `-msse`, such as the `-march=nocona` I mentioned earlier, which I think I saw the last time I went hunting for default flags. I'm having a hard time finding this info now, though, so I can't say for sure.

That said, we could certainly advocate for changing the default `i386_CFLAGS` in Wine, either in general or just for the new wow64 mode, which implies amd64 and thus SSE2 support.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/10716#note_138862

Anton Baskanov (＠baskanov)

11:10 a.m.

Looks like `make_makefiles` needs to be updated to fix the builds. I'll look into this. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/10716#note_138863

10 comments

3 participants

participants (3)

Anton Baskanov
Anton Baskanov (＠baskanov)
Matteo Bruni (＠Mystral)
