Jinoh Kang (@iamahuman) commented about dlls/windows.media.speech/recognizer.c:
- if (FAILED(hr = IMMDevice_Activate(mm_device, &IID_IAudioClient, CLSCTX_INPROC_SERVER, NULL, (void**)&session->audio_client)))
goto cleanup;
- if (SUCCEEDED(hr = IMMDevice_GetId(mm_device, &str)))
- {
TRACE("selected capture device ID: %s\n", debugstr_w(str));
CoTaskMemFree(str);
- }
- if (FAILED(hr = IAudioClient_GetMixFormat(session->audio_client, (WAVEFORMATEX **)&wfx)))
goto cleanup;
- wfx->wFormatTag = WAVE_FORMAT_PCM;
- wfx->nChannels = 1;
- wfx->nSamplesPerSec = 16000;
Magic constant. You should replace this with a `#define` shared with the Unix side that interfaces with vosk. Something like `#define WINE_VOSK_SAMPLE_RATE 16000` will do.
(I'm aware that most vosk models are trained on 16 kHz PCM streams.)
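
To illustrate what I mean, here is a rough sketch, not a drop-in patch. Only the `WINE_VOSK_SAMPLE_RATE` name comes from my suggestion above. The `struct pcm_format` stand-in (used so the snippet builds without Windows headers), the helper name `fill_vosk_capture_format`, and the 16-bit sample size are assumptions for illustration. The real code would keep using `WAVEFORMATEX` and put the define in whichever header both sides already include.

```c
#include <stdint.h>

/* Would live in a header included by both the PE side and the Unix side. */
#define WINE_VOSK_SAMPLE_RATE 16000 /* Hz */

/* Hypothetical stand-in mirroring the WAVEFORMATEX fields used here. */
struct pcm_format
{
    uint16_t format_tag;
    uint16_t channels;
    uint32_t samples_per_sec;
    uint32_t avg_bytes_per_sec;
    uint16_t block_align;
    uint16_t bits_per_sample;
};

/* Hypothetical helper: fill a mono PCM capture format at the shared rate. */
static void fill_vosk_capture_format(struct pcm_format *fmt)
{
    fmt->format_tag = 1; /* value of WAVE_FORMAT_PCM */
    fmt->channels = 1;
    fmt->samples_per_sec = WINE_VOSK_SAMPLE_RATE;
    fmt->bits_per_sample = 16; /* assumption: 16-bit samples */
    /* Derived fields follow from the values above. */
    fmt->block_align = fmt->channels * fmt->bits_per_sample / 8;
    fmt->avg_bytes_per_sec = fmt->samples_per_sec * fmt->block_align;
}
```

With the rate named in one place, the Unix side can pass the same macro to the recognizer instead of repeating the literal, so the two sides cannot silently drift apart.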