https://bugs.winehq.org/show_bug.cgi?id=56994
Bug ID: 56994
Summary: mbstowcs for UTF8, with an exact (not overallocated) output buffer size fails
Product: Wine
Version: unspecified
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: msvcrt
Assignee: wine-bugs@winehq.org
Reporter: martin@martin.st
CC: piotr@codeweavers.com
Distribution: ---
This issue can be illustrated with the following test snippet:
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

int main() {
    const char *locname = "en_US.UTF-8";
    if (!setlocale(LC_ALL, locname)) {
        fprintf(stderr, "locale %s failed\n", locname);
        return 1;
    }

    wchar_t wcsbuf[10] = { 0xe9, 0 }; // LATIN SMALL LETTER E WITH ACUTE
    char mbsbuf[10] = { 0 };
    size_t ret;

    // Query the UTF-8 length, then convert with an exact-size buffer.
    ret = wcstombs(NULL, wcsbuf, 0);
    printf("initial wcstombs returned %d bytes\n", (int) ret);
    ret = wcstombs(mbsbuf, wcsbuf, ret);
    printf("wcstombs returned %d bytes\n", (int) ret);
    if (ret == (size_t)-1)
        return 1;
    for (size_t i = 0; i < ret; i++)
        printf("%02x ", (unsigned char)mbsbuf[i]);
    printf("\n");

    // Query the wide-char length, then convert back with an exact count.
    ret = mbstowcs(NULL, mbsbuf, 0);
    printf("initial mbstowcs returned %d wchars\n", (int) ret);
    if (ret > sizeof(wcsbuf)/sizeof(wcsbuf[0]))
        return 1;
    ret = mbstowcs(wcsbuf, mbsbuf, ret);
    printf("mbstowcs returned %d wchars\n", (int) ret);
    if (ret == (size_t)-1)
        return 1;
    for (size_t i = 0; i < ret; i++)
        printf("%02x ", (unsigned int)wcsbuf[i]);
    printf("\n");

    return 0;
}
Compiled with mingw tools targeting UCRT.
On native Windows, this outputs:

initial wcstombs returned 2 bytes
wcstombs returned 2 bytes
c3 a9
initial mbstowcs returned 1 wchars
mbstowcs returned 1 wchars
e9
With Wine, it outputs:

initial wcstombs returned 2 bytes
wcstombs returned 2 bytes
c3 a9
initial mbstowcs returned 1 wchars
mbstowcs returned -1 wchars
Thus, the actual mbstowcs conversion fails in Wine, whereas it succeeds on native Windows.
The problem is that the output buffer size passed to mbstowcs is tightly allocated: count is exactly 1, the number of wide characters reported by the initial mbstowcs(NULL, ...) call.
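For reference, this is why 1 wchar corresponds to 2 input bytes: U+00E9 needs 11 bits (000 1110 1001), which UTF-8 splits as 110 00011 (0xc3) followed by 10 101001 (0xa9). A minimal sketch of that arithmetic; encode_utf8_upto_2byte is a hypothetical helper, not a CRT function, and only handles code points below U+0800:

```c
/* Hypothetical helper: encode a code point below U+0800 as UTF-8.
   Writes 1 or 2 bytes to `out` and returns how many were written.
   Larger code points are outside the scope of this sketch. */
static int encode_utf8_upto_2byte(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {                              /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110xxxxx: high 5 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx: low 6 bits */
    return 2;
}
```

Calling it with 0xE9 writes c3 a9 and returns 2, matching the wcstombs output above, while mbstowcs has to turn those two bytes back into a single wchar.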
When scanning the (null-terminated) input buffer to calculate the input size, at https://gitlab.winehq.org/wine/wine/-/blob/wine-9.13/dlls/msvcrt/mbcs.c?ref_..., we limit the loop to "i<count". Since we've passed count=1 (we know the output is going to be 1 wchar), this loop terminates after one iteration.
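A simplified model of such a count-limited scan, as a hypothetical reconstruction rather than Wine's actual code: scan_input_size advances 2 bytes when a lead-byte predicate is true and 1 byte otherwise, for at most count iterations, stopping at the terminator. lead_ok and lead_never are illustrative predicates, the latter standing in for a lead-byte check that never fires.

```c
#include <stddef.h>

/* Hypothetical model (not Wine's source): for at most `count` output
   characters, advance 2 bytes if `is_lead` reports a lead byte, else 1
   byte; stop at the NUL terminator. Returns the scanned input size. */
static size_t scan_input_size(const char *mbstr, size_t count,
                              int (*is_lead)(unsigned char))
{
    size_t size = 0;
    for (size_t i = 0; i < count; i++) {
        if (mbstr[size] == '\0')
            break;
        if (is_lead((unsigned char)mbstr[size]) && mbstr[size + 1] != '\0')
            size += 2;
        else
            size += 1;
    }
    return size;
}

/* Treats UTF-8 two-byte lead bytes (0xC2..0xDF) as lead bytes. */
static int lead_ok(unsigned char c) { return c >= 0xC2 && c <= 0xDF; }

/* Never reports a lead byte. */
static int lead_never(unsigned char c) { (void)c; return 0; }
```

With count=1, lead_ok yields 2 bytes for "\xc3\xa9" but lead_never yields only 1, an incomplete UTF-8 sequence; with count=10, even lead_never reaches the terminator and returns 2.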
The check "_isleadbyte_l((unsigned char)mbstr[size], locale)" seems to fail for the leading UTF-8 byte (0xc3), which is probably a bug in itself; presumably that single iteration therefore covers only the first byte of the two-byte sequence. But if mbstowcs is given an overallocated output size (a larger count parameter), that issue isn't visible here.
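One way to avoid depending on a lead-byte check for UTF-8 would be to derive each sequence length from the lead byte itself. A minimal sketch, assuming well-formed UTF-8 and a 16-bit wchar_t as on Windows (so 4-byte sequences become surrogate pairs); utf8_input_bytes is a hypothetical helper, not a Wine or CRT function:

```c
#include <stddef.h>

/* Hypothetical helper (not Wine's code): returns how many bytes of the
   NUL-terminated UTF-8 string `s` make up its first `count` UTF-16 code
   units. Sequence length comes from the lead byte; a 4-byte sequence
   needs two units (a surrogate pair). Assumes well-formed UTF-8: no
   validation of continuation bytes or overlong forms is done. */
static size_t utf8_input_bytes(const char *s, size_t count)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t bytes = 0, units = 0;

    while (p[bytes] != '\0') {
        unsigned char c = p[bytes];
        size_t len, need;

        if (c < 0x80)      { len = 1; need = 1; }  /* ASCII */
        else if (c < 0xE0) { len = 2; need = 1; }  /* 110xxxxx */
        else if (c < 0xF0) { len = 3; need = 1; }  /* 1110xxxx */
        else               { len = 4; need = 2; }  /* 11110xxx */

        if (units + need > count)
            break;                 /* next character does not fit */

        /* Stop before a sequence truncated by the terminator. */
        for (size_t k = 1; k < len; k++)
            if (p[bytes + k] == '\0')
                return bytes;

        bytes += len;
        units += need;
    }
    return bytes;
}
```

Here utf8_input_bytes("\xc3\xa9", 1) returns 2, so an exact count of 1 would still cover the whole character.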