https://bugs.winehq.org/show_bug.cgi?id=56994
Bug ID: 56994
Summary: mbstowcs for UTF8, with an exact (not overallocated) output buffer size fails
Product: Wine
Version: unspecified
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: msvcrt
Assignee: wine-bugs@winehq.org
Reporter: martin@martin.st
CC: piotr@codeweavers.com
Distribution: ---
This issue can be illustrated with the following test snippet:
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

int main() {
    const char *locname = "en_US.UTF-8";
    if (!setlocale(LC_ALL, locname)) {
        fprintf(stderr, "locale %s failed\n", locname);
        return 1;
    }

    wchar_t wcsbuf[10] = { 0xe9, 0 }; // LATIN SMALL LETTER E WITH ACUTE
    char mbsbuf[10] = { 0 };
    size_t ret;

    ret = wcstombs(NULL, wcsbuf, 0);
    printf("initial wcstombs returned %d bytes\n", (int) ret);
    ret = wcstombs(mbsbuf, wcsbuf, ret);
    printf("wcstombs returned %d bytes\n", (int) ret);
    if (ret == -1)
        return 1;
    for (size_t i = 0; i < ret; i++)
        printf("%02x ", (unsigned char)mbsbuf[i]);
    printf("\n");

    ret = mbstowcs(NULL, mbsbuf, 0);
    printf("initial mbstowcs returned %d wchars\n", (int) ret);
    if (ret > sizeof(wcsbuf)/sizeof(wcsbuf[0]))
        return 1;
    ret = mbstowcs(wcsbuf, mbsbuf, ret);
    printf("mbstowcs returned %d wchars\n", (int) ret);
    if (ret == -1)
        return 1;
    for (size_t i = 0; i < ret; i++)
        printf("%02x ", (unsigned int)wcsbuf[i]);
    printf("\n");

    return 0;
}
Compiled with mingw tools targeting UCRT.
On native Windows, this outputs:

initial wcstombs returned 2 bytes
wcstombs returned 2 bytes
c3 a9
initial mbstowcs returned 1 wchars
mbstowcs returned 1 wchars
e9

With Wine, it outputs:

initial wcstombs returned 2 bytes
wcstombs returned 2 bytes
c3 a9
initial mbstowcs returned 1 wchars
mbstowcs returned -1 wchars
Thus, the actual mbstowcs conversion fails, whereas it succeeds on native Windows.
The problem here is that the output buffer size, given to mbstowcs, is a tightly allocated 1.
When scanning the (null-terminated) input buffer to calculate the input size, at https://gitlab.winehq.org/wine/wine/-/blob/wine-9.13/dlls/msvcrt/mbcs.c?ref_..., the loop is limited to "i<count". Since we passed count=1 (we know the output is going to be 1 wchar), this loop terminates after one iteration.
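To make the effect of the "i<count" limit concrete, here is a minimal sketch of such a scan. It is a simplified model and not Wine's actual code: the function names are hypothetical, is_lead stands in for _isleadbyte_l(), and the model assumes that no byte is reported as a lead byte under a UTF-8 locale. The second function shows one possible UTF-8-aware scan for comparison; it is not necessarily how the issue was addressed in Wine, and it ignores that a 4-byte sequence yields two wchars on Windows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the scan described above: at most `count`
 * iterations, each consuming 2 bytes after a DBCS lead byte and
 * 1 byte otherwise. `is_lead` stands in for _isleadbyte_l(). */
static size_t scan_dbcs_style(const char *mbstr, size_t count,
                              int (*is_lead)(unsigned char))
{
    size_t i, size = 0;

    assert(mbstr != NULL);
    for (i = 0; i < count; i++) {
        if (mbstr[size] == '\0')
            break;
        size += is_lead((unsigned char)mbstr[size]) ? 2 : 1;
    }
    return size;
}

/* Assumption for this model: under a UTF-8 locale, no byte is
 * reported as a lead byte. */
static int never_lead(unsigned char c)
{
    (void)c;
    return 0;
}

/* Sequence length judged from the first byte only. Continuation and
 * ASCII bytes count as 1; validation is left to the converter. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c >= 0xf0) return 4;
    if (c >= 0xe0) return 3;
    if (c >= 0xc0) return 2;
    return 1;
}

/* One possible UTF-8-aware scan: each iteration consumes a whole
 * sequence, stopping early at the terminator if it is truncated. */
static size_t scan_utf8_aware(const char *mbstr, size_t count)
{
    size_t i, size = 0;

    assert(mbstr != NULL);
    for (i = 0; i < count; i++) {
        size_t len, k;

        if (mbstr[size] == '\0')
            break;
        len = utf8_seq_len((unsigned char)mbstr[size]);
        for (k = 0; k < len && mbstr[size + k] != '\0'; k++)
            ;
        size += k;
    }
    return size;
}
```

With the input bytes c3 a9 and count=1, the first scan reports 1 input byte, which is an incomplete UTF-8 sequence, while the second reports 2. With a larger count, the first scan also reaches 2, because it keeps stepping one byte at a time until the terminator.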
The check "_isleadbyte_l((unsigned char)mbstr[size], locale)" seems to fail for the leading UTF8 byte, which probably is a bug in itself. But if mbstowcs is given an overallocated output size (passing a larger count parameter), that issue isn't visible here.
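For callers that need to run on an affected Wine version, the overallocation mentioned above can be sketched as follows. The helper name mbs_to_wcs_with_slack is hypothetical; it uses only standard C calls and passes a count that includes room for the terminator. That this sidesteps the failure is taken from the observation in this report and has not been verified separately.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts `mbs` using the current locale and
 * returns a malloc'd, null-terminated wide string, or NULL on error.
 * The count passed to mbstowcs is len + 1, so it is never "tight". */
static wchar_t *mbs_to_wcs_with_slack(const char *mbs, size_t *out_len)
{
    size_t len, ret;
    wchar_t *buf;

    assert(mbs != NULL);

    /* First call: query the number of wide characters needed,
     * not counting the terminator. */
    len = mbstowcs(NULL, mbs, 0);
    if (len == (size_t)-1)
        return NULL;

    buf = malloc((len + 1) * sizeof(*buf));
    if (!buf)
        return NULL;

    /* Second call: pass the full buffer size, terminator included. */
    ret = mbstowcs(buf, mbs, len + 1);
    if (ret == (size_t)-1) {
        free(buf);
        return NULL;
    }
    if (out_len)
        *out_len = ret;
    return buf;
}
```

The caller owns the returned buffer and releases it with free().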
Fabian Maurer dark.shadow4@web.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dark.shadow4@web.de
Piotr Caban piotr.caban@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |piotr.caban@gmail.com
--- Comment #1 from Piotr Caban piotr.caban@gmail.com ---
I've created https://gitlab.winehq.org/wine/wine/-/merge_requests/6141 to address that.
While looking into it, I also checked a few other functions and found that they don't account for UTF-8. I haven't tested it thoroughly, but it looks like isleadbyte currently behaves the same as the native implementation.
Piotr Caban piotr.caban@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
      Fixed by SHA1|                            |2c0886257a79cf887d7cf5dce79293ad96668157
         Resolution|---                         |FIXED
--- Comment #2 from Piotr Caban piotr.caban@gmail.com ---
The attached test case works now. Marking as fixed.
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
--- Comment #3 from Alexandre Julliard julliard@winehq.org ---
Closing bugs fixed in 9.14.