Hi Jacek,
On 8/9/22 18:18, Jacek Caban (@jacek) wrote:
Jacek Caban (@jacek) commented about dlls/kernelbase/path.c:
{
    INT ih;
    WCHAR buf[5] = L"0x";
    memcpy(buf + 2, src + 1, 2*sizeof(WCHAR));
    buf[4] = 0;
    StrToIntExW(buf, STIF_SUPPORT_HEX, &ih);
    next = (WCHAR) ih;
    src += 2; /* Advance to end of escape */
    if (flags & URL_UNESCAPE_AS_UTF8)
    {
        utf8_buf[utf8_len++] = ih;
        utf16_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8_buf, utf8_len, NULL, 0);
        if (!utf16_len)
            continue;
This doesn't seem reliable. For example, if there is a non-escaped char between escaped multi-byte values, you will end up combining the characters surrounding the non-escaped one. See JSGlobal_decodeURI for an example of how it can be handled.
Sorry for the long delay; it has really been a good while! Last time, I tried the approach used in JSGlobal_decodeURI(), but I found that it doesn't handle 4-byte UTF-8 sequences very well, so I put this on hold.
Anyway, this came back to my attention recently. In this version, I use get_utf8_len() with the first byte of the UTF-8 sequence to calculate the length of the sequence. Hopefully, this handles both the 'non-escaped characters between multi-byte escaped characters' case and 4-byte UTF-8 sequences. I have added tests for both cases.
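To illustrate the idea of deciding the sequence length from the lead byte, here is a rough, portable C sketch. It is a hypothetical illustration only, not the patch itself: utf8_seq_len(), escaped_byte() and unescape_utf8() are names made up for this sketch (they are not Wine's get_utf8_len() or UrlUnescapeW()), it works on narrow char strings with UTF-8 output instead of WCHAR, and it leaves an interrupted or invalid escape sequence untouched, which is an assumed policy that may not match what Windows actually does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not Wine's get_utf8_len()): number of bytes in a UTF-8
 * sequence given its lead byte, or 0 if the byte cannot start a sequence. */
static int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xc2 && lead <= 0xdf) return 2;
    if (lead >= 0xe0 && lead <= 0xef) return 3;
    if (lead >= 0xf0 && lead <= 0xf4) return 4;
    return 0; /* continuation byte or invalid lead (0x80-0xc1, 0xf5-0xff) */
}

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Return the byte encoded by a "%XX" escape at s, or -1 if s is not one.
 * Stops reading at the first non-hex char, so it never runs past '\0'. */
static int escaped_byte(const char *s)
{
    int hi, lo;
    if (s[0] != '%') return -1;
    if ((hi = hex_val(s[1])) < 0) return -1;
    if ((lo = hex_val(s[2])) < 0) return -1;
    return (hi << 4) | lo;
}

/* Unescape src into dst (UTF-8 output). A multi-byte sequence is decoded only
 * when its lead byte and all of its continuation bytes are consecutive %XX
 * escapes, so a non-escaped char in between is never merged into it.
 * Simplification: overlong 3/4-byte forms and surrogates are not rejected.
 * dst must hold at least strlen(src) + 1 bytes. */
static void unescape_utf8(const char *src, char *dst)
{
    assert(src && dst);
    while (*src)
    {
        int b = escaped_byte(src);
        int len, i;
        unsigned char seq[4];

        if (b < 0) { *dst++ = *src++; continue; }     /* plain character */

        len = utf8_seq_len((unsigned char)b);
        if (len == 1) { *dst++ = (char)b; src += 3; continue; }

        /* Collect exactly len escaped bytes; stop at the first gap. */
        for (i = 0; i < len; i++)
        {
            int c = escaped_byte(src + 3 * i);
            if (c < 0 || (i > 0 && (c & 0xc0) != 0x80)) break;
            seq[i] = (unsigned char)c;
        }
        if (len && i == len)
        {
            memcpy(dst, seq, len);
            dst += len;
            src += 3 * len;
        }
        else
        {
            /* Invalid or interrupted sequence: copy this escape through as-is. */
            memcpy(dst, src, 3);
            dst += 3;
            src += 3;
        }
    }
    *dst = 0;
}
```

With this sketch, "%F0%9F%98%80" decodes to the four bytes F0 9F 98 80 (U+1F600), while "%E2x%82%AC" comes back unchanged, because the unescaped 'x' interrupts the 3-byte sequence announced by %E2 instead of being skipped over.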
Thanks