IWinInetHttpInfo_QueryInfo returns a multibyte string, not a wide string. We were also wrongly expecting it to have a NUL terminator.
* * *
this one doesn't feel good. IWinInetHttpInfo_QueryInfo calls HttpInfo_QueryInfo (urlmon), which calls HttpQueryInfoA (wininet), so it returns char*. but inside, HttpQueryInfoA is calling HttpQueryInfoW and does charset conversion. so we are converting the string to multibyte and back.
maybe there's an API that returns wchar* i don't know about.
-- v2: mshtml: Fix misuse of IWinInetHttpInfo_QueryInfo.
From: Yuxuan Shui yshui@codeweavers.com
IWinInetHttpInfo_QueryInfo returns a multibyte string, not a wide string. We were also wrongly expecting it to have a NUL terminator. --- dlls/mshtml/navigate.c | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/dlls/mshtml/navigate.c b/dlls/mshtml/navigate.c index 6b5eff15105..7e267975916 100644 --- a/dlls/mshtml/navigate.c +++ b/dlls/mshtml/navigate.c @@ -813,8 +813,9 @@ static HRESULT process_response_headers(nsChannelBSC *This, const WCHAR *headers static void query_http_info(nsChannelBSC *This, IWinInetHttpInfo *wininet_info) { const WCHAR *ptr; - DWORD len = 0; - WCHAR *buf; + DWORD len = 0, wlen; + WCHAR *wbuf; + char *buf;
IWinInetHttpInfo_QueryInfo(wininet_info, HTTP_QUERY_RAW_HEADERS_CRLF, NULL, &len, NULL, NULL); if(!len) @@ -825,17 +826,29 @@ static void query_http_info(nsChannelBSC *This, IWinInetHttpInfo *wininet_info) return;
IWinInetHttpInfo_QueryInfo(wininet_info, HTTP_QUERY_RAW_HEADERS_CRLF, buf, &len, NULL, NULL); - if(!len) { - free(buf); - return; - } + if(!len) + goto out; + + + wlen = MultiByteToWideChar(CP_ACP, 0, buf, len, NULL, 0); + if (!wlen) + goto out; + + wbuf = malloc((wlen + 1) * sizeof(WCHAR)); + if (!wbuf) + goto out; + + MultiByteToWideChar(CP_ACP, 0, buf, len, wbuf, wlen); + wbuf[wlen] = L'\0';
- ptr = wcschr(buf, '\r'); + ptr = wcschr(wbuf, '\r'); if(ptr && ptr[1] == '\n') { ptr += 2; process_response_headers(This, ptr); } + free(wbuf);
+out: free(buf); }
update: fixed the new out-of-bound i just added.
It indeed broke with 216ad43f18fdf. The change looks correct, but since the string is null-terminated, you could simplify it by using `strdupAtoW`.
Regarding the codepage, I suspect that what the documentation actually means is that wininet returns the string exactly as it was received from the server, without modification. We convert it to Unicode and back in wininet. Ideally, we would avoid that altogether and not store the data in Unicode and only convert it when it's queried. Similarly, we might want to reconsider storing it as Unicode in MSHTML as well. That would be a more involved change, though, so I'm fine with a simple fix for now.
On Mon Jun 9 16:20:39 2025 +0000, Jacek Caban wrote:
It indeed broke with 216ad43f18fdf. The change looks correct, but since the string is null-terminated, you could simplify it by using `strdupAtoW`. Regarding the codepage, I suspect that what the documentation actually means is that wininet returns the string exactly as it was received from the server, without modification. We convert it to Unicode and back in wininet. Ideally, we would avoid that altogether and not store the data in Unicode and only convert it when it's queried. Similarly, we might want to reconsider storing it as Unicode in MSHTML as well. That would be a more involved change, though, so I'm fine with a simple fix for now.
Yes, but `CP_ACP` depends on system codepage, so a header that has non-ASCII characters in values wouldn't necessarily be converted properly to UTF-16, would it?
On Mon Jun 9 16:20:39 2025 +0000, Gabriel Ivăncescu wrote:
Yes, but `CP_ACP` depends on system codepage, so a header that has non-ASCII characters in values wouldn't necessarily be converted properly to UTF-16, would it?
As I said, we'd need to change wininet for that first, it currently uses `CP_ACP`.
This merge request was approved by Gabriel Ivăncescu.
On Mon Jun 9 18:09:59 2025 +0000, Jacek Caban wrote:
As I said, we'd need to change wininet for that first, it currently uses `CP_ACP`.
Yeah that's right. Well I approved this since it's mostly wininet and it's correct now.
With respect to `strdupAtoW`, he did mention at the beginning that "We were also wrongly expecting it to have a NUL terminator." @yshui did ASan catch a problem here?
On Tue Jun 10 16:01:16 2025 +0000, Gabriel Ivăncescu wrote:
Yeah that's right. Well I approved this since it's mostly wininet and it's correct now. With respect to `strdupAtoW`, he did mention at the beginning that "We were also wrongly expecting it to have a NUL terminator." @yshui did ASan catch a problem here?
As far as I can tell, the strings are null-terminated. It likely appeared otherwise because the code was treating them as wide-character strings. That part of the commit message is inaccurate and should be corrected.