[PATCH v3 0/1] MR11015: msxml3: Fallback to UTF-8 encoding if detection fails.
Fixes an application that tries to parse an XML file with leading white-space characters. Native accepts an arbitrary number of white-space characters before the input and reports E_SAX_INVALIDENCODING only when the encoding declaration is wrong, so I have made UTF-8 the default when encoding detection fails.

-- 
v3: msxml3: Fallback to UTF-8 encoding if detection fails.
https://gitlab.winehq.org/wine/wine/-/merge_requests/11015
From: Piotr Caban <piotr@codeweavers.com>

---
 dlls/msxml3/saxreader.c       |  6 +-----
 dlls/msxml3/tests/saxreader.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/dlls/msxml3/saxreader.c b/dlls/msxml3/saxreader.c
index a759d39a417..366f779b063 100644
--- a/dlls/msxml3/saxreader.c
+++ b/dlls/msxml3/saxreader.c
@@ -5879,10 +5879,6 @@ static enum xmlencoding saxreader_match_encoding(const char *data, size_t size,
         return XML_ENCODING_UTF16LE;
     if (b[0] == 0 && b[1] == '<' && b[2] == 0 && b[3] == '?')
         return XML_ENCODING_UTF16BE;
-    if (b[0] == '<' && b[1] == '?' && b[2] == 'x' && b[3] == 'm')
-        return XML_ENCODING_UTF8;
-    if (b[0] == '<' && b[1] && b[1] != '?')
-        return XML_ENCODING_UTF8;
 
     if (b[0] == 0xef && b[1] == 0xbb && b[2] == 0xbf)
     {
@@ -5902,7 +5898,7 @@ static enum xmlencoding saxreader_match_encoding(const char *data, size_t size,
         return XML_ENCODING_UTF16LE;
     }
 
-    return XML_ENCODING_UNKNOWN;
+    return XML_ENCODING_UTF8;
 }
 
 static void saxreader_detect_encoding(struct saxlocator *locator, bool force_utf16)
diff --git a/dlls/msxml3/tests/saxreader.c b/dlls/msxml3/tests/saxreader.c
index 8a3d87401cb..564ec69eb28 100644
--- a/dlls/msxml3/tests/saxreader.c
+++ b/dlls/msxml3/tests/saxreader.c
@@ -3417,10 +3417,15 @@ static void test_saxreader_encoding(void)
     static const char xml_shift_jis_test2[] =
         "<?xml version=\"1.0\" encoding=\"shift-jis\" ?><a>" "\x83\x89" "</a>";
 
+    static const char utf8_ws_test[] =
+        " \r\n <a>text</a>";
+
     const struct enc_test_entry_t *entry = encoding_test_data;
     static const CHAR testXmlA[] = "test.xml";
     DWORD ucs4_be_test[ARRAYSIZE(ucs4_le_test)];
     ISAXXMLReader *reader;
+    LARGE_INTEGER li = { 0 };
+    IStream* stream;
     HRESULT hr;
 
     for (int i = 0; i < ARRAYSIZE(ucs4_le_test); ++i)
@@ -3464,6 +3469,20 @@ static void test_saxreader_encoding(void)
     hr = ISAXXMLReader_parse(reader, input);
     ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
 
+    /* UTF-8 with leading white-spaces */
+    hr = CreateStreamOnHGlobal(NULL, TRUE, &stream);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+    hr = IStream_Write(stream, utf8_ws_test, sizeof(utf8_ws_test) - 1, NULL);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+    hr = IStream_Seek(stream, li, STREAM_SEEK_SET, NULL);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+
+    V_VT(&input) = VT_UNKNOWN;
+    V_UNKNOWN(&input) = (IUnknown*)stream;
+    hr = ISAXXMLReader_parse(reader, input);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+    VariantClear(&input);
+
     ISAXXMLReader_Release(reader);
     free_bstrs();
 
-- 
GitLab
https://gitlab.winehq.org/wine/wine/-/merge_requests/11015
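For readers following the first hunk, here is a standalone, simplified sketch of the detection order after the patch. It is not the Wine function: `match_encoding_sketch` is a hypothetical name, the enum mirrors only the values named in the diff, and the UTF-16LE "<?" prefix check and the UTF-16 BOM checks are assumptions standing in for code elided from the hunks. The point it illustrates is that input with leading white space, such as the test's `" \r\n <a>text</a>"`, matches none of the byte patterns and now lands on UTF-8 instead of `XML_ENCODING_UNKNOWN`.

```c
#include <stddef.h>

/* Hypothetical mirror of the values named in the patch; the real enum in
 * dlls/msxml3/saxreader.c has more members (e.g. XML_ENCODING_UNKNOWN). */
enum xmlencoding
{
    XML_ENCODING_UTF8,
    XML_ENCODING_UTF16LE,
    XML_ENCODING_UTF16BE,
};

static enum xmlencoding match_encoding_sketch(const char *data, size_t size)
{
    const unsigned char *b = (const unsigned char *)data;

    /* UTF-16 "<?" prefixes without a BOM: the BE check appears in the diff
     * context, the LE check is assumed to be its mirror image. */
    if (size >= 4 && b[0] == '<' && b[1] == 0 && b[2] == '?' && b[3] == 0)
        return XML_ENCODING_UTF16LE;
    if (size >= 4 && b[0] == 0 && b[1] == '<' && b[2] == 0 && b[3] == '?')
        return XML_ENCODING_UTF16BE;

    /* UTF-8 BOM, as in the diff context. */
    if (size >= 3 && b[0] == 0xef && b[1] == 0xbb && b[2] == 0xbf)
        return XML_ENCODING_UTF8;

    /* UTF-16 BOMs (assumed; this part is elided from the hunks). */
    if (size >= 2 && b[0] == 0xff && b[1] == 0xfe)
        return XML_ENCODING_UTF16LE;
    if (size >= 2 && b[0] == 0xfe && b[1] == 0xff)
        return XML_ENCODING_UTF16BE;

    /* Nothing matched (e.g. leading white space): before the patch this
     * was XML_ENCODING_UNKNOWN, now it falls back to UTF-8. */
    return XML_ENCODING_UTF8;
}
```

Removing the two `'<'`-prefixed UTF-8 checks loses nothing in this model, because any input they matched also reaches the UTF-8 fallback.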
On Thu May 28 20:31:20 2026 +0000, Nikolay Sivov wrote:
> This should use STREAM_SEEK_SET.

Thanks, fixed in the new version.
-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141559
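Why the rewind matters: `IStream_Write` writes at the stream's seek pointer and advances it past the written bytes, so before the stream is handed to `ISAXXMLReader_parse` the test moves the pointer back to offset 0 relative to `STREAM_SEEK_SET`. The sketch below is a hypothetical toy model of those seek-pointer semantics (the `toy_*` names are illustrative, this is not the COM API). It shows that a zero offset relative to the current position leaves the reader at the end with nothing to read, while a zero offset from the start exposes the whole document.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: mimics IStream seek-pointer behaviour, not its API. */
enum toy_origin { TOY_SEEK_SET, TOY_SEEK_CUR, TOY_SEEK_END };

struct toy_stream
{
    char buf[256];
    size_t size; /* number of bytes stored */
    size_t pos;  /* current seek pointer, always <= size */
};

/* Write at the seek pointer and advance it past the written bytes. */
static size_t toy_write(struct toy_stream *s, const void *data, size_t len)
{
    size_t room = sizeof(s->buf) - s->pos;

    if (len > room)
        len = room;
    memcpy(s->buf + s->pos, data, len);
    s->pos += len;
    if (s->pos > s->size)
        s->size = s->pos;
    return len;
}

/* Move the seek pointer relative to an origin (non-negative offsets only). */
static void toy_seek(struct toy_stream *s, size_t offset, enum toy_origin origin)
{
    size_t base = origin == TOY_SEEK_SET ? 0 : origin == TOY_SEEK_CUR ? s->pos : s->size;

    s->pos = base + offset;
    if (s->pos > s->size)
        s->pos = s->size; /* toy simplification: no seeking past the end */
}

/* Read from the seek pointer; returns 0 once the pointer is at the end. */
static size_t toy_read(struct toy_stream *s, void *out, size_t len)
{
    size_t avail = s->size - s->pos;

    if (len > avail)
        len = avail;
    memcpy(out, s->buf + s->pos, len);
    s->pos += len;
    return len;
}
```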
This merge request was approved by Nikolay Sivov. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015
On Thu May 28 20:23:50 2026 +0000, Nikolay Sivov wrote:
Yes, I ran the wrong thing locally, sorry. I also tried WCHAR content with leading spaces, and that also doesn't work on Windows.
-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141561
On Thu May 28 20:32:27 2026 +0000, Nikolay Sivov wrote:
> I also tried WCHAR content with leading spaces, and that also doesn't work on Windows.

FWIW, I've seen more differences in encoding detection while looking into it:
* on Windows, parse and parseURL behave differently in some cases (e.g. parseURL doesn't allow UTF-8 XML files with leading white spaces)
* on Windows, UTF-16 can only be detected through a BOM when a stream is passed (while we're detecting the L"<..." prefix)

The above observations may be version-specific.
-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141566
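To make the second observation concrete, the sketch below contrasts the two strategies on a UTF-16LE buffer without a BOM. Both helpers are hypothetical: `utf16_by_bom` models the behaviour reported for native with stream input (only a BOM marks UTF-16), and `utf16_by_bom_or_prefix` is a simplified stand-in for Wine's extra L"<..." heuristic, reduced here to checking for a '<' code unit in either byte order. As the comment above notes, the native behaviour may be version-specific.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reported native behaviour for streams: UTF-16 only when a BOM is present. */
static bool utf16_by_bom(const unsigned char *b, size_t size)
{
    return size >= 2 && ((b[0] == 0xff && b[1] == 0xfe) ||
                         (b[0] == 0xfe && b[1] == 0xff));
}

/* Simplified model of the extra heuristic: a leading '<' encoded as a
 * UTF-16 code unit (little- or big-endian) also counts as UTF-16. */
static bool utf16_by_bom_or_prefix(const unsigned char *b, size_t size)
{
    if (utf16_by_bom(b, size))
        return true;
    return size >= 2 && ((b[0] == '<' && b[1] == 0) ||
                         (b[0] == 0 && b[1] == '<'));
}
```

A BOM-less `L"<a/>"` buffer is UTF-16 under the second helper but not under the first, which is the kind of input where the two implementations would diverge.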
participants (3)
- Nikolay Sivov (@nsivov)
- Piotr Caban
- Piotr Caban (@piotr)