[PATCH v2 0/1] MR11015: Draft: msxml3: Fallback to UTF-8 encoding if detection fails.
Fixes an application that tries to parse an XML file with leading white-space characters. Native accepts an arbitrary number of white-space characters preceding the input and reports E_SAX_INVALIDENCODING only when the encoding declaration is wrong, so I have changed the default to UTF-8.

-- v2: msxml3: Fallback to UTF-8 encoding if detection fails.

https://gitlab.winehq.org/wine/wine/-/merge_requests/11015
From: Piotr Caban <piotr@codeweavers.com>

---
 dlls/msxml3/saxreader.c       |  6 +-----
 dlls/msxml3/tests/saxreader.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/dlls/msxml3/saxreader.c b/dlls/msxml3/saxreader.c
index a759d39a417..366f779b063 100644
--- a/dlls/msxml3/saxreader.c
+++ b/dlls/msxml3/saxreader.c
@@ -5879,10 +5879,6 @@ static enum xmlencoding saxreader_match_encoding(const char *data, size_t size,
         return XML_ENCODING_UTF16LE;
     if (b[0] == 0 && b[1] == '<' && b[2] == 0 && b[3] == '?')
         return XML_ENCODING_UTF16BE;
-    if (b[0] == '<' && b[1] == '?' && b[2] == 'x' && b[3] == 'm')
-        return XML_ENCODING_UTF8;
-    if (b[0] == '<' && b[1] && b[1] != '?')
-        return XML_ENCODING_UTF8;
 
     if (b[0] == 0xef && b[1] == 0xbb && b[2] == 0xbf)
     {
@@ -5902,7 +5898,7 @@ static enum xmlencoding saxreader_match_encoding(const char *data, size_t size,
         return XML_ENCODING_UTF16LE;
     }
 
-    return XML_ENCODING_UNKNOWN;
+    return XML_ENCODING_UTF8;
 }
 
 static void saxreader_detect_encoding(struct saxlocator *locator, bool force_utf16)
diff --git a/dlls/msxml3/tests/saxreader.c b/dlls/msxml3/tests/saxreader.c
index 8a3d87401cb..9750a3a3a4b 100644
--- a/dlls/msxml3/tests/saxreader.c
+++ b/dlls/msxml3/tests/saxreader.c
@@ -3417,10 +3417,15 @@ static void test_saxreader_encoding(void)
     static const char xml_shift_jis_test2[] =
         "<?xml version=\"1.0\" encoding=\"shift-jis\" ?><a>" "\x83\x89" "</a>";
 
+    static const char utf8_ws_test[] =
+        " \r\n <a>text</a>";
+
     const struct enc_test_entry_t *entry = encoding_test_data;
     static const CHAR testXmlA[] = "test.xml";
     DWORD ucs4_be_test[ARRAYSIZE(ucs4_le_test)];
     ISAXXMLReader *reader;
+    LARGE_INTEGER li = { 0 };
+    IStream* stream;
     HRESULT hr;
 
     for (int i = 0; i < ARRAYSIZE(ucs4_le_test); ++i)
@@ -3464,6 +3469,20 @@ static void test_saxreader_encoding(void)
     hr = ISAXXMLReader_parse(reader, input);
     ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
 
+    /* UTF-8 with leading white-spaces */
+    hr = CreateStreamOnHGlobal(NULL, TRUE, &stream);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+    hr = IStream_Write(stream, utf8_ws_test, sizeof(utf8_ws_test) - 1, NULL);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+    hr = IStream_Seek(stream, li, SEEK_SET, NULL);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+
+    V_VT(&input) = VT_UNKNOWN;
+    V_UNKNOWN(&input) = (IUnknown*)stream;
+    hr = ISAXXMLReader_parse(reader, input);
+    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
+    VariantClear(&input);
+
     ISAXXMLReader_Release(reader);
 
     free_bstrs();
--
GitLab

https://gitlab.winehq.org/wine/wine/-/merge_requests/11015
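
For readers following the thread, the effect of the patch on byte sniffing can be approximated in a standalone sketch. This is not the Wine function: the `sketch_` names are hypothetical, the UTF-16LE pattern check and the UTF-16 byte order mark checks are assumptions about the parts of saxreader_match_encoding() that the hunks do not show, and UCS-4 detection is left out. The point it illustrates is only that input matching no known signature, including input with leading white space, now ends up as UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the XML_ENCODING_* values named in the patch. */
enum sketch_encoding
{
    SKETCH_ENCODING_UTF16LE,
    SKETCH_ENCODING_UTF16BE,
    SKETCH_ENCODING_UTF8,
};

/* Sniff the first bytes of a buffer. Anything that is not recognized as
 * UTF-16 is treated as UTF-8, mirroring the new fallback in the patch. */
static enum sketch_encoding sketch_match_encoding(const char *data, size_t size)
{
    const unsigned char *b = (const unsigned char *)data;

    if (size >= 4)
    {
        /* "<?" encoded as UTF-16 without a byte order mark.
         * The little-endian pattern is an assumption; only the
         * big-endian check is visible in the diff context. */
        if (b[0] == '<' && b[1] == 0 && b[2] == '?' && b[3] == 0)
            return SKETCH_ENCODING_UTF16LE;
        if (b[0] == 0 && b[1] == '<' && b[2] == 0 && b[3] == '?')
            return SKETCH_ENCODING_UTF16BE;
    }

    if (size >= 2)
    {
        /* UTF-16 byte order marks (assumed, not shown in the diff). */
        if (b[0] == 0xff && b[1] == 0xfe)
            return SKETCH_ENCODING_UTF16LE;
        if (b[0] == 0xfe && b[1] == 0xff)
            return SKETCH_ENCODING_UTF16BE;
    }

    /* UTF-8 BOM, "<?xml", "<a>" and leading white space all fall
     * through to the same default. */
    return SKETCH_ENCODING_UTF8;
}
```

With this shape, the two removed `b[0] == '<'` checks become redundant, because every input they matched reaches the final return anyway.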
This is working by accident because we skip spaces when entering SAX_PARSER_MISC. Once you add an XML declaration it will break. Note that we skip leading spaces explicitly for DOM; it's version-dependent. So we might want to do something similar in SAX.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141537
On Thu May 28 18:18:23 2026 +0000, Nikolay Sivov wrote:
> This is working by accident because we skip spaces when entering SAX_PARSER_MISC. Once you add an XML declaration it will break. Note that we skip leading spaces explicitly for DOM; it's version-dependent. So we might want to do something similar in SAX.

It's supposed to break if an XML declaration is added after white-space characters (as it also doesn't work on Windows).

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141538
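
The case discussed above, an XML declaration that follows leading white space, can be told apart from the accepted case with a small check. The helper below is a hypothetical sketch for UTF-8 input only; its name and shape are not taken from Wine, and the thread does not say which error native reports for this case, so the sketch only classifies the input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* XML white space: space, tab, carriage return, line feed. */
static bool sketch_is_xml_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: returns true when the buffer starts with white
 * space and the first non-space content is an XML declaration, which is
 * the input that is expected to fail. */
static bool sketch_decl_after_space(const char *data, size_t size)
{
    size_t i = 0;

    while (i < size && sketch_is_xml_space((unsigned char)data[i]))
        i++;

    /* No leading white space: a declaration here is in its legal place. */
    if (i == 0)
        return false;

    /* Require white space after "<?xml" so that processing instructions
     * such as "<?xml-stylesheet" are not mistaken for a declaration. */
    return size - i >= 6 && !memcmp(data + i, "<?xml", 5)
            && sketch_is_xml_space((unsigned char)data[i + 5]);
}
```

The test input from the patch, `" \r\n <a>text</a>"`, is classified as acceptable by this check, while the same prefix followed by `<?xml version="1.0"?>` is not.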
On Thu May 28 18:18:51 2026 +0000, Piotr Caban wrote:
> It's supposed to break if an XML declaration is added after white-space characters (as it also doesn't work on Windows).

Yes, I ran the wrong thing locally, sorry.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141556
Nikolay Sivov (@nsivov) commented about dlls/msxml3/tests/saxreader.c:
>      hr = ISAXXMLReader_parse(reader, input);
>      ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
>
> +    /* UTF-8 with leading white-spaces */
> +    hr = CreateStreamOnHGlobal(NULL, TRUE, &stream);
> +    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
> +    hr = IStream_Write(stream, utf8_ws_test, sizeof(utf8_ws_test) - 1, NULL);
> +    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);
> +    hr = IStream_Seek(stream, li, SEEK_SET, NULL);
> +    ok(hr == S_OK, "Unexpected hr %#lx.\n", hr);

This should use STREAM_SEEK_SET instead of SEEK_SET.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/11015#note_141557