[Bug 39002] New: ISAXXMLParser ignores charset property
https://bugs.winehq.org/show_bug.cgi?id=39002

Bug ID: 39002
Summary: ISAXXMLParser ignores charset property
Product: Wine
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: minor
Priority: P2
Component: msxml3
Assignee: wine-bugs(a)winehq.org
Reporter: ott(a)mirix.org
Distribution: ---

ISAXXMLParser (or, to be correct, its implementation in Wine) tries to detect the encoding of documents itself. In internal_parseBuffer it uses xmlDetectCharEncoding plus some custom heuristics. In internal_parseStream it uses xmlDetectCharEncoding without additional heuristics, through xmlCreatePushParserCtxt. This violates the specification:

"This setting [charset property] takes priority over the default encoding, which is implicitly UTF-16, or over the encoding specified in the byte order mark (BOM) of the XML document header."
(https://msdn.microsoft.com/en-us/library/ms757826%28v=vs.85%29.aspx)

Moreover, ISAXXMLParser returns E_NOTIMPL when setting the charset property. It should at least accept ASCII, UTF-8 and UTF-16, as these encodings are detected by internal_parseBuffer and internal_parseStream anyway and are supported by libxml2. A better option would be to support all encodings supported by Wine, re-encode them as UTF-8 (when possible), and supply the UTF-8 encoded data to libxml2 via custom I/O callbacks.

--
Do not reply to this email, post in Bugzilla using the above URL to reply.
You are receiving this mail because: You are watching all bug changes.
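The precedence rule quoted from the specification could be sketched roughly as follows. This is only an illustration in plain C, not Wine or libxml2 code: the helper names detect_bom and select_encoding are hypothetical, and the encoding names are returned as plain strings for simplicity. An explicitly set charset wins, then a BOM, then the implicit UTF-16 default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Wine or libxml2): returns the encoding
 * implied by a byte order mark, or NULL if the buffer has no known BOM. */
const char *detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return NULL;
}

/* Hypothetical helper: applies the precedence described in the quoted
 * specification text. A non-empty charset property takes priority over
 * the BOM, and the BOM takes priority over the implicit UTF-16 default. */
const char *select_encoding(const char *charset,
                            const unsigned char *buf, size_t len)
{
    const char *bom;

    assert(buf != NULL || len == 0);

    if (charset != NULL && charset[0] != '\0')
        return charset;           /* explicit setting always wins */

    bom = detect_bom(buf, len);
    if (bom != NULL)
        return bom;               /* fall back to the BOM */

    return "UTF-16";              /* implicit default per the quoted text */
}
```

With a charset of "ISO-8859-1" and a buffer that starts with a UTF-8 BOM, select_encoding returns "ISO-8859-1". Under that reading of the specification, detection applies only when no charset has been set.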
https://bugs.winehq.org/show_bug.cgi?id=39002

--- Comment #1 from Nikolay Sivov <bunglehead(a)gmail.com> ---
Hi, Matthias.

Thanks for reporting this. Do you have a test application that demonstrates this issue?
https://bugs.winehq.org/show_bug.cgi?id=39002

--- Comment #2 from Nikolay Sivov <bunglehead(a)gmail.com> ---
Matthias, again, do you have an application that depends on this?
https://bugs.winehq.org/show_bug.cgi?id=39002

--- Comment #3 from Matthias-Christian Ott <ott(a)mirix.org> ---
I have an internal application that sets this property and fails with an error because ISAXXMLParser returns E_NOTIMPL. Unfortunately, because the application is internal, I can't share it. I only test it with Wine; the production environment is Microsoft Windows.

I can provide a minimal test case if necessary, but I think you can construct an XML document that is not encoded in Unicode and for which the heuristics will fail. An application should be able to set the encoding based on external information (see Appendix F.2 of the XML 1.0 Recommendation). So an application would set the charset property of ISAXXML to the appropriate encoding for the document.
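As a rough illustration of the re-encoding idea from the original report, the sketch below converts an ISO-8859-1 buffer to UTF-8 before it would be handed to the parser. It assumes the application has declared the document as ISO-8859-1 through external information. The function name latin1_to_utf8 is hypothetical, and real code would need to cover every encoding Wine supports rather than this single one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not Wine code): converts ISO-8859-1 bytes to UTF-8.
 * dst must have room for at least 2 * len bytes, since every Latin-1 byte
 * above 0x7F becomes a two-byte UTF-8 sequence.
 * Returns the number of bytes written to dst. */
size_t latin1_to_utf8(const unsigned char *src, size_t len,
                      unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            /* ASCII range is identical in both encodings. */
            dst[out++] = src[i];
        } else {
            /* U+0080..U+00FF: 110000xx 10xxxxxx */
            dst[out++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}
```

A document such as `<a>` followed by the single byte 0xE9 and `</a>` has no BOM and no encoding declaration, and the lone 0xE9 byte is not valid UTF-8, so detection alone cannot recover the intended character. After conversion that byte becomes the sequence 0xC3 0xA9, which a UTF-8 parser can accept.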