https://bugs.winehq.org/show_bug.cgi?id=39002
Bug ID: 39002
Summary: ISAXXMLParser ignores charset property
Product: Wine
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: minor
Priority: P2
Component: msxml3
Assignee: wine-bugs@winehq.org
Reporter: ott@mirix.org
Distribution: ---
ISAXXMLParser (or, more precisely, its implementation in Wine) always tries to detect the encoding of a document itself. In internal_parseBuffer it uses xmlDetectCharEncoding plus some custom heuristics; in internal_parseStream it relies on xmlDetectCharEncoding alone, via xmlCreatePushParserCtxt. Neither path consults the charset property, which violates the specification:
"This setting [charset property] takes priority over the default encoding, which is implicitly UTF-16, or over the encoding specified in the byte order mark (BOM) of the XML document header." (https://msdn.microsoft.com/en-us/library/ms757826%28v=vs.85%29.aspx)
Moreover, ISAXXMLParser returns E_NOTIMPL when the charset property is set. It should at least accept ASCII, UTF-8 and UTF-16, as these encodings are already detected by internal_parseBuffer and internal_parseStream and are supported by libxml2. A better option would be to support all encodings supported by Wine, re-encode the input as UTF-8 (when possible) and supply the UTF-8 encoded data to libxml2 via custom IO callbacks.
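To make the priority rule quoted from MSDN concrete, here is a minimal sketch of how the encoding could be chosen. The function names (encoding_from_bom, select_encoding) are hypothetical and not taken from Wine's msxml3 sources, and falling back from the BOM to UTF-16 in that order is my assumption; the quoted text only says that the charset property wins over both.

```c
#include <stddef.h>

/* Hypothetical helper: returns the encoding implied by a byte order mark,
 * or NULL when the buffer does not start with a recognised BOM. */
const char *encoding_from_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return NULL;
}

/* Hypothetical helper: picks the encoding to parse with.
 * 1. An explicitly set charset property always wins.
 * 2. Otherwise use the BOM, if present (assumed ordering).
 * 3. Otherwise fall back to the implicit default, UTF-16. */
const char *select_encoding(const char *charset_prop,
                            const unsigned char *buf, size_t len)
{
    const char *bom;

    if (charset_prop && charset_prop[0] != '\0')
        return charset_prop;

    bom = encoding_from_bom(buf, len);
    if (bom)
        return bom;

    return "UTF-16";
}
```

With this ordering, a document that starts with a UTF-8 BOM is still parsed as ISO-8859-1 if the caller set the charset property to "ISO-8859-1", which is the behaviour the specification asks for and the current auto-detection cannot provide.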
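The re-encoding step suggested above might look roughly like the following. This is only an illustration for one source encoding (UTF-16LE) using plain C; utf16le_to_utf8 is a hypothetical name, and inside Wine the conversion would more likely go through MultiByteToWideChar/WideCharToMultiByte. The sketch also assumes the whole input is available in one buffer, whereas a real IO callback would have to carry partial code units over between reads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a complete UTF-16LE buffer to UTF-8.
 * Returns the number of bytes written to out, or (size_t)-1 on malformed
 * input (odd length, unpaired surrogate) or insufficient output space. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (in_len % 2 != 0)
        return (size_t)-1;

    while (i < in_len) {
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        size_t need;
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo;
            if (i + 2 > in_len)
                return (size_t)-1;
            lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* Low surrogate without a preceding high surrogate. */
            return (size_t)-1;
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need > out_cap)
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The resulting UTF-8 bytes could then be handed to libxml2 from a read callback, for example one registered through xmlCreateIOParserCtxt with the encoding fixed to UTF-8, so that libxml2 no longer has to guess the encoding.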