Nikolay Sivov <nsivov@codeweavers.com> wrote:
On 3/19/21 6:25 PM, Dmitry Timoshkov wrote:
This is the only place where xmlSaveToIO() is forced to use UTF-8 for an output document, other places specify NULL for the default encoding.
It's because get_xml() and save() are different. UTF-8 is used together with bstr_from_xmlChar().
Since that change doesn't break the current tests, I'd guess that the current behaviour is based on guesswork. Could you please add tests that show the difference and that would fail with my patch?
The difference is that save() respects encoding, and get_xml() always returns UTF-16, with no encoding attribute. Your patch does not fix that.
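To make the distinction concrete, here is a toy model of the two output paths as described above. It is only a sketch, not the msxml3 code: the helper names save_bytes() and get_xml_text() and the sample body are hypothetical.

```python
# Toy model of the two paths described above; NOT the msxml3 implementation.
# save_bytes() and get_xml_text() are hypothetical helper names.

def save_bytes(body: str, encoding: str) -> bytes:
    """Model of save(): the declaration names the encoding and the
    bytes are actually written in that encoding."""
    decl = f'<?xml version="1.0" encoding="{encoding}"?>'
    return (decl + body).encode(encoding)

def get_xml_text(body: str) -> str:
    """Model of get_xml(): the result is wide-character text, so the
    declaration carries no encoding attribute at all."""
    return '<?xml version="1.0"?>' + body

body = "<a>Привет</a>"  # hypothetical sample content

saved = save_bytes(body, "windows-1251")
text = get_xml_text(body)

# A BSTR holds UTF-16 code units; encode to show what the caller would see.
wide = text.encode("utf-16-le")
```

In this model the encoding only matters on the save() side; the get_xml() result is the same whatever encoding the document was loaded with.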
The patch doesn't claim to fix that case.
This doesn't completely fix the saved XML contents, but at least the XML document has the proper encoding now.
What is the proper encoding if output is always in WCHARs?
I have an application that expects to get such an XML in the encoding specified in the ProcessingInstruction, like the test in 1/3 does. In fact, the application asks for an encoding that matches the current ANSI codepage, and it doesn't expect to get UTF-8 instead of cp1251; the two are completely different. As you can probably see, with the current code my application is utterly broken.
What is broken? What does it expect in the returned BSTR? There are no test changes associated with 3/3, so why have it?
Once the document is loaded, correct output won't depend on the specified encoding.
Without 3/3 I get an XML with encoding="UTF-8" in the declaration and cp1251 encoded contents.
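A small sketch of why that combination is a problem for a consumer, using made-up sample content rather than anything from the Wine tests: bytes written in cp1251 are generally not valid UTF-8, so a reader that trusts the declaration fails to decode them.

```python
# Sketch only: sample content and helper name are hypothetical.
body = "<a>Привет</a>"

decl_utf8 = b'<?xml version="1.0" encoding="UTF-8"?>'
decl_1251 = b'<?xml version="1.0" encoding="windows-1251"?>'
body_1251 = body.encode("cp1251")  # contents are really cp1251

def decodes_as(data: bytes, declared: str) -> bool:
    """Return True if the bytes are valid in the declared encoding."""
    try:
        data.decode(declared)
        return True
    except UnicodeDecodeError:
        return False

# Declaration says UTF-8, contents are cp1251: the mismatched case.
mismatched_ok = decodes_as(decl_utf8 + body_1251, "utf-8")

# Declaration and contents agree on cp1251.
matched_ok = decodes_as(decl_1251 + body_1251, "cp1251")
```

Here mismatched_ok ends up False and matched_ok ends up True, which is the difference the application trips over.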
I'm OK with deferring 3/3 for now. I'll have a look at what is going on once the first two patches in the series are accepted.