This is the only place where xmlSaveToIO() is forced to use UTF-8 for an output document; other places specify NULL for the default encoding.
This doesn't completely fix the saved XML contents, but at least the XML document has the proper encoding now.
Signed-off-by: Dmitry Timoshkov dmitry@baikal.ru
---
 dlls/msxml3/domdoc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dlls/msxml3/domdoc.c b/dlls/msxml3/domdoc.c
index a81ef5f16cb..49596999d16 100644
--- a/dlls/msxml3/domdoc.c
+++ b/dlls/msxml3/domdoc.c
@@ -1405,7 +1405,7 @@ static HRESULT WINAPI domdoc_get_xml(
         return E_OUTOFMEMORY;
     options = XML_SAVE_FORMAT | XML_SAVE_NO_DECL;
-    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, "UTF-8", options);
+    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, NULL, options);
     if(!ctxt) {
On 3/19/21 6:25 PM, Dmitry Timoshkov wrote:
This is the only place where xmlSaveToIO() is forced to use UTF-8 for an output document; other places specify NULL for the default encoding.
It's because get_xml() and save() are different. UTF-8 is used together with bstr_from_xmlChar().
This doesn't completely fix the saved XML contents, but at least the XML document has the proper encoding now.
What is the proper encoding if output is always in WCHARs?
Signed-off-by: Dmitry Timoshkov dmitry@baikal.ru
 dlls/msxml3/domdoc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dlls/msxml3/domdoc.c b/dlls/msxml3/domdoc.c
index a81ef5f16cb..49596999d16 100644
--- a/dlls/msxml3/domdoc.c
+++ b/dlls/msxml3/domdoc.c
@@ -1405,7 +1405,7 @@ static HRESULT WINAPI domdoc_get_xml(
         return E_OUTOFMEMORY;
     options = XML_SAVE_FORMAT | XML_SAVE_NO_DECL;
-    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, "UTF-8", options);
+    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, NULL, options);
     if(!ctxt) {
The correct way to fix formatting and encoding issues is to reimplement the node dumping functionality in msxml itself.
Nikolay Sivov nsivov@codeweavers.com wrote:
On 3/19/21 6:25 PM, Dmitry Timoshkov wrote:
This is the only place where xmlSaveToIO() is forced to use UTF-8 for an output document; other places specify NULL for the default encoding.
It's because get_xml() and save() are different. UTF-8 is used together with bstr_from_xmlChar().
Since that change doesn't break the current tests, I'd guess that the current behaviour is based on some guesswork. Could you please add tests that show the difference and that won't pass with my patch?
This doesn't completely fix the saved XML contents, but at least the XML document has the proper encoding now.
What is the proper encoding if output is always in WCHARs?
I have an application that expects to get such XML in the encoding specified in the ProcessingInstruction, like the test in 1/3 does. In fact, the application asks for the encoding that matches the current ANSI codepage, and doesn't expect to get UTF-8 instead of cp1251, which are completely different. As you can probably see, with the current code my application is utterly broken.
diff --git a/dlls/msxml3/domdoc.c b/dlls/msxml3/domdoc.c
index a81ef5f16cb..49596999d16 100644
--- a/dlls/msxml3/domdoc.c
+++ b/dlls/msxml3/domdoc.c
@@ -1405,7 +1405,7 @@ static HRESULT WINAPI domdoc_get_xml(
         return E_OUTOFMEMORY;
     options = XML_SAVE_FORMAT | XML_SAVE_NO_DECL;
-    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, "UTF-8", options);
+    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, NULL, options);
     if(!ctxt) {
The correct way to fix formatting and encoding issues is to reimplement the node dumping functionality in msxml itself.
I guess that's a large undertaking; are you planning to work on this? If not, then I think the proposed fix might be a compromise solution.
On 3/22/21 4:23 PM, Dmitry Timoshkov wrote:
Nikolay Sivov nsivov@codeweavers.com wrote:
On 3/19/21 6:25 PM, Dmitry Timoshkov wrote:
This is the only place where xmlSaveToIO() is forced to use UTF-8 for an output document; other places specify NULL for the default encoding.
It's because get_xml() and save() are different. UTF-8 is used together with bstr_from_xmlChar().
Since that change doesn't break the current tests, I'd guess that the current behaviour is based on some guesswork. Could you please add tests that show the difference and that won't pass with my patch?
The difference is that save() respects encoding, and get_xml() always returns UTF-16, with no encoding attribute. Your patch does not fix that.
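The UTF-8 to UTF-16 step mentioned here can be sketched in portable C. This is only an illustration of the idea, not Wine's bstr_from_xmlChar(): the helper name utf8_to_utf16() is hypothetical, and the decoder is simplified (it does not reject overlong forms or encoded surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Wine code: decode UTF-8 into UTF-16 code units.
 * Returns the number of units written, or (size_t)-1 on malformed input
 * or if the output buffer is too small. */
static size_t utf8_to_utf16(const unsigned char *src, size_t len,
                            uint16_t *dst, size_t dst_len)
{
    size_t i = 0, out = 0;

    while (i < len)
    {
        uint32_t cp;
        size_t extra, k;
        unsigned char c = src[i++];

        /* The lead byte determines how many continuation bytes follow. */
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or invalid lead byte */

        if (len - i < extra) return (size_t)-1;   /* truncated sequence */
        for (k = 0; k < extra; k++)
        {
            unsigned char cc = src[i++];
            if ((cc & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp >= 0x10000)
        {
            /* Code points above the BMP become a surrogate pair. */
            if (cp > 0x10FFFF || dst_len - out < 2) return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        else
        {
            if (out >= dst_len) return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        }
    }
    return out;
}
```

With such a conversion in place, the serializer has to emit UTF-8 for the result to decode correctly; bytes in any other encoding, such as cp1251, would either be rejected or decoded into the wrong characters.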
This doesn't completely fix the saved XML contents, but at least the XML document has the proper encoding now.
What is the proper encoding if output is always in WCHARs?
I have an application that expects to get such XML in the encoding specified in the ProcessingInstruction, like the test in 1/3 does. In fact, the application asks for the encoding that matches the current ANSI codepage, and doesn't expect to get UTF-8 instead of cp1251, which are completely different. As you can probably see, with the current code my application is utterly broken.
What is broken? What does it expect in the returned BSTR? There are no test changes associated with 3/3, so why have it?
Correct output won't depend on the specified encoding once the document is loaded.
diff --git a/dlls/msxml3/domdoc.c b/dlls/msxml3/domdoc.c
index a81ef5f16cb..49596999d16 100644
--- a/dlls/msxml3/domdoc.c
+++ b/dlls/msxml3/domdoc.c
@@ -1405,7 +1405,7 @@ static HRESULT WINAPI domdoc_get_xml(
         return E_OUTOFMEMORY;
     options = XML_SAVE_FORMAT | XML_SAVE_NO_DECL;
-    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, "UTF-8", options);
+    ctxt = xmlSaveToIO(domdoc_get_xml_writecallback, NULL, buf, NULL, options);
     if(!ctxt) {
The correct way to fix formatting and encoding issues is to reimplement the node dumping functionality in msxml itself.
I guess that's a large undertaking; are you planning to work on this? If not, then I think the proposed fix might be a compromise solution.
Nikolay Sivov nsivov@codeweavers.com wrote:
On 3/19/21 6:25 PM, Dmitry Timoshkov wrote:
This is the only place where xmlSaveToIO() is forced to use UTF-8 for an output document; other places specify NULL for the default encoding.
It's because get_xml() and save() are different. UTF-8 is used together with bstr_from_xmlChar().
Since that change doesn't break the current tests, I'd guess that the current behaviour is based on some guesswork. Could you please add tests that show the difference and that won't pass with my patch?
The difference is that save() respects encoding, and get_xml() always returns UTF-16, with no encoding attribute. Your patch does not fix that.
The patch doesn't claim to fix that case.
This doesn't completely fix the saved XML contents, but at least the XML document has the proper encoding now.
What is the proper encoding if output is always in WCHARs?
I have an application that expects to get such XML in the encoding specified in the ProcessingInstruction, like the test in 1/3 does. In fact, the application asks for the encoding that matches the current ANSI codepage, and doesn't expect to get UTF-8 instead of cp1251, which are completely different. As you can probably see, with the current code my application is utterly broken.
What is broken? What does it expect in the returned BSTR? There are no test changes associated with 3/3, so why have it?
Correct output won't depend on the specified encoding once the document is loaded.
Without 3/3 I get an XML with encoding="UTF-8" in the declaration and cp1251 encoded contents.
I'm OK with deferring 3/3 for now; I'll have a look at what is going on once the first two patches in the series get accepted.
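The mismatch described above (a declaration claiming one encoding over bytes in another) is easy to see at the byte level. The following is a minimal sketch, not Wine code: the helper name cp1251_to_utf8() is hypothetical, and it only handles ASCII plus the cp1251 range 0xC0..0xFF, which maps to U+0410..U+044F.

```c
#include <stddef.h>

/* Hypothetical helper, not Wine code: transcode ASCII and the cp1251
 * letters 0xC0..0xFF (U+0410..U+044F) to UTF-8. Other cp1251 bytes are
 * outside the scope of this sketch and make it return (size_t)-1, as does
 * an output buffer that is too small. */
static size_t cp1251_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst, size_t dst_len)
{
    size_t i, out = 0;

    for (i = 0; i < len; i++)
    {
        unsigned char c = src[i];

        if (c < 0x80)
        {
            /* ASCII is identical in both encodings. */
            if (out >= dst_len) return (size_t)-1;
            dst[out++] = c;
        }
        else if (c >= 0xC0)
        {
            /* One cp1251 byte becomes a two-byte UTF-8 sequence. */
            unsigned int cp = 0x0410 + (c - 0xC0);
            if (dst_len - out < 2) return (size_t)-1;
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
        else return (size_t)-1;
    }
    return out;
}
```

For example, the single cp1251 byte 0xC0 (CYRILLIC CAPITAL LETTER A) corresponds to the two UTF-8 bytes 0xD0 0x90, so a consumer that trusts an encoding="UTF-8" declaration cannot read cp1251 contents correctly.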