On 5/3/2011 08:07, Adam Martinson wrote:
Fixes bug 14864.
dlls/msxml3/domdoc.c | 255 ++++++++++++++++++++++++++------------------------ 1 files changed, 134 insertions(+), 121 deletions(-)
- case VT_ARRAY|VT_UI1:
{
SAFEARRAY* psa;
BSTR bstr;
psa = V_ARRAY(&source);
bstr = bstr_from_xmlChar(psa->pvData);
hr = domdoc_loadXML(iface, bstr, isSuccessful);
SysFreeString(bstr);
return hr;
}
break;
Is it really supposed to be utf8 array? Or this will work cause one-byte chars match utf8?
Also I don't like this method dependency, I think the proper way is to call IXMLDOMDocument3_loadXML(iface...) here. Or another way to avoid call through vtable - introduce something called domdoc_load_xml(This...) and use it in both places, but I think it's not worth it.
On 05/03/2011 01:45 AM, Nikolay Sivov wrote:
On 5/3/2011 08:07, Adam Martinson wrote:
Fixes bug 14864.
dlls/msxml3/domdoc.c | 255 ++++++++++++++++++++++++++------------------------ 1 files changed, 134 insertions(+), 121 deletions(-)
- case VT_ARRAY|VT_UI1:
{
SAFEARRAY* psa;
BSTR bstr;
psa = V_ARRAY(&source);
bstr = bstr_from_xmlChar(psa->pvData);
hr = domdoc_loadXML(iface, bstr, isSuccessful);
SysFreeString(bstr);
return hr;
}
break;
Is it really supposed to be utf8 array? Or this will work cause one-byte chars match utf8?
Also I don't like this method dependency, I think the proper way is to call IXMLDOMDocument3_loadXML(iface...) here. Or another way to avoid call through vtable - introduce something called domdoc_load_xml(This...) and use it in both places, but I think it's not worth it.
It was the Google Chrome installer, I'm not sure if it's supposed to be UTF8 or not, all of the characters it used were ASCII. I'm assuming it's supposed to be UTF8 due to it being VT_UI8 instead of VT_I8, but I can't say for sure. I remember another place where it would have been useful to be able to call domdoc_loadXML() with a UTF8 string to avoid converting to BSTR and back, I think in the schema stuff. It would be nice to have an internal function to do that, then domdoc_loadXML() can just wrap it, but I think that can be a separate patch.
On 05/03/2011 09:43 AM, Adam Martinson wrote:
On 05/03/2011 01:45 AM, Nikolay Sivov wrote:
On 5/3/2011 08:07, Adam Martinson wrote:
Fixes bug 14864.
dlls/msxml3/domdoc.c | 255 ++++++++++++++++++++++++++------------------------ 1 files changed, 134 insertions(+), 121 deletions(-)
- case VT_ARRAY|VT_UI1:
{
SAFEARRAY* psa;
BSTR bstr;
psa = V_ARRAY(&source);
bstr = bstr_from_xmlChar(psa->pvData);
hr = domdoc_loadXML(iface, bstr, isSuccessful);
SysFreeString(bstr);
return hr;
}
break;
Is it really supposed to be utf8 array? Or this will work cause one-byte chars match utf8?
Also I don't like this method dependency, I think the proper way is to call IXMLDOMDocument3_loadXML(iface...) here. Or another way to avoid call through vtable - introduce something called domdoc_load_xml(This...) and use it in both places, but I think it's not worth it.
It was the Google Chrome installer, I'm not sure if it's supposed to be UTF8 or not, all of the characters it used were ASCII. I'm assuming it's supposed to be UTF8 due to it being VT_UI8 instead of VT_I8, but I can't say for sure. I remember another place where it would have been useful to be able to call domdoc_loadXML() with a UTF8 string to avoid converting to BSTR and back, I think in the schema stuff. It would be nice to have an internal function to do that, then domdoc_loadXML() can just wrap it, but I think that can be a separate patch.
Err, VT_I1/VT_UI1, not VT_I8/VT_UI8.
On 5/3/2011 18:43, Adam Martinson wrote:
On 05/03/2011 01:45 AM, Nikolay Sivov wrote:
On 5/3/2011 08:07, Adam Martinson wrote:
Fixes bug 14864.
dlls/msxml3/domdoc.c | 255 ++++++++++++++++++++++++++------------------------ 1 files changed, 134 insertions(+), 121 deletions(-)
- case VT_ARRAY|VT_UI1:
{
SAFEARRAY* psa;
BSTR bstr;
psa = V_ARRAY(&source);
bstr = bstr_from_xmlChar(psa->pvData);
hr = domdoc_loadXML(iface, bstr, isSuccessful);
SysFreeString(bstr);
return hr;
}
break;
Is it really supposed to be utf8 array? Or this will work cause one-byte chars match utf8?
Also I don't like this method dependency, I think the proper way is to call IXMLDOMDocument3_loadXML(iface...) here. Or another way to avoid call through vtable - introduce something called domdoc_load_xml(This...) and use it in both places, but I think it's not worth it.
It was the Google Chrome installer, I'm not sure if it's supposed to be UTF8 or not, all of the characters it used were ASCII. I'm assuming it's supposed to be UTF8 due to it being VT_UI8 instead of VT_I8, but I can't say for sure.
I'm worried about VT_UI1 being interpreted as ASCII stream or just a byte stream that could be in any encoding. If it's a case you can't assume it's encoded as utf8, when you do utf8 -> utf16 (for BSTR).
For this particular case you might want direct doparse() call for VT_UI1 array, I suggest a simple test -- create byte (VT_UI1) array over a WCHAR buffer with UTF16 xml data and try to ::load() from it. If encoding is detected then you need direct doparse() call, to do completely clear case don't include encoding= attribute in this xml data.
I remember another place where it would have been useful to be able to call domdoc_loadXML() with a UTF8 string to avoid converting to BSTR and back, I think in the schema stuff. It would be nice to have an internal function to do that, then domdoc_loadXML() can just wrap it, but I think that can be a separate patch.
Anyway if you need a interface call to implement some other method use already defined macro instead of internal static implementation function. Internal method implementations better not depend on each other.
On 05/03/2011 05:30 PM, Nikolay Sivov wrote:
I'm worried about VT_UI1 being interpreted as ASCII stream or just a byte stream that could be in any encoding. If it's a case you can't assume it's encoded as utf8, when you do utf8 -> utf16 (for BSTR).
For this particular case you might want direct doparse() call for VT_UI1 array, I suggest a simple test -- create byte (VT_UI1) array over a WCHAR buffer with UTF16 xml data and try to ::load() from it. If encoding is detected then you need direct doparse() call, to do completely clear case don't include encoding= attribute in this xml data.
I glad you mentioned this, I thought about that too. I did some testing and it seems that only UTF8 (or maybe just ASCII) is supported. The SAFEARRAY does seem to be treated more like a file than a string, eg if there is a '\0' at the end of the array it causes a parse error (I don't think we need to duplicate that behavior though). I'm not sure if multi-dimensional arrays are supported, it will take some further testing, but if so that can be a separate patch; for now I'll just add a FIXME if the array is not a vector.
I remember another place where it would have been useful to be able to call domdoc_loadXML() with a UTF8 string to avoid converting to BSTR and back, I think in the schema stuff. It would be nice to have an internal function to do that, then domdoc_loadXML() can just wrap it, but I think that can be a separate patch.
Anyway if you need a interface call to implement some other method use already defined macro instead of internal static implementation function. Internal method implementations better not depend on each other.
Ye, you're right, I will add that as part of this set.
On 5/4/2011 19:30, Adam Martinson wrote:
On 05/03/2011 05:30 PM, Nikolay Sivov wrote:
I'm worried about VT_UI1 being interpreted as ASCII stream or just a byte stream that could be in any encoding. If it's a case you can't assume it's encoded as utf8, when you do utf8 -> utf16 (for BSTR).
For this particular case you might want direct doparse() call for VT_UI1 array, I suggest a simple test -- create byte (VT_UI1) array over a WCHAR buffer with UTF16 xml data and try to ::load() from it. If encoding is detected then you need direct doparse() call, to do completely clear case don't include encoding= attribute in this xml data.
I glad you mentioned this, I thought about that too. I did some testing and it seems that only UTF8 (or maybe just ASCII) is supported. The SAFEARRAY does seem to be treated more like a file than a string, eg if there is a '\0' at the end of the array it causes a parse error (I don't think we need to duplicate that behavior though).
Yeah, I don't see a reason to implement this particular case for now.
I'm not sure if multi-dimensional arrays are supported, it will take some further testing, but if so that can be a separate patch; for now I'll just add a FIXME if the array is not a vector.
I don't expect it's supported, it sounds a bit insane for me to load from multidimensional arrays. I think after this case (VT_UI1 array) implemented we need a WARN for any other variant type. And some tests for sure.