On 05/03/2011 05:30 PM, Nikolay Sivov wrote:
I'm worried about VT_UI1 being interpreted as an ASCII stream, or as just a byte stream that could be in any encoding. If that's the case, you can't assume it's encoded as UTF-8 when you do the utf8 -> utf16 conversion (for BSTR).
For this particular case you might want a direct doparse() call for the VT_UI1 array. I suggest a simple test: create a byte (VT_UI1) array over a WCHAR buffer with UTF-16 XML data and try to ::load() from it. If the encoding is detected, then you need a direct doparse() call. To make the case completely clear, don't include an encoding= attribute in this XML data.
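Why the suggested test can tell the two behaviours apart: a parser that is handed the raw bytes can recognise UTF-16 from the first few bytes even without an encoding= attribute, while a blind utf8 -> utf16 conversion cannot. Below is a minimal, portable sketch of that kind of byte sniffing, using the signatures listed in XML 1.0 Appendix F. The names sniffed_encoding and sniff_xml_encoding() are hypothetical, UTF-32/UCS-4 signatures are left out for brevity, and this is not a claim about what msxml or doparse() actually does.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not a Wine or libxml2 type. */
typedef enum {
    ENC_UNKNOWN,   /* no signature found; XML would default to UTF-8 */
    ENC_UTF8_BOM,  /* EF BB BF */
    ENC_UTF16_LE,  /* FF FE, or "<?" stored as 3C 00 3F 00 */
    ENC_UTF16_BE   /* FE FF, or "<?" stored as 00 3C 00 3F */
} sniffed_encoding;

/* Guess the encoding of a raw byte buffer from its first bytes.
 * UTF-32/UCS-4 signatures are deliberately not handled here. */
static sniffed_encoding sniff_xml_encoding(const unsigned char *buf, size_t len)
{
    /* Byte order marks first. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;

    /* No BOM: look for "<?" encoded as two UTF-16 code units. */
    if (len >= 4 && buf[0] == 0x3C && buf[1] == 0x00 &&
                    buf[2] == 0x3F && buf[3] == 0x00)
        return ENC_UTF16_LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x3C &&
                    buf[2] == 0x00 && buf[3] == 0x3F)
        return ENC_UTF16_BE;

    return ENC_UNKNOWN;
}
```

A byte array laid over a little-endian WCHAR buffer that starts with "<?xml" begins with 3C 00 3F 00, so this sketch would report ENC_UTF16_LE for it.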
I'm glad you mentioned this, I thought about that too. I did some testing and it seems that only UTF-8 (or maybe just ASCII) is supported. The SAFEARRAY does seem to be treated more like a file than a string, e.g. if there is a '\0' at the end of the array it causes a parse error (I don't think we need to duplicate that behavior though). I'm not sure if multi-dimensional arrays are supported. That will take some further testing, but if they are, it can be a separate patch; for now I'll just add a FIXME if the array is not a vector.
I remember another place, I think in the schema stuff, where it would have been useful to be able to call domdoc_loadXML() with a UTF-8 string to avoid converting to BSTR and back. It would be nice to have an internal function to do that, so that domdoc_loadXML() can just wrap it, but I think that can be a separate patch.
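To make the idea concrete, here is a rough portable sketch of that layering: one internal loader that takes UTF-8 bytes, and a thin UTF-16 entry point that converts and forwards to it. Every name here (load_xml_utf8, load_xml_utf16, utf16_to_utf8) is hypothetical, unsigned short stands in for WCHAR, and the "parse" step is a placeholder. In real Wine code the conversion would presumably go through WideCharToMultiByte() rather than a hand-written loop.

```c
#include <stddef.h>
#include <stdlib.h>

/* Convert n UTF-16 code units to a malloc'ed, NUL-terminated UTF-8 buffer.
 * Unpaired surrogates become U+FFFD. Returns NULL if allocation fails. */
static char *utf16_to_utf8(const unsigned short *src, size_t n, size_t *out_len)
{
    /* Worst case is 3 output bytes per input code unit. */
    char *dst = malloc(n * 3 + 1);
    size_t i = 0, o = 0;

    if (!dst) return NULL;
    while (i < n) {
        unsigned long cp = src[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF && i < n &&
            src[i] >= 0xDC00 && src[i] <= 0xDFFF)
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i++] - 0xDC00);
        else if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD; /* unpaired surrogate */

        if (cp < 0x80) {
            dst[o++] = (char)cp;
        } else if (cp < 0x800) {
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char)(0xF0 | (cp >> 18));
            dst[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[o] = '\0';
    *out_len = o;
    return dst;
}

/* Hypothetical internal loader: the single place that would hand UTF-8
 * bytes to the parser. Placeholder logic: accept input starting with '<'. */
static int load_xml_utf8(const char *xml, size_t len)
{
    return (len > 0 && xml[0] == '<') ? 0 : -1;
}

/* Hypothetical UTF-16 entry point, playing the role of domdoc_loadXML():
 * it only converts and forwards to the UTF-8 loader. */
static int load_xml_utf16(const unsigned short *xml, size_t n)
{
    size_t len;
    char *utf8 = utf16_to_utf8(xml, n, &len);
    int ret;

    if (!utf8) return -1;
    ret = load_xml_utf8(utf8, len);
    free(utf8);
    return ret;
}
```

With this split, a caller that already holds UTF-8 data can call the internal loader directly and skip the round trip through UTF-16.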
Anyway, if you need an interface call to implement some other method, use the already defined macro instead of the internal static implementation function. Internal method implementations had better not depend on each other.
Yes, you're right, I will add that as part of this set.