Nikolay Sivov nsivov@codeweavers.com wrote:
On 4/5/21 2:40 PM, Dmitry Timoshkov wrote:
Nikolay Sivov nsivov@codeweavers.com wrote:
Signed-off-by: Nikolay Sivov nsivov@codeweavers.com
Please don't commit this. While the patch is correct, it causes a regression: a ::load()/::save() round trip produces a file with a double <?xml> header. It needs another fix: doparse() should never add an XML declaration to freshly loaded XML.
Thanks for catching that.
Nikolay, do you recall why that code was added in the first place?
It was a long time ago. I think the reason is that libxml2 does not use a separate node for the prolog PI, and msxml does. After loading we have to expose the prolog node as a normal child.
I don't think it always adds it; the 'doc->standalone != -1' check is meant to add this node only if it was present in the input.
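For reference, a minimal sketch of what that check expresses. The value meanings follow the comment on the `standalone` field of xmlDoc in libxml2's tree.h; the enum and function names are hypothetical and exist only for illustration, they are not libxml2 or msxml identifiers.

```c
#include <assert.h>

/* Hypothetical names for the documented values of xmlDoc.standalone. */
enum standalone_state {
    STANDALONE_NO_ATTR = -2, /* XML declaration present, no standalone attribute */
    STANDALONE_NO_DECL = -1, /* no XML declaration in the input */
    STANDALONE_NO      =  0, /* standalone="no" */
    STANDALONE_YES     =  1  /* standalone="yes" */
};

/* Returns non-zero when the parsed input carried an XML declaration,
 * which is the condition 'doc->standalone != -1' tests for. */
static int input_had_xml_decl(int standalone)
{
    assert(standalone >= STANDALONE_NO_ATTR && standalone <= STANDALONE_YES);
    return standalone != STANDALONE_NO_DECL;
}
```

With this reading, the prolog node is created for -2, 0 and 1, and skipped only for -1.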
Do you have an alternative solution to removing that code from doparse()? Because doing so breaks the tests. I'd appreciate a bit of help on this.
There is no good way that I'm aware of. The node has to be there; applications expect that. If the issue is the save/get_xml() output, we could reimplement node serialization similar to what libxml2 does, but in the way we want it.
To get rid of linking/unlinking we'll have to get rid of libxml2 as a dependency, which probably could still use libxslt. But that's really a lot of work and planning (sax, schema validation, xpath, parsing itself).
I'm not proposing to blindly remove the code that links/unlinks the XML declaration. What I would look into is revisiting the doparse() code that adds the PI node with an XML declaration when doc->standalone = -2. According to the libxml2 source, doc->standalone = -2 indicates that the XML declaration *is* present but the standalone attribute is not. In that case a second XML declaration gets inserted as a PI node; libxml2 cannot figure that out, which leads to a double <?xml> header.
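To illustrate the failure mode, here is a toy model, not Wine or libxml2 code. It assumes a serializer that always writes its own declaration and a child list that already holds an "xml" PI node. All types and functions (toy_node, toy_doc, toy_save, count_decls) are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node model; not libxml2's xmlNode. */
enum toy_node_type { TOY_PI, TOY_ELEMENT };

struct toy_node {
    enum toy_node_type type;
    const char *name;    /* PI target or element name */
    const char *content; /* PI data; unused for elements */
};

struct toy_doc {
    const struct toy_node *children;
    size_t count;
};

/* Appends text to a NUL-terminated buffer, truncating if it is full. */
static void append(char *buf, size_t size, const char *text)
{
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s", text);
}

static int first_child_is_xml_decl(const struct toy_doc *doc)
{
    return doc->count > 0 && doc->children[0].type == TOY_PI &&
           strcmp(doc->children[0].name, "xml") == 0;
}

/* Serializes the document. When skip_if_present is zero the serializer
 * always writes its own declaration, which duplicates an existing PI node. */
static void toy_save(const struct toy_doc *doc, int skip_if_present,
                     char *buf, size_t size)
{
    size_t i;

    buf[0] = '\0';
    if (!(skip_if_present && first_child_is_xml_decl(doc)))
        append(buf, size, "<?xml version=\"1.0\"?>\n");

    for (i = 0; i < doc->count; i++) {
        const struct toy_node *node = &doc->children[i];
        if (node->type == TOY_PI) {
            append(buf, size, "<?");
            append(buf, size, node->name);
            append(buf, size, " ");
            append(buf, size, node->content);
            append(buf, size, "?>\n");
        } else {
            append(buf, size, "<");
            append(buf, size, node->name);
            append(buf, size, "/>\n");
        }
    }
}

/* Counts how many XML declarations ended up in the output. */
static int count_decls(const char *buf)
{
    int n = 0;
    const char *p = buf;
    while ((p = strstr(p, "<?xml ")) != NULL) {
        n++;
        p += 6;
    }
    return n;
}
```

In this model the double header goes away either by not creating the PI node in the first place or by having the writer skip its own declaration when one is already the first child. Which of the two fits doparse() is exactly the open question here.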