Hello developers,
the current implementation of IPersistStream::Load for XMLDOMDocument copies the given stream into an HGLOBAL stream and parses that copy with a helper function, parse_xml, which calls xmlReadMemory if available (since libxml2 2.6.0) and falls back to the old xmlParseMemory call otherwise.
If libxml2 is new enough, it is possible to parse directly from the stream using xmlReadIO to avoid copying the whole stream contents to memory. The function xmlReadIO was introduced in libxml2 2.6.0, five years ago. If I want to use that function, do I have to create an alternative code path for such ancient libxml2 versions, or is it OK to require libxml2 2.6.0 as the minimum version (even Debian Sarge, i.e. oldstable, has it)?
Another question arose while looking at the current implementation. If Wine does not go the direct-stream-to-libxml2 way but wants to keep taking a copy: the copying is currently performed manually, although IStream has a CopyTo function that takes care of creating the copy.
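To make the idea concrete, here is a rough sketch of what the two callbacks could look like. It is not taken from the attached diff. The callback shapes follow libxml2's xmlInputReadCallback and xmlInputCloseCallback, but mock_stream and mock_stream_read are hypothetical stand-ins for IStream and IStream::Read so that the sketch compiles without COM or libxml2 headers, and drain only imitates how a parser would pull data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IStream: a memory buffer with a read cursor. */
struct mock_stream {
    const char *data;
    size_t size;
    size_t pos;
    size_t max_chunk; /* cap per read to simulate short reads; 0 = no cap */
};

/* Stand-in for IStream::Read: copies up to len bytes, reports the count,
 * returns 0 on success. */
static int mock_stream_read(struct mock_stream *s, void *buf,
                            unsigned long len, unsigned long *got)
{
    size_t left = s->size - s->pos;
    size_t n = len < left ? len : left;

    if (s->max_chunk && n > s->max_chunk)
        n = s->max_chunk;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = (unsigned long)n;
    return 0;
}

/* Same shape as libxml2's xmlInputReadCallback: returns the number of
 * bytes read, 0 at end of input, -1 on error. */
static int stream_read_cb(void *context, char *buffer, int len)
{
    unsigned long got = 0;

    if (len < 0)
        return -1;
    if (mock_stream_read(context, buffer, (unsigned long)len, &got) != 0)
        return -1;
    return (int)got;
}

/* Same shape as xmlInputCloseCallback. A real version would probably
 * release its reference on the stream here. */
static int stream_close_cb(void *context)
{
    (void)context;
    return 0;
}

/* With libxml2 >= 2.6.0 the callbacks would be handed over like this:
 *   doc = xmlReadIO(stream_read_cb, stream_close_cb, stream,
 *                   NULL, NULL, options);
 */

/* Pulls data through the callback the way a parser would and returns the
 * total number of bytes delivered into out. */
static size_t drain(struct mock_stream *s, char *out, size_t out_size)
{
    char chunk[8];
    size_t total = 0;
    int n;

    while ((n = stream_read_cb(s, chunk, (int)sizeof chunk)) > 0) {
        assert(total + (size_t)n <= out_size);
        memcpy(out + total, chunk, (size_t)n);
        total += (size_t)n;
    }
    stream_close_cb(s);
    return total;
}
```

The point of the sketch is only that the parser pulls small chunks on demand, so no full copy of the stream has to exist in memory.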
I attached two diff files,
- load-via-readio.diff shows the way that calls stream callbacks from libxml2
- load-use-copyto.diff uses the CopyTo function, but still copies.
Thanks in advance for any feedback, Michael Karcher
On Friday, 17.10.2008, at 00:46 +0200, Michael Karcher wrote:
If libxml2 is new enough, it is possible to parse directly from the stream using xmlReadIO to avoid copying the whole stream contents to memory.
Testing native, it looks like this is the right approach. Piotr Caban asked me whether it works with asynchronous streams. MSDN confusingly describes three types of streams, two of which look the same to the user:
- Synchronous streams: Have low-latency read calls; no network access is needed after creation. Examples are files and memory blocks. No short reads (a read returning fewer bytes than requested).
- Blocking asynchronous streams: May have high-latency read calls, but read never fails with E_PENDING. The contents of the stream may not be locally available before Read is called on the respective part of the data. Examples are downloading streams (either a direct HTTP download or a downloaded compound document as asynchronous storage).
- Non-blocking asynchronous streams: Have low-latency read calls. If no data is available, read fails with E_PENDING. They are useless on their own, as the pull model doesn't work without polling for data and they don't provide a push model (for reading; the opposite holds for writing). They are used in asynchronous binding, where an IBindStatusCallback of the consumer provides the needed push model.
[see http://msdn.microsoft.com/en-us/library/aa768185(VS.85).aspx, "Storage of Control persistent Data"]
The consumer of a stream typically doesn't care whether the stream is synchronous or blocking asynchronous. The IPersistStream interface of DOMDocument behaves oddly when it receives a non-blocking asynchronous stream: it reads until it gets two consecutive E_PENDING results, then returns S_OK with readyState set to INTERACTIVE. That is probably the correct behaviour for non-blocking asynchronous loads, but the native implementation does not seem to try to obtain any push capability from the stream, so there seems to be no way to push further data into the object.
I suspect that this is a kind of "don't do this" undefined behaviour.
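To spell out the behaviour I observed, here is a small sketch of a load loop that would produce the same results. This is only my guess at the shape of the loop, not the native implementation. scripted_stream and scripted_read are hypothetical test helpers that replay a fixed list of Read results; the HRESULT and READYSTATE values are redefined locally (with their usual numeric values) so the sketch compiles without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t HRESULT;
#define S_OK      ((HRESULT)0)
#define S_FALSE   ((HRESULT)1)
#define E_PENDING ((HRESULT)0x8000000A)

enum ready_state {
    READYSTATE_LOADING = 1,
    READYSTATE_INTERACTIVE = 3,
    READYSTATE_COMPLETE = 4
};

/* Hypothetical stream that replays a scripted list of Read results. */
struct scripted_stream {
    const HRESULT *results; /* result of each successive Read call */
    size_t count;
    size_t next;
};

/* Each S_OK result delivers one byte; once the script is exhausted the
 * stream reports end of data with S_FALSE and zero bytes. */
static HRESULT scripted_read(struct scripted_stream *s, unsigned long *got)
{
    HRESULT hr = s->next < s->count ? s->results[s->next++] : S_FALSE;

    *got = (hr == S_OK) ? 1 : 0;
    return hr;
}

/* Reads until end of data or until two E_PENDING results in a row. */
static HRESULT load_sketch(struct scripted_stream *s,
                           enum ready_state *state, unsigned long *total)
{
    int pending_in_a_row = 0;

    *state = READYSTATE_LOADING;
    *total = 0;
    for (;;) {
        unsigned long got = 0;
        HRESULT hr = scripted_read(s, &got);

        if (hr == E_PENDING) {
            if (++pending_in_a_row == 2) {
                /* Give up polling; the document stays partially loaded. */
                *state = READYSTATE_INTERACTIVE;
                return S_OK;
            }
            continue;
        }
        if (hr < 0)
            return hr; /* any other failure is passed through */
        pending_in_a_row = 0;
        *total += got;
        if (hr == S_FALSE || got == 0) {
            *state = READYSTATE_COMPLETE;
            return S_OK;
        }
    }
}
```

Note that nothing in such a loop gives the caller a way to resume once it has returned with INTERACTIVE, which is the problem described above.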
Another issue is that during asynchronous loading via IPersistMoniker, native exposes a DOM tree that is progressively filled in while the document is being loaded. The tree should be read-only during that time (I haven't tested that yet), and native provides callbacks for the event of receiving further data (I haven't tested those yet either). libxml2 does not seem to provide an incremental tree builder (even with the push interface), so it looks like we need our own tree-building code based on libxml2's SAX interface. Any other ideas?
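As a rough illustration of what I mean by our own tree-building code, here is a sketch of a builder that is driven by start/end element events and leaves a consistent, inspectable tree after every event. The node and builder structures and all function names are made up for this mail. In a real implementation, on_start_element and on_end_element would presumably be called from the startElement and endElement handlers of an xmlSAXHandler, with data fed through libxml2's push parser (xmlParseChunk); attributes, text nodes and namespaces are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal DOM node: element name plus tree links. */
struct node {
    char *name;
    struct node *parent, *first_child, *last_child, *next;
    int closed; /* set once the matching end tag has been seen */
};

struct builder {
    struct node *root;
    struct node *current; /* innermost element that is still open */
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p)
        memcpy(p, s, n);
    return p;
}

/* Appends a new open element below the current one. Returns 0 on
 * success, -1 on allocation failure or on a second root element. */
static int on_start_element(struct builder *b, const char *name)
{
    struct node *n;

    if (!b->current && b->root)
        return -1;
    n = calloc(1, sizeof *n);
    if (!n)
        return -1;
    n->name = dup_str(name);
    if (!n->name) {
        free(n);
        return -1;
    }
    n->parent = b->current;
    if (!b->current)
        b->root = n;
    else if (b->current->last_child)
        b->current->last_child->next = n;
    else
        b->current->first_child = n;
    if (b->current)
        b->current->last_child = n;
    b->current = n;
    return 0;
}

/* Marks the current element as complete and steps back to its parent. */
static void on_end_element(struct builder *b)
{
    if (b->current) {
        b->current->closed = 1;
        b->current = b->current->parent;
    }
}

/* Counts the direct children seen so far; usable at any time. */
static size_t count_children(const struct node *n)
{
    size_t count = 0;
    const struct node *c;

    for (c = n->first_child; c; c = c->next)
        count++;
    return count;
}

static void free_tree(struct node *n)
{
    struct node *c, *next;

    if (!n)
        return;
    for (c = n->first_child; c; c = next) {
        next = c->next;
        free_tree(c);
    }
    free(n->name);
    free(n);
}
```

Readers of the tree could look at it between any two events, for example calling count_children on the root while it is still open, which is roughly the progressive filling that native shows.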
I attached two diff files,
- load-via-readio.diff shows the way that calls stream callbacks from
libxml2
Looks like the way to go.
- load-use-copyto.diff uses the CopyTo function, but still copies.
Bad idea. Some applications might put in a dumb stream that does not implement CopyTo.
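To show why relying on CopyTo alone is fragile, here is a sketch of a copy helper that has to fall back to a plain Read loop when the stream does not implement CopyTo. Everything here is a simplified, hypothetical model: mini_stream is not IStream, its copy_to copies into a memory buffer instead of another stream, and the HRESULT values are redefined locally (with their usual numeric values) so that the sketch compiles without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t HRESULT;
#define S_OK      ((HRESULT)0)
#define E_NOTIMPL ((HRESULT)0x80004001)

struct mini_stream;

/* Hypothetical, heavily reduced stream interface. */
struct mini_stream_vtbl {
    HRESULT (*read)(struct mini_stream *s, void *buf,
                    unsigned long len, unsigned long *got);
    HRESULT (*copy_to)(struct mini_stream *s, char *dst,
                       size_t dst_size, size_t *copied);
};

struct mini_stream {
    const struct mini_stream_vtbl *vtbl;
    const char *data;
    size_t size, pos;
};

static HRESULT mem_read(struct mini_stream *s, void *buf,
                        unsigned long len, unsigned long *got)
{
    size_t left = s->size - s->pos;
    size_t n = len < left ? len : left;

    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = (unsigned long)n;
    return S_OK;
}

/* A "dumb" stream as an application might provide: Read works, but the
 * copy method is just a stub. */
static HRESULT dumb_copy_to(struct mini_stream *s, char *dst,
                            size_t dst_size, size_t *copied)
{
    (void)s; (void)dst; (void)dst_size; (void)copied;
    return E_NOTIMPL;
}

static const struct mini_stream_vtbl dumb_vtbl = { mem_read, dumb_copy_to };

/* Tries copy_to first and falls back to reading manually if the stream
 * does not implement it. Stops early when dst is full. */
static HRESULT copy_stream(struct mini_stream *s, char *dst,
                           size_t dst_size, size_t *copied)
{
    HRESULT hr = s->vtbl->copy_to(s, dst, dst_size, copied);

    if (hr != E_NOTIMPL)
        return hr;
    *copied = 0;
    while (*copied < dst_size) {
        unsigned long got = 0;

        hr = s->vtbl->read(s, dst + *copied,
                           (unsigned long)(dst_size - *copied), &got);
        if (hr < 0)
            return hr;
        if (got == 0)
            break; /* end of stream */
        *copied += got;
    }
    return S_OK;
}
```

Since the manual loop is needed anyway for such streams, using CopyTo would add a code path rather than remove one.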
Regards, Michael Karcher