Background: I am trying to improve the performance/reliability of GuildWars2 under wine. In particular, PNG handling seems very slow, so I am looking at windowscodecs and I seem to be misunderstanding some things. I have read the PNG spec.
First: The CRC check. The spec says that a 'chunk' contains a four byte size of data field, a four byte 'chunk identifier', the data and a CRC32 of the data.
I found some code that reads a generic chunk. I see where the size and type are read, where a correctly sized record is allocated and where the data is read into the record.
There is a 'FIXME' that asks about checking the CRC, but no code to actually read or check the CRC.
I added code to read and check the CRC to my working copy of the git tree. That threw off the block synchronization, so I think there is code someplace else that either checks or skips the CRC, but I have not found it after a few hours searching for it. An indication on where to look and other advice would be very helpful at this time.
Second: At least two of the methods GuildWars2 wants to use are stubbed. To implement those methods, the frame section of the WIC object should contain an array (or something with similar properties) of pointers to chunks. From what I have read about WIC, other formats could use a similar array. Some could even use an array of pointers to frames. Before I go messing with that level of change, I think I should listen to other peoples opinions of the subject.
Third: On a very broad level, the whole OLE implementation seems to be very messed up. In particular, the usual practice for doing inheritance in C does not seem to be being used. That practice is to have the elements common to the base and extending object be placed at the beginning of the implementing data structures. While the member names need not be the same in all instances, the function, type and order of the common elements must be the same for economical implementation.
This has been done for the basic object; all start with a pointer to the 'vtable'. What I think should follow is the critical section lock structure and, for 'IUnknown', the reference count. What I see is that these common elements are placed more or less at random in each extended object.
Is there any reason, other than inertia, that at least these two fields should not be moved to the beginning of implementing objects. This can be done one object at a time and would allow the changed objects to use a common implementation of 'AddRef' at least, and another common routine or macro that handles the critical part of 'Release'.
That code is unused for reading and writing PNG images right now, sorry for the confusion. Libpng has its own code for reading and writing PNG chunks, and that is what is being used. Our chunk code is used for PNG metadata handlers (which will have to work independently from libpng so we can have a proper, pluggable system for metadata), but the architecture isn't there yet to use those handlers when reading a PNG image.
None of that code should be used yet in a file with more than one block, so I don't know how this could be causing problems. My plan was that it would be up to the caller to seek to the next block based on the returned data size.
Please keep in mind that Guild Wars 2 is probably not using WIC directly. You should find out which library it's actually using (probably d3dx9, but could also be gdiplus or oleaut32's IPicture interface) and take that into account. We can modify users of WIC to use it in efficient ways, and/or optimize WIC for the way we are using it. (MSDN recommends reading the entire image once either in one CopyPixels call or in rows from top to bottom, and optimizing for that case. We don't do that yet because we needed the general case first.)
I don't think a base class would've make anything easier.
We still couldn't do that because every interface vtable pointer would be at a different offset within the struct. We couldn't share them between encoders because different encoders have different data that they need to free. Anyway, it's not likely to have a measurable impact on performance.
Could you share some results of your profiling? What is it that led you to believe the PNG code is a bottleneck?
On 10/24/2012 02:18 PM, Vincent Povirk wrote:
Hmm. With the version that reads the CRC, I get more error messages and failures doing a 'make test' than the version without the addition.
d3dx9. From comments from the developers, they were having trouble with slow access to their data file. It's a huge thing, 30+GB. (Yes, that is a G, not an M). I'd tried a patch on GetCount (which completely misunderstood) and it went on to do something 'ReaderByIndex'. It put more images on the screen much faster for some reason. I have very very little information on the game's internal structure other than what wine logs.
OK. I was thinking every object based on IUnknown would start with a VTable pointer, a critical section lock and a reference count. I did not expect to see any performance improvement, just the absence of innumerable 'AddRef' and 'Release' variants. In fact I sort of expected both of them to go away as virtual functions with a virtual destructor in their place.
Could you share some results of your profiling? What is it that led you to believe the PNG code is a bottleneck?
Not allowed to profile as such. It was just the difference in speed when the game thought it had 'PNGGetCount'