Hi everyone, D3DXLoadSurfaceFromMemory is used to load a chunk of pixel data into a surface, performing any pixel format conversion from signed/unsigned ARGB, floating point or FourCC formats to any other format and also performs various types of filtering actions (one of copying, point filtering, linear filtering, triangular filtering and bov filtering and additionally dithering). I've partially implemented this function during my GSoC project, supporting only unsigned ARGB formats (with 1-4 bytes) and copying for now (supporting signed formats and point filtering is trivial to add to this version), but due to the huge functionality of the function it's a hard task to get the code right without either being somewhat slower than the native dll or having thousands of lines of code for each conversion/filtering scenario. My current approach is to implement various specific converters (which actually do converting AND filtering): - converting from and to "small" signed/unsigned ARGB formats - converting "large" ARGB formats to small ones - converting "large" ARGB formats to other large ARGB formats - converting DXT formats to other DXT formats - converting DXT formats to small ARGB formats - .... - additionally, I plan to a function for each filter which filters data when the source and destination formats are the same (which gives a huge performance boost which is e.g. necessary for mipmap generation) As you can see, that's still a great bunch of code, but we don't necessarily need to implement all of these (we could just implement the to/from small ARGB part for each format type and use a buffer if we need to convert from one "unusual" format to another unusual one). These converters will need to be duplicated across the various filter types. This architecture is the only way I could think of managing the implementation of this function without macros (although they would really improve the code in this scenario) and still keeping up to speed with the native dll.
I have attached a patch providing my current implementation, providing support for copying data from small unsigned ARGB data to small unsigned ARGB data. A few notes on these few lines of code: 1. if(DestFormat.rbits) { 2. if(SrcFormat.rbits) { 3. for(shift = DestShiftR; shift > DestFormat.rshift; shift -= SrcFormat.rbits) *pPixel |= r << shift; 4. *pPixel |= (r >> (DestFormat.rshift - shift)) << DestFormat.rshift; 5. } else *pPixel |= DestFormat.rmask; 6. }
1: There's no need to mess around with the red component if we don't need it in the destination color anyways 2/5: If the source color does not provide a red component, we assume the maximum red value (which is the red mask) 3/4: These lines convert the source color component to the destination color component; the for loop is needed to make sure that e.g. an R5 0b11111 maps to an 0b11111111 in R8 instead of 0b11111000 (the for loop is faster than using floating point calculation for this) The actual D3DXLoadSurfaceFromMemory implementation doesn't do anything special, the "critical" stuff happens in copy_simple_data.
I hope the code is understandable in the other parts, my few main questions are: Do you agree that the architecture is the best approach or can you think of a better one? would using macros be legal here since it could really help reduce code duplication when implementing the other filters? Is the code itself okay? Suggestions about coding style or other things are welcome. Is there any optimization I missed? - I spend a somewhat large amount of time getting the right balance between wide format support and good performance, but maybe things can be improved a bit more. Right now my implementation is about 10 % slower than the native one.
Any other comments which could be interesting to me at this point are also appreciated, of course.
Best regards, Tony