On 15.04.2013 02:10, Nozomi Kodama wrote:
Hello
thanks for the review. I don't think that calling defines is the way to go. Indeed, I tested my patch and yours. Yours is about 12% slower than mine in my computer. And now, we try to take care of efficiency of this dll. So, it is not the time to increase latency.
You are right. I'm not really sure why the code should be slower though. The #defines shouldn't have an impact on the performance, but it might be because it is translated to:
ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8]; tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8]; out[1] = 0.0f + ta * b[1] + tb * a[1]; t = a[1] * b[1]; out[0] = out[0] + 0.28209479f * t; out[6] = 0.0f + -0.12615662f * t; out[8] = 0.0f + -0.21850968f * t;
instead of: ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8]; tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8]; out[1] = ta * b[1] + tb * a[1]; t = a[1] * b[1]; out[0] += 0.28209479f * t; out[6] = -0.12615662f * t; out[8] = -0.21850968f * t;
May be due to "out[8] = -0.21850968f * t;" vs "out[8] = 0.0f + -0.21850968f * t;": 1. the extra 0.0f - Shouldn't the 0.0f get optimized away? Imho the macro could be fixed (e.g. use an if / separate macro / don't use a macro for this cases). 2. "+ -0.21850968f * t;" should be as fast as "-0.21850968f * t" isn't it?
Does anyone else have an objection about this? I just feel that the code size and the always reused constants could be written a little bit nicer. I'm not sure if the macro usage is really better, it was just a quick idea to avoid that much code duplication. Also changes like the suggested below are really error prone, because they change a lot of code mostly using copy and paste. Though I vote for the same style and precision usage in all cases.
Hope this helps a bit Rico
I used 10 digits since there are a lot of computation, I want to avoid as much as possible big rounding errors. If we want to uniformize, then we should uniformize d3dxshmultiply 2,3,4 with 10 digits. But that is for an another patch.
Nozomi.