Re: d3dx9: Avoid expensive computations

27 Feb 2013

2013/2/26 Rico Schüller kgbricola@web.de:
...
Hi Nozomi,
this is pretty fast. Just some numbers (run time on my machine, so it might
not be that representative)...
before: 43s
previous patch: 27s
this patch: 21s
native: 16s
So from the speed point of view, it's a lot closer than the rest.
Though, I would split this into 2 patches, one for D3DXMatrixDeterminant and
one for D3DXMatrixInverse.
That's probably a good idea.
...
I think it's a nice step forward. Though we
might test the speed of an SSE version and may use it later ...
Are there any other opinions?
My main concern is that the effort in optimizing further those two
functions might not have significant effects on actual application
execution times (think diminishing returns). I'm not against making
the code faster, especially if that doesn't make the code unreadable,
but it might not be the best place to work on if you want to optimize
d3dx9. You might want to profile some applications and see what the
actual bottlenecks are.
Specifically on these functions, an SSE-based version will probably
run significantly faster, but you need to solve the issues with
compatibility with older CPUs e.g. by selecting the correct function
implementation at runtime in some fashion, as Henri mentioned. BTW
there might be other potential problems, such as applications setting
the SSE control register in some unexpected way (although that happens
with the FPU control word too).
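
The runtime selection mentioned above could look roughly like the sketch below. It assumes GCC or Clang on x86 and uses the `__builtin_cpu_supports` builtin; the names `dot4_fn`, `dot4_scalar`, `dot4_sse` and `select_dot4` are hypothetical and are not taken from Wine or from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not Wine's actual code. */
typedef float (*dot4_fn)(const float *a, const float *b);

/* Portable scalar fallback that runs on any CPU. */
static float dot4_scalar(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_DISPATCH 1
#include <xmmintrin.h>

/* SSE variant; the target attribute enables SSE for this function only. */
__attribute__((target("sse")))
static float dot4_sse(const float *a, const float *b)
{
    float t[4];

    _mm_storeu_ps(t, _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return t[0] + t[1] + t[2] + t[3];
}
#endif

/* Pick an implementation once, based on what the running CPU reports. */
static dot4_fn select_dot4(void)
{
    dot4_fn fn = dot4_scalar;

#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        fn = dot4_sse;
#endif
    assert(fn != NULL);
    return fn;
}
```

A caller would store the result of `select_dot4()` in a function pointer at load time and call through it afterwards, so older CPUs never execute an SSE instruction.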
You can also try GCC optimization options, such as
"-mfpmath=sse" (and a suitable -march value). Obviously we don't want
to use them in general but it might be interesting to see what GCC can
do there. Keep in mind that the compiler has to stay on the safe side
when optimizing and you might need to add attributes around to allow
more aggressive optimizations. From a quick Google search I found
http://locklessinc.com/articles/vectorize/ which seems to show the
general idea.
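
As a small illustration of giving the compiler more room, the sketch below uses the C99 `restrict` qualifier, which is one such hint: it promises that the output does not alias the inputs, so the loops can be reordered or vectorized. `mat4_mul` is a hypothetical helper, not a d3dx9 function.

```c
#include <assert.h>

/* Hypothetical helper: out = a * b for row-major 4x4 matrices.
 * restrict tells the compiler that out never overlaps a or b. */
void mat4_mul(float *restrict out, const float *restrict a,
              const float *restrict b)
{
    int i, j, k;

    /* Cheap debug check; it only catches identical pointers,
     * not partial overlap. */
    assert(out != a && out != b);

    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            float sum = 0.0f;

            for (k = 0; k < 4; k++)
                sum += a[i * 4 + k] * b[k * 4 + j];
            out[i * 4 + j] = sum;
        }
    }
}
```

On 32-bit x86 this could be built with something like `gcc -O3 -msse2 -mfpmath=sse` to see what the compiler makes of it; whether the loops actually get vectorized depends on the GCC version and target.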
Cheers,
Matteo.
...
Cheers
Rico
On 25.02.2013 12:34, Nozomi Kodama wrote:
...
Rico,
can you give this patch a try?
If it is only slightly slower than native, we could merge it for now.
Anyway, if the application is well coded, this function should not be
called often. Usually an application uses transformation matrices that
are a lot easier to invert.
Nozomi
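
To illustrate the "easier to invert" case: if a matrix is known to contain only a rotation and a translation (an assumption, with no scale or shear), its inverse is the transposed rotation plus a rotated, negated translation, with no determinant needed. The sketch below uses the Direct3D row-vector layout; `mat4` and `rigid_inverse` are hypothetical names, and this is not what D3DXMatrixInverse does.

```c
#include <assert.h>

/* Hypothetical type: rotation in the upper-left 3x3, translation in row 3. */
typedef struct { float m[4][4]; } mat4;

/* Inverse of a rotation + translation matrix; wrong for anything else. */
void rigid_inverse(mat4 *out, const mat4 *in)
{
    int i, j;

    assert(out != in); /* in-place use would read overwritten values */

    /* The inverse of a rotation is its transpose. */
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++)
            out->m[i][j] = in->m[j][i];
        out->m[i][3] = 0.0f;
    }

    /* New translation: -t * R^T, i.e. minus the dot product of t with row j. */
    for (j = 0; j < 3; j++)
        out->m[3][j] = -(in->m[3][0] * in->m[j][0]
                       + in->m[3][1] * in->m[j][1]
                       + in->m[3][2] * in->m[j][2]);
    out->m[3][3] = 1.0f;
}
```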

*From:* Henri Verbeet hverbeet@gmail.com
*To:* Rico Schüller kgbricola@web.de
*Cc:* wine-devel@winehq.org; Nozomi Kodama nozomi.kodama@yahoo.com
*Sent:* Monday, 25 February 2013, 0:08
*Subject:* Re: d3dx9: Avoid expensive computations
On 25 February 2013 10:24, Rico Schüller <kgbricola@web.de> wrote:
...
I did some small tests for speed with the following results. You may also
avoid such a lot of variable assignments like *pout = out and you may use 4
vecs instead. This should save ~48 assignments and it should also improve
the speed a bit more (~10%). Though, native is still 40% faster than that.
...
I'd somewhat expect native to use SSE versions of this kind of thing
when the CPU supports those instructions. You also generally want to
pay attention to the order in which you access memory, although
perhaps it doesn't matter so much here because an entire matrix should
be able to fit in a single cacheline, provided it's properly aligned.
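
The cacheline point can be made concrete: a 4x4 float matrix is 16 * 4 = 64 bytes, which equals the line size on many x86 CPUs (an assumption; the actual size is CPU-dependent). A minimal sketch with a hypothetical `aligned_matrix` type, using C11 `_Alignas`:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: common on x86, not guaranteed */

/* Hypothetical type for illustration; forces 64-byte alignment. */
typedef struct {
    _Alignas(CACHE_LINE) float m[4][4];
} aligned_matrix;

_Static_assert(sizeof(aligned_matrix) == CACHE_LINE,
               "a 4x4 float matrix should fill exactly one cache line");

/* Returns 1 if a 64-byte object at p starts on a cache line boundary,
 * which is the only way it can sit inside a single line. */
static int fits_one_cache_line(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Debug helper: abort if a matrix straddles two cache lines. */
static void check_matrix_alignment(const aligned_matrix *mat)
{
    assert(fits_one_cache_line(mat));
}
```

A matrix that is only 4-byte aligned can straddle two lines, in which case every access to it may touch both.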
