Rico,
Can you give this patch a try? If it is only slightly slower than native, we could merge it as a first step.
Anyway, if the application is well written, this function should not be called often. Applications usually use transformation matrices that are much easier to invert.
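One example of the "easier to invert" case is a rigid transform: a rotation plus a translation, with no scaling. It can be inverted with a transpose and one vector-matrix product instead of a full cofactor expansion. The sketch below assumes a hypothetical `mat4` type laid out like D3DXMATRIX (row vectors, translation in the fourth row). `mat4` and `rigid_inverse` are illustrative names, not Wine or d3dx9 API, and nothing here claims that applications or native d3dx9 work this way.

```c
#include <assert.h>

/* Hypothetical 4x4 matrix type with the same layout as D3DXMATRIX:
 * m[row][col], row vectors (v' = v * M), translation in row 3. */
typedef struct { float m[4][4]; } mat4;

/* Inverse of a rigid transform M = [R 0; t 1], where R is an orthonormal
 * 3x3 rotation and t is the translation row. The inverse is
 * [R^T 0; -t * R^T 1]. Orthonormality of R is assumed, not checked;
 * only the last column is asserted to be (0, 0, 0, 1). */
static void rigid_inverse(mat4 *out, const mat4 *in)
{
    const mat4 src = *in; /* copy so that out may alias in */
    int i, j;

    assert(src.m[0][3] == 0.0f && src.m[1][3] == 0.0f &&
           src.m[2][3] == 0.0f && src.m[3][3] == 1.0f);

    /* Rotation part: the inverse of an orthonormal matrix is its transpose. */
    for (i = 0; i < 3; ++i)
        for (j = 0; j < 3; ++j)
            out->m[i][j] = src.m[j][i];

    /* Translation part: -t * R^T, using the transposed block just written. */
    for (j = 0; j < 3; ++j)
        out->m[3][j] = -(src.m[3][0] * out->m[0][j] +
                         src.m[3][1] * out->m[1][j] +
                         src.m[3][2] * out->m[2][j]);

    out->m[0][3] = out->m[1][3] = out->m[2][3] = 0.0f;
    out->m[3][3] = 1.0f;
}
```

Because the input is copied first, `rigid_inverse(&m, &m)` inverts in place. A matrix with scaling or shear would need the general path instead.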
Nozomi
________________________________
From: Henri Verbeet hverbeet@gmail.com
To: Rico Schüller kgbricola@web.de
Cc: wine-devel@winehq.org; Nozomi Kodama nozomi.kodama@yahoo.com
Sent: Monday, 25 February 2013, 0:08
Subject: Re: d3dx9: Avoid expensive computations
On 25 February 2013 10:24, Rico Schüller kgbricola@web.de wrote:
I did some small speed tests, with the following results. You could also avoid many of the variable assignments, such as *pout = out, and use 4 vecs instead. This should save ~48 assignments and improve the speed by roughly another 10%. Native is still 40% faster than that, though.
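Here is a minimal sketch of a general 4x4 inverse that cuts down on redundant work. It computes twelve 2x2 sub-determinants once and shares them between the determinant (a Laplace expansion along the first two rows) and all sixteen cofactors. It then writes the result straight into the output rather than filling a temporary matrix and copying it back with `*pout = out`. The input is copied once so that the output may alias it. This is not necessarily what the patch under discussion does, and it does not use the "4 vecs" layout mentioned above. `mat4` and `mat4_inverse` are hypothetical names, and the determinant out-parameter only loosely mirrors the one D3DXMatrixInverse exposes.

```c
/* Hypothetical matrix type with the same layout as D3DXMATRIX: m[row][col]. */
typedef struct { float m[4][4]; } mat4;

/* General 4x4 inverse via shared 2x2 sub-determinants.
 * Stores the determinant in *pdet if pdet is non-NULL.
 * Returns 0 and leaves *out untouched if the matrix is singular.
 * The input is copied first, so out may point to the same matrix as in. */
static int mat4_inverse(mat4 *out, float *pdet, const mat4 *in)
{
    const mat4 src = *in;
    const float (*m)[4] = src.m;
    float s0, s1, s2, s3, s4, s5, c0, c1, c2, c3, c4, c5, det, inv;

    /* 2x2 minors of rows 0 and 1 (sN) and of rows 2 and 3 (cN). */
    s0 = m[0][0] * m[1][1] - m[1][0] * m[0][1];
    s1 = m[0][0] * m[1][2] - m[1][0] * m[0][2];
    s2 = m[0][0] * m[1][3] - m[1][0] * m[0][3];
    s3 = m[0][1] * m[1][2] - m[1][1] * m[0][2];
    s4 = m[0][1] * m[1][3] - m[1][1] * m[0][3];
    s5 = m[0][2] * m[1][3] - m[1][2] * m[0][3];

    c5 = m[2][2] * m[3][3] - m[3][2] * m[2][3];
    c4 = m[2][1] * m[3][3] - m[3][1] * m[2][3];
    c3 = m[2][1] * m[3][2] - m[3][1] * m[2][2];
    c2 = m[2][0] * m[3][3] - m[3][0] * m[2][3];
    c1 = m[2][0] * m[3][2] - m[3][0] * m[2][2];
    c0 = m[2][0] * m[3][1] - m[3][0] * m[2][1];

    /* Laplace expansion of the determinant along rows 0 and 1. */
    det = s0 * c5 - s1 * c4 + s2 * c3 + s3 * c2 - s4 * c1 + s5 * c0;
    if (pdet)
        *pdet = det;
    if (det == 0.0f)
        return 0;
    inv = 1.0f / det;

    /* Each entry is a transposed cofactor divided by the determinant. */
    out->m[0][0] = ( m[1][1] * c5 - m[1][2] * c4 + m[1][3] * c3) * inv;
    out->m[0][1] = (-m[0][1] * c5 + m[0][2] * c4 - m[0][3] * c3) * inv;
    out->m[0][2] = ( m[3][1] * s5 - m[3][2] * s4 + m[3][3] * s3) * inv;
    out->m[0][3] = (-m[2][1] * s5 + m[2][2] * s4 - m[2][3] * s3) * inv;

    out->m[1][0] = (-m[1][0] * c5 + m[1][2] * c2 - m[1][3] * c1) * inv;
    out->m[1][1] = ( m[0][0] * c5 - m[0][2] * c2 + m[0][3] * c1) * inv;
    out->m[1][2] = (-m[3][0] * s5 + m[3][2] * s2 - m[3][3] * s1) * inv;
    out->m[1][3] = ( m[2][0] * s5 - m[2][2] * s2 + m[2][3] * s1) * inv;

    out->m[2][0] = ( m[1][0] * c4 - m[1][1] * c2 + m[1][3] * c0) * inv;
    out->m[2][1] = (-m[0][0] * c4 + m[0][1] * c2 - m[0][3] * c0) * inv;
    out->m[2][2] = ( m[3][0] * s4 - m[3][1] * s2 + m[3][3] * s0) * inv;
    out->m[2][3] = (-m[2][0] * s4 + m[2][1] * s2 - m[2][3] * s0) * inv;

    out->m[3][0] = (-m[1][0] * c3 + m[1][1] * c1 - m[1][2] * c0) * inv;
    out->m[3][1] = ( m[0][0] * c3 - m[0][1] * c1 + m[0][2] * c0) * inv;
    out->m[3][2] = (-m[3][0] * s3 + m[3][1] * s1 - m[3][2] * s0) * inv;
    out->m[3][3] = ( m[2][0] * s3 - m[2][1] * s1 + m[2][2] * s0) * inv;
    return 1;
}
```

The exact `det == 0.0f` test mirrors the simplest possible singularity check. A real implementation might prefer a tolerance, which changes behaviour for nearly singular input.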
I'd somewhat expect native to use SSE versions of this kind of thing when the CPU supports those instructions. You also generally want to pay attention to the order in which you access memory, although perhaps it doesn't matter so much here because an entire matrix should be able to fit in a single cacheline, provided it's properly aligned.
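The memory-layout point can be made concrete. A 4x4 float matrix is 16 × 4 = 64 bytes, which matches the 64-byte cache line common on current x86 CPUs (an assumption, not a guarantee). With 64-byte alignment the whole matrix therefore occupies exactly one line. The sketch below uses a matrix multiply rather than an inverse to keep it short. It reads the second operand row by row in memory order and uses SSE intrinsics when the compiler defines `__SSE__`, with a scalar fallback otherwise. Henri only speculates that native uses SSE, so this is not a description of native's code. `amat4` and `amat4_multiply` are hypothetical names.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical matrix type: 16 floats = 64 bytes, aligned to 64 bytes so
 * that it fits in a single (assumed 64-byte) cache line, and so that each
 * row is 16-byte aligned as _mm_load_ps/_mm_store_ps require. */
typedef struct { _Alignas(64) float m[4][4]; } amat4;

_Static_assert(sizeof(amat4) == 64, "matrix should be exactly 64 bytes");

/* out = a * b (row-major). Row i of the result is a linear combination of
 * the rows of b weighted by row i of a, so b is read in memory order. */
static void amat4_multiply(amat4 *out, const amat4 *a, const amat4 *b)
{
#if defined(__SSE__)
    __m128 rows[4];
    int i;

    for (i = 0; i < 4; ++i)
    {
        __m128 r = _mm_mul_ps(_mm_set1_ps(a->m[i][0]), _mm_load_ps(b->m[0]));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a->m[i][1]), _mm_load_ps(b->m[1])));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a->m[i][2]), _mm_load_ps(b->m[2])));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a->m[i][3]), _mm_load_ps(b->m[3])));
        rows[i] = r;
    }
    /* Store only after all reads, so out may alias a or b. */
    for (i = 0; i < 4; ++i)
        _mm_store_ps(out->m[i], rows[i]);
#else
    float tmp[4][4];
    int i, j, k;

    for (i = 0; i < 4; ++i)
        for (j = 0; j < 4; ++j)
        {
            tmp[i][j] = 0.0f;
            for (k = 0; k < 4; ++k)
                tmp[i][j] += a->m[i][k] * b->m[k][j];
        }
    /* Copy out only after all reads, so out may alias a or b. */
    for (i = 0; i < 4; ++i)
        for (j = 0; j < 4; ++j)
            out->m[i][j] = tmp[i][j];
#endif
}
```

If the matrix were only 16-byte aligned, it could straddle two cache lines, and unaligned storage would force `_mm_loadu_ps` instead of `_mm_load_ps`.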