Hi, I agree with you... There are probably one or two fixes somewhere that will improve performance, but my impression is that, if "premature optimization is the root of all evil" was true a while ago, we are now at the point where all the small optimizations need to be made. To confirm: in SC2, the performance difference without the D3D lock and the x11 spinlock, compared to current Wine (1.3.19), is basically not noticeable. On top of this, be aware that the D3D lock apparently IS recursive (when using the non-recursive spinlock, one CPU went to 100% and the application hung).
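To illustrate why a non-recursive spinlock hangs when the same thread re-enters it, here is a minimal sketch of a test-and-set spinlock using GCC's __atomic builtins. The names (simple_spinlock, simple_spin_lock, nested_acquire_succeeds) are hypothetical and this is not Wine code. A blocking acquire from the thread that already holds the lock would spin forever, which matches the 100% CPU hang; the helper uses a trylock for the nested acquire so the effect can be observed without actually hanging.

```c
#include <stdbool.h>

/* Hypothetical minimal test-and-set spinlock (GCC/Clang __atomic builtins). */
typedef struct {
    bool locked;
} simple_spinlock;

/* Returns true if the lock was acquired, false if it was already held. */
bool simple_spin_trylock(simple_spinlock *l)
{
    return !__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE);
}

/* Busy-waits until the lock is free. If the calling thread already holds
 * the lock, nobody will ever release it and this loop never terminates. */
void simple_spin_lock(simple_spinlock *l)
{
    while (!simple_spin_trylock(l))
        ; /* spin; a real implementation would add a pause hint */
}

void simple_spin_unlock(simple_spinlock *l)
{
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

/* Models re-entrant use: take the lock, then try to take it again from the
 * same thread. Returns whether the nested acquire succeeded. */
bool nested_acquire_succeeds(simple_spinlock *l)
{
    simple_spin_lock(l);                 /* outer acquire */
    bool ok = simple_spin_trylock(l);    /* inner acquire, same thread */
    if (ok)
        simple_spin_unlock(l);
    simple_spin_unlock(l);
    return ok;
}
```

The nested acquire always fails, so any code path that takes the lock twice needs a recursive lock (owner tracking plus a depth counter) rather than a plain test-and-set.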
For example, the main issue with SC2 is that unless you set the affinity of all threads to a single core, the game runs 40% slower; for this reason I think this game suffers from issues in other Wine components (other than D3D) and/or in the Linux scheduler. Another major issue with SC2 is that changing the shader quality level heavily impacts performance, and I'm running on a GTX 470: this shouldn't be the case. Should we be looking into the way we generate shaders (both GLSL and ARB ones)? Another point: what about how we handle offscreen buffers, FBOs, and so on?
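A minimal sketch of pinning to one core from C, assuming Linux and glibc's sched_getaffinity/sched_setaffinity; the helper name pin_to_first_allowed_cpu is hypothetical. Passing 0 as the pid affects only the calling thread, and threads created afterwards inherit its mask, so this has to run before the other threads are spawned. From the shell, the usual equivalent for a whole process tree is `taskset -c 0 wine <program>`.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pins the calling thread to the lowest-numbered CPU it is currently allowed
 * to run on. Threads created afterwards inherit this mask.
 * Returns the chosen CPU index, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) != 0)
            return -1;
        return cpu;
    }
    return -1;
}
```

Starting from the current mask rather than hard-coding CPU 0 keeps the sketch working under cpusets or containers that exclude CPU 0.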
If I remember correctly, someone was working on a worker thread for D3D. Did we abandon this project? Given that OpenGL doesn't (didn't?) support calls from multiple threads on the same context, should we be going down this route?
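The general idea of a D3D worker thread can be sketched as a command queue: any application thread submits commands, and a single worker thread executes them, so all GL calls happen on the one thread that owns the context. This is a minimal sketch assuming POSIX threads; cmd_queue, cmd_queue_submit, demo_run and the other names are hypothetical and do not describe the design of the earlier Wine work.

```c
#include <pthread.h>
#include <stdbool.h>

#define CMD_QUEUE_SIZE 64   /* power of two, so unsigned index wrap-around stays consistent */

typedef void (*cmd_fn)(void *arg);

/* Hypothetical command queue: any thread submits, one worker thread executes. */
typedef struct {
    cmd_fn fn[CMD_QUEUE_SIZE];
    void *arg[CMD_QUEUE_SIZE];
    unsigned head, tail;            /* head: next command to run; tail: next free slot */
    bool stop;
    pthread_mutex_t mutex;
    pthread_cond_t not_empty, not_full;
    pthread_t worker;
} cmd_queue;

static void *worker_main(void *p)
{
    cmd_queue *q = p;
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        while (q->head == q->tail && !q->stop)
            pthread_cond_wait(&q->not_empty, &q->mutex);
        if (q->head == q->tail) {   /* queue drained and stop requested */
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        unsigned slot = q->head++ % CMD_QUEUE_SIZE;
        cmd_fn fn = q->fn[slot];
        void *arg = q->arg[slot];
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->mutex);
        fn(arg);                    /* all commands (e.g. GL calls) run here, on one thread */
    }
}

int cmd_queue_start(cmd_queue *q)
{
    q->head = q->tail = 0;
    q->stop = false;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    return pthread_create(&q->worker, NULL, worker_main, q);
}

/* Callable from any thread; blocks while the queue is full. */
void cmd_queue_submit(cmd_queue *q, cmd_fn fn, void *arg)
{
    pthread_mutex_lock(&q->mutex);
    while (q->tail - q->head == CMD_QUEUE_SIZE)
        pthread_cond_wait(&q->not_full, &q->mutex);
    q->fn[q->tail % CMD_QUEUE_SIZE] = fn;
    q->arg[q->tail % CMD_QUEUE_SIZE] = arg;
    q->tail++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

/* Lets the worker finish every pending command, then joins it. */
void cmd_queue_finish(cmd_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    q->stop = true;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
    pthread_join(q->worker, NULL);
}

/* Usage sketch: each command logs its value and checks which thread runs it. */
static int demo_log[8], demo_count;
static bool demo_on_worker = true;
static pthread_t demo_worker;

static void demo_cmd(void *arg)
{
    demo_log[demo_count++] = *(int *)arg;
    if (!pthread_equal(pthread_self(), demo_worker))
        demo_on_worker = false;
}

bool demo_run(void)
{
    static int values[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    cmd_queue q;
    if (cmd_queue_start(&q) != 0)
        return false;
    demo_worker = q.worker;
    for (int i = 0; i < 8; i++)
        cmd_queue_submit(&q, demo_cmd, &values[i]);
    cmd_queue_finish(&q);
    for (int i = 0; i < 8; i++)
        if (demo_log[i] != i)
            return false;
    return demo_count == 8 && demo_on_worker;
}
```

With this layout the application threads never touch GL state directly, so the big lock around GL calls could in principle shrink to the queue's own mutex; the trade-off is an extra copy of each command and added latency for calls that must return a result.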
Cheers,
On 01/05/11 15:10, Stefan Dösinger wrote:
On Sunday 01 May 2011 14:34:53 Emanuele Oriani wrote:
Indeed, I've written a spinlock using GCC extensions and replaced the EnterCriticalSection in the x11drv code. One catch is that the lock has to be recursive, so I implemented a quick (but incorrect) recursive spinlock for the purpose of running SC2, and the difference was barely noticeable.
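A recursive spinlock along these lines can be sketched with GCC's __atomic builtins. This is a minimal sketch, not Emanuele's implementation: it assumes the address of a __thread variable is good enough as a per-thread owner ID, which avoids calling gettid() (a system call) on every acquire. rec_spinlock and its functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recursive spinlock built on GCC/Clang __atomic builtins. */
typedef struct {
    uintptr_t owner;   /* 0 when free, otherwise the owning thread's ID */
    unsigned depth;    /* recursion depth; only modified by the owner */
} rec_spinlock;

/* Assumption: the address of a thread-local variable is unique among live
 * threads, so it can serve as an owner ID without a gettid() system call. */
static __thread char thread_marker;

static uintptr_t self_id(void)
{
    return (uintptr_t)&thread_marker;
}

void rec_spin_lock(rec_spinlock *l)
{
    uintptr_t me = self_id();

    /* Only this thread can have stored its own ID, so a relaxed read is enough. */
    if (__atomic_load_n(&l->owner, __ATOMIC_RELAXED) == me) {
        l->depth++;
        return;
    }
    for (;;) {
        uintptr_t expected = 0;
        if (__atomic_compare_exchange_n(&l->owner, &expected, me, false,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            break;
        /* Wait with plain loads until the lock looks free, then retry the CAS. */
        while (__atomic_load_n(&l->owner, __ATOMIC_RELAXED) != 0)
            ;
    }
    l->depth = 1;
}

void rec_spin_unlock(rec_spinlock *l)
{
    if (--l->depth == 0)
        __atomic_store_n(&l->owner, 0, __ATOMIC_RELEASE);
}
```

A reused TLS address only matters if a thread exits while still holding the lock, which is a bug in its own right. Spinning on a plain load before retrying the compare-and-swap keeps waiting threads from constantly bouncing the cache line between cores.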
How much was the difference?
The biggest issue imho is that in this case we have to call a function...
I don't think so. I did some tests on the call overhead, and it is fairly small. Specifically, I tried exporting the wined3d lock from wined3d and calling EnterCriticalSection / LeaveCriticalSection directly from d3d9. The difference wasn't even measurable with my hyper-sensitive self-written test apps.
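For anyone who wants to reproduce this kind of measurement outside Wine, here is a minimal microbenchmark sketch. It assumes an uncontended pthread mutex as a stand-in for a critical section, and compares calling the lock functions directly with calling them through an extra out-of-line wrapper reached via a function pointer, roughly modelling the cost of going through an exported function. All names are hypothetical, and absolute numbers will vary by machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

static pthread_mutex_t bench_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Out-of-line wrappers reached through volatile function pointers, so the
 * compiler cannot inline them: this adds one extra call layer per operation. */
static void lock_wrapper(void)   { pthread_mutex_lock(&bench_mutex); }
static void unlock_wrapper(void) { pthread_mutex_unlock(&bench_mutex); }
static void (*volatile lock_fn)(void) = lock_wrapper;
static void (*volatile unlock_fn)(void) = unlock_wrapper;

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average nanoseconds per uncontended lock/unlock pair, called directly. */
double bench_direct(long iters)
{
    double t0 = now_ns();
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&bench_mutex);
        pthread_mutex_unlock(&bench_mutex);
    }
    return (now_ns() - t0) / iters;
}

/* Same, but through the extra wrapper layer. */
double bench_indirect(long iters)
{
    double t0 = now_ns();
    for (long i = 0; i < iters; i++) {
        lock_fn();
        unlock_fn();
    }
    return (now_ns() - t0) / iters;
}
```

Comparing the two averages over a large iteration count gives a rough idea of how much the extra call adds on top of the lock operation itself.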
I can try a spinlock for the BKL-like wined3d lock. I hope this one doesn't have to be recursive, right? I'm asking because with a recursive lock I'm performing an extra syscall:
The wined3d lock doesn't have to be recursive, I think. But note that the chances of getting those changes committed into Wine are next to zero. It's more likely that an optimization of EnterCriticalSection / LeaveCriticalSection itself would get into Wine.
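One commonly used technique for speeding up a lock like EnterCriticalSection is to spin briefly before falling back to a blocking wait; Windows exposes this idea as a spin count through InitializeCriticalSectionAndSpinCount. Below is a minimal sketch on top of pthreads, assuming trylock-based spinning; spin_then_block_lock and the stb_* functions are hypothetical and are not Wine's implementation.

```c
#include <pthread.h>

/* Hypothetical lock that spins for a bounded number of attempts before
 * blocking, the same idea as a critical section's spin count. */
typedef struct {
    pthread_mutex_t mutex;
    int spin_count;    /* non-blocking attempts before blocking */
} spin_then_block_lock;

void stb_init(spin_then_block_lock *l, int spin_count)
{
    pthread_mutex_init(&l->mutex, NULL);
    l->spin_count = spin_count;
}

/* Returns how many trylock attempts failed before the lock was taken;
 * a return value equal to spin_count means we fell back to blocking. */
int stb_lock(spin_then_block_lock *l)
{
    for (int i = 0; i < l->spin_count; i++) {
        if (pthread_mutex_trylock(&l->mutex) == 0)
            return i;
        /* A real implementation would add a CPU pause hint here. */
    }
    pthread_mutex_lock(&l->mutex);   /* stop spinning; may sleep in the kernel if still contended */
    return l->spin_count;
}

void stb_unlock(spin_then_block_lock *l)
{
    pthread_mutex_unlock(&l->mutex);
}

void stb_destroy(spin_then_block_lock *l)
{
    pthread_mutex_destroy(&l->mutex);
}
```

Spinning only pays off when the lock is held for very short periods on another core; on a single core, or with long critical sections, it just burns CPU. glibc offers a similar built-in policy through the non-portable PTHREAD_MUTEX_ADAPTIVE_NP mutex type.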
Please keep in mind this is test code, but apparently it works. Again, the performance gain in SC2 isn't that much... but maybe I should test more thoroughly, or with other games?
No, as I explained in my mail, the individual optimizations won't magically fix all the performance woes we have. We'll probably have to collect a dozen or more such little fixes to start seeing movement.