Indeed, I've written a spinlock using GCC extensions and used it to replace the EnterCriticalSection call in the x11 drv file. Apart from that, the lock has to be recursive, so I implemented a quick (but incorrect) recursive spinlock for the purpose of running SC2, and the difference was negligible. The biggest issue, imho, is that in this case we still have to call a function... it would be great to inline all that code, but again, probably the best thing is to limit the number of calls. I can try a spinlock for the BKL-like lock, which is the wined3d lock. I hope this one doesn't have to be recursive, right? I'm asking because in the case of a recursive lock I'm performing an extra syscall:
static volatile pid_t x11_lock = 0;
static volatile int x11_lock_cnt = 0;

/***********************************************************************
 *		wine_tsx11_lock   (X11DRV.@)
 */
void CDECL wine_tsx11_lock(void)
{
    pid_t th_id = syscall(SYS_gettid); // This might be expensive!
    // I don't like recursive locks for this reason!
    while (th_id != __sync_val_compare_and_swap(&x11_lock, 0, th_id));
    ++x11_lock_cnt;
    asm volatile("lfence" ::: "memory");
}

/***********************************************************************
 *		wine_tsx11_unlock   (X11DRV.@)
 */
void CDECL wine_tsx11_unlock(void)
{
    if (!--x11_lock_cnt) x11_lock = 0;
    asm volatile("sfence" ::: "memory");
}
Please keep in mind this is test code, but apparently it works. Again, the performance gain in SC2 isn't much... but I should probably test more thoroughly, and with other games?
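For what it's worth, the extra syscall could probably be avoided by caching the thread id in a thread-local variable, so gettid is only called once per thread. The following is a minimal sketch under a few assumptions: GCC's __thread storage class and __sync builtins are available, the target is Linux, and the sketch_* names are made up for illustration (they are not part of Wine). The nesting counter is only touched while the lock is held, and the owner field is cleared with a release barrier before any other thread can take the lock.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* All names below are hypothetical, for illustration only. */
static volatile pid_t sketch_owner = 0;  /* 0 means "unlocked" */
static int sketch_depth = 0;             /* nesting count, owner only */
static __thread pid_t sketch_tid = 0;    /* per-thread cache of gettid */

/* Return the calling thread's id; the syscall happens once per thread. */
static pid_t sketch_self(void)
{
    if (!sketch_tid) sketch_tid = (pid_t)syscall(SYS_gettid);
    return sketch_tid;
}

void sketch_lock(void)
{
    pid_t self = sketch_self();

    /* Only the owner ever stores its own id, so if we read our id here
     * we already hold the lock and this is a recursive acquisition. */
    if (sketch_owner != self)
    {
        /* Spin until we swap 0 -> self; the builtin is a full barrier. */
        while (__sync_val_compare_and_swap(&sketch_owner, 0, self) != 0)
            ;
    }
    ++sketch_depth;
}

void sketch_unlock(void)
{
    /* Unlocking from a thread that does not hold the lock is a bug. */
    assert(sketch_owner == sketch_self() && sketch_depth > 0);

    if (--sketch_depth == 0)
        __sync_lock_release(&sketch_owner);  /* stores 0 with release semantics */
}
```

This still costs a function call per lock/unlock unless it is inlined, so it does not change the point about limiting the number of calls.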
Let me know, Cheers,
On 01/05/11 09:33, Stefan Dösinger wrote:
> On Saturday 30 April 2011 18:26:04 Emanuele Oriani wrote:
> > Hi Stefan,
> > What do you think about using inline spinlocks (in asm code maybe) to implement locks? Clearly an optimized spinlock would mean different code for different compilers/architectures, but shouldn't it be the best solution?
> I am usually pessimistic about hand-written assembler optimizations. You can give it a try, but compilers are pretty clever these days.
> I think trying to optimize the lock calls is a more promising way. We can't simply drop the ENTER_GL/LEAVE_GL calls, as you found out in SC2. We may be able to reduce the number of those calls by moving blocks of opengl calls closer together.
> There's also the wined3d lock, which is somewhat like the big kernel lock. There's room for improvement there as well, if we soften the "you must call wined3d under lock" rule. However the wined3d lock is the smaller problem compared to the X11 lock.
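
As a rough illustration of the idea of moving blocks of OpenGL calls closer together, the sketch below counts how often a lock is taken when every call is wrapped individually versus when adjacent calls share one lock/unlock pair. Everything here is a stand-in: demo_enter_gl, demo_leave_gl and demo_gl_call are hypothetical placeholders for ENTER_GL, LEAVE_GL and real GL calls, and no actual locking or rendering takes place.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these are Wine or OpenGL functions. */
static unsigned demo_lock_calls = 0;  /* how many times the lock was taken */
static int demo_lock_held = 0;

static void demo_enter_gl(void) { ++demo_lock_calls; demo_lock_held = 1; }
static void demo_leave_gl(void) { demo_lock_held = 0; }

/* Placeholder for a GL call; it must only run while the lock is held. */
static void demo_gl_call(int arg)
{
    (void)arg;
    assert(demo_lock_held);
}

/* Each GL call takes and releases the lock on its own. */
static unsigned demo_draw_fine_grained(void)
{
    demo_lock_calls = 0;
    for (int i = 0; i < 3; ++i)
    {
        demo_enter_gl();
        demo_gl_call(i);
        demo_leave_gl();
    }
    return demo_lock_calls;
}

/* The same GL calls grouped under a single lock/unlock pair. */
static unsigned demo_draw_batched(void)
{
    demo_lock_calls = 0;
    demo_enter_gl();
    for (int i = 0; i < 3; ++i)
        demo_gl_call(i);
    demo_leave_gl();
    return demo_lock_calls;
}
```

The trade-off is that the batched version holds the lock for longer, so it only pays off where the grouped calls are already adjacent and nothing else needs the lock in between.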