Re: D3D performance debugging report

1 May 2011


Indeed, I've written a spinlock using GCC extensions and used it to
replace the EnterCriticalSection call in the x11 drv file.
The catch is that the lock has to be recursive, so I implemented a quick
(but incorrect) recursive spinlock for the purpose of running SC2, and
the difference was negligible.
The biggest issue, IMHO, is that in this case we still have to call a
function... it would be great to inline all that code but, again,
probably the best thing is to limit the number of calls.
I can try a spinlock for the wined3d lock, the BKL-like one. I hope this
one doesn't have to be recursive, right?
I'm asking because, for a recursive lock, I'm performing an extra
syscall:
static volatile pid_t    x11_lock = 0;
static volatile int        x11_lock_cnt = 0;
/***********************************************************************
  *        wine_tsx11_lock   (X11DRV.@)
  */
void CDECL wine_tsx11_lock(void)
{
     pid_t        th_id = syscall(SYS_gettid);    // This might be expensive!
                                                  // I don't like recursive locks for this reason!
     while (th_id != __sync_val_compare_and_swap(&x11_lock, 0, th_id));
     ++x11_lock_cnt;
     asm volatile("lfence" ::: "memory");
}
/***********************************************************************
  *        wine_tsx11_unlock   (X11DRV.@)
  */
void CDECL wine_tsx11_unlock(void)
{
     if(!--x11_lock_cnt)
         x11_lock=0;
     asm volatile("sfence" ::: "memory");
}
Please keep in mind this is test code, but apparently it's working.
Again, the performance gain in the case of SC2 isn't much... but I should
probably test more thoroughly, and with other games?
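For comparison, here is a minimal sketch of how the recursive case might avoid the extra syscall. It is not the Wine implementation, and all names in it (sketch_lock, sketch_unlock, lock_owner, lock_depth, thread_tag) are hypothetical. It assumes GCC's __thread and __atomic builtins are available, and uses the address of a thread-local variable as a cheap thread identity in place of gettid. The acquire/release orderings stand in for the explicit fences.

```c
#include <stdint.h>

/* Hypothetical names for illustration; this is not the Wine code. */
static uintptr_t lock_owner = 0;   /* 0 means unlocked */
static int lock_depth = 0;         /* only touched by the current owner */

/* Assumption: GCC's __thread extension. The address of a thread-local
 * variable is distinct for every live thread, so it can act as a
 * thread id without a syscall. */
static __thread char thread_tag;

static inline uintptr_t self_id(void)
{
    return (uintptr_t)&thread_tag;
}

static inline void sketch_lock(void)
{
    uintptr_t self = self_id();

    /* Recursive entry: only the owner can ever read its own id here. */
    if (__atomic_load_n(&lock_owner, __ATOMIC_RELAXED) != self)
    {
        uintptr_t expected = 0;
        /* Spin until the owner field changes from 0 to our id. */
        while (!__atomic_compare_exchange_n(&lock_owner, &expected, self, 0,
                                            __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            expected = 0;  /* a failed exchange overwrites expected */
    }
    ++lock_depth;
}

static inline void sketch_unlock(void)
{
    /* The release store makes the critical section visible before
     * another thread can see the lock as free. */
    if (--lock_depth == 0)
        __atomic_store_n(&lock_owner, 0, __ATOMIC_RELEASE);
}
```

Whether this is measurably faster than the gettid version would still need testing; it only removes the syscall from the lock path.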
Let me know,
Cheers,
On 01/05/11 09:33, Stefan Dösinger wrote:
> On Saturday 30 April 2011 18:26:04 Emanuele Oriani wrote:
>> Hi Stefan,
>> What do you think about using inline spinlocks (in asm code maybe) to
>> implement locks?
>> Clearly an optimized spinlock would mean different code for different
>> compilers/architectures, but shouldn't it be the best solution?
> I am usually pessimistic about hand-written assembler optimizations. You can
> give it a try, but compilers are pretty clever these days.
> I think trying to optimize the lock calls is a more promising way. We can't
> simply drop the ENTER_GL/LEAVE_GL calls, as you found out in SC2. We may be
> able to reduce the number of those calls by moving blocks of opengl calls
> closer together.
> There's also the wined3d lock, which is somewhat like the big kernel lock.
> There's room for improvement there as well, if we soften the "you must call
> wined3d under lock" rule. However the wined3d lock is the smaller problem
> compared to the X11 lock.
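
The idea of grouping opengl calls to cut down on lock calls can be illustrated with a toy sketch. Everything in it is hypothetical: enter_gl, leave_gl and fake_gl_call are stand-ins that only count invocations, not the real ENTER_GL/LEAVE_GL macros or any GL entry point. It only shows that the same work can be done with one lock round trip instead of one per call.

```c
/* Hypothetical stand-ins for ENTER_GL/LEAVE_GL; they only count calls. */
static int lock_calls = 0;
static int gl_calls = 0;

static void enter_gl(void) { ++lock_calls; }
static void leave_gl(void) { }
static void fake_gl_call(void) { ++gl_calls; }

/* One lock round trip per GL call. */
static void draw_unbatched(int n)
{
    for (int i = 0; i < n; ++i)
    {
        enter_gl();
        fake_gl_call();
        leave_gl();
    }
}

/* The same GL calls grouped under a single lock round trip. */
static void draw_batched(int n)
{
    enter_gl();
    for (int i = 0; i < n; ++i)
        fake_gl_call();
    leave_gl();
}
```

In real code the grouping is only valid when nothing between the GL calls needs the lock to be dropped, which is the hard part of the refactoring.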
