Re: wineport: Add fast builtin & asm versions of ffs() + ctz() where supported.
Adam Martinson <amartinson(a)codeweavers.com> writes:
@@ -236,7 +241,40 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 #endif /* HAVE_GETOPT_LONG */
 #ifndef HAVE_FFS
-int ffs( int x );
+ #if defined(__i386__) || defined(__x86_64__)
+ #define HAVE_FFS
+ static inline int ffs( int x )
+ {
+     if (!x)
+     {
+         return 0;
+     }
+     else
+     {
+         int ret;
+         __asm__("bsfl %1, %0; incl %0" : "=r" (ret) : "r" (x));
+         return ret;
+     }
+ }
+ #elif defined(__GNUC__) && GCC_VERSION >= 29503
+ #define HAVE_FFS
+ #define ffs(x) __builtin_ffs(x)
+ #else
+ int ffs( int x );
+ #endif
+#endif
You'd have to show benchmarks to prove that this complexity is necessary. Given that ffs() should already be inlined on all decent platforms, I doubt you'd be able to demonstrate a difference (if anything, your version would be slower because of the extra increment).

-- 
Alexandre Julliard
julliard(a)winehq.org
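Alexandre's request can be met with a micro-benchmark. The sketch below is not from the thread: it assumes a GCC-compatible compiler (for `__builtin_ffs`), and every name in it (`ffs_portable`, `ffs_builtin`, `sum_portable`, `sum_builtin`, `compare_ffs`) is hypothetical. Each loop calls its ffs variant directly so the compiler is free to inline it, which is the situation Alexandre describes; the odd multiplier 2654435761 (Knuth's multiplicative-hash constant) only scatters the inputs. `clock()` measures processor time, so results are crude and depend on compiler flags and CPU.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Portable reference: 1-based index of the lowest set bit, 0 when x == 0. */
static inline int ffs_portable(int x)
{
    unsigned int v = (unsigned int)x;
    int i = 1;

    if (!v)
        return 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Compiler builtin (assumes GCC or Clang). */
static inline int ffs_builtin(int x)
{
    return __builtin_ffs(x);
}

/* The running sums keep the work from being optimized away. */
static long long sum_portable(unsigned int n)
{
    long long s = 0;
    for (unsigned int i = 0; i < n; i++)
        s += ffs_portable((int)(i * 2654435761u));
    return s;
}

static long long sum_builtin(unsigned int n)
{
    long long s = 0;
    for (unsigned int i = 0; i < n; i++)
        s += ffs_builtin((int)(i * 2654435761u));
    return s;
}

/* Run one loop, print its processor time, and return its sum. */
static long long timed(const char *label, long long (*loop)(unsigned int),
                       unsigned int n)
{
    clock_t start = clock();
    long long s = loop(n);
    printf("%-9s %.3f s\n", label, (double)(clock() - start) / CLOCKS_PER_SEC);
    return s;
}

/* Both variants must agree before their timings mean anything. */
static void compare_ffs(unsigned int n)
{
    long long a = timed("portable", sum_portable, n);
    long long b = timed("builtin", sum_builtin, n);
    assert(a == b);
}
```

Calling `compare_ffs(50000000u)` from a build at `-O2` prints one timing per variant; the patch's inline-asm version could be added as a third loop in the same way.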
On Fri, Mar 11, 2011 at 02:28:27PM +0100, Alexandre Julliard wrote:
Adam Martinson <amartinson(a)codeweavers.com> writes:
@@ -236,7 +241,40 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 #endif /* HAVE_GETOPT_LONG */
 #ifndef HAVE_FFS
-int ffs( int x );
+ #if defined(__i386__) || defined(__x86_64__)
...
+         __asm__("bsfl %1, %0; incl %0" : "=r" (ret) : "r" (x));
+         return ret;
...
+ #elif defined(__GNUC__) && GCC_VERSION >= 29503
+ #define HAVE_FFS
+ #define ffs(x) __builtin_ffs(x)
+ #else
+ int ffs( int x );
+ #endif
+#endif
You'd have to show benchmarks to prove that this complexity is necessary. Given that ffs() should already be inlined on all decent platforms, I doubt you'd be able to demonstrate a difference (if anything, your version would be slower because of the extra increment).
Never mind the fact that it is possible to write code that is (probably) faster than the bsfl instruction on most x86 processors! You'd also gain from having the 'if (x == 0)' statically predicted correctly (probably as non-zero); that gain is also likely to be large.

	David

-- 
David Laight: david(a)l8s.co.uk
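David does not say which code he has in mind. One commonly cited bsf-free technique is de Bruijn multiplication (the 32-bit constant 0x077CB531 and its lookup table appear in Sean Anderson's "Bit Twiddling Hacks"); the sketch below pairs it with a `__builtin_expect` hint marking `x == 0` as the unlikely case. The names `ffs_debruijn`, `check_ffs_debruijn` and `UNLIKELY` are hypothetical, a GCC-compatible compiler is assumed, and whether this actually beats bsfl is CPU-dependent and would need the kind of benchmark Alexandre asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Branch hint: tell GCC/Clang that the condition is usually false. */
#define UNLIKELY(c) __builtin_expect(!!(c), 0)

/* Zero-based bit position, indexed by the top 5 bits of
 * (lowest set bit * 0x077CB531), a 32-bit de Bruijn constant. */
static const int debruijn_pos[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* ffs() without bsf: isolate the lowest set bit, multiply, look it up. */
static inline int ffs_debruijn(int x)
{
    uint32_t v = (uint32_t)x;

    if (UNLIKELY(v == 0))
        return 0;
    v &= -v;  /* keep only the lowest set bit; unsigned negation wraps */
    return debruijn_pos[(uint32_t)(v * 0x077CB531u) >> 27] + 1;
}

/* Cross-check against the compiler builtin for every single-bit input. */
static void check_ffs_debruijn(void)
{
    for (int b = 0; b < 32; b++)
        assert(ffs_debruijn((int)(1u << b)) == __builtin_ffs((int)(1u << b)));
    assert(ffs_debruijn(0) == 0);
}
```

The multiply and table load replace bsf plus the increment, while the hint lets the compiler lay out the non-zero path as the fall-through.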
On 03/11/2011 07:28 AM, Alexandre Julliard wrote:
Adam Martinson <amartinson(a)codeweavers.com> writes:
@@ -236,7 +241,40 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 #endif /* HAVE_GETOPT_LONG */
 #ifndef HAVE_FFS
-int ffs( int x );
+ #if defined(__i386__) || defined(__x86_64__)
+ #define HAVE_FFS
+ static inline int ffs( int x )
+ {
+     if (!x)
+     {
+         return 0;
+     }
+     else
+     {
+         int ret;
+         __asm__("bsfl %1, %0; incl %0" : "=r" (ret) : "r" (x));
+         return ret;
+     }
+ }
+ #elif defined(__GNUC__) && GCC_VERSION >= 29503
+ #define HAVE_FFS
+ #define ffs(x) __builtin_ffs(x)
+ #else
+ int ffs( int x );
+ #endif
+#endif

You'd have to show benchmarks to prove that this complexity is necessary. Given that ffs() should already be inlined on all decent platforms, I doubt you'd be able to demonstrate a difference (if anything, your version would be slower because of the extra increment).
I did this because it was easy and I was doing ctz() anyhow; I don't actually need these versions of ffs() for anything. On any system where HAVE_FFS is already defined, the system version takes precedence.
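The ctz() half of the patch is not quoted in this thread, so the following is not Adam's code. It is a hypothetical sketch (`ctz_sketch` and `ffs_from_ctz` are illustrative names) of how the two functions relate: for any non-zero x, ffs(x) equals ctz(x) + 1, while ffs(0) is defined as 0. It assumes any GCC-compatible compiler provides `__builtin_ctz`; a real header would need a version check like the patch's `GCC_VERSION` test. Note that `__builtin_ctz(0)` is undefined, which is why the zero check has to live in the caller.

```c
#include <assert.h>

/* Count trailing zero bits of a non-zero x. */
static inline int ctz_sketch(unsigned int x)
{
    assert(x != 0);  /* __builtin_ctz(0) is undefined */
#if defined(__GNUC__)
    return __builtin_ctz(x);
#else
    /* Portable fallback: shift until the lowest set bit reaches bit 0. */
    int n = 0;
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
#endif
}

/* For x != 0, ffs() is ctz() plus one; ffs(0) is 0 by definition. */
static inline int ffs_from_ctz(int x)
{
    return x ? ctz_sketch((unsigned int)x) + 1 : 0;
}
```

Building ffs() on top of ctz() this way keeps a single bit-scan primitive, at the cost of the extra increment Alexandre mentions.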
participants (3): Adam Martinson, Alexandre Julliard, David Laight