Adam Martinson amartinson@codeweavers.com writes:
@@ -236,7 +241,40 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 #endif /* HAVE_GETOPT_LONG */
 #ifndef HAVE_FFS
-int ffs( int x );
+#if defined(__i386__) || defined(__x86_64__)
+#define HAVE_FFS
+static inline int ffs( int x )
+{
+    if (!x)
+    {
+        return 0;
+    }
+    else
+    {
+        int ret;
+        __asm__("bsfl %1, %0; incl %0" : "=r" (ret) : "r" (x));
+        return ret;
+    }
+}
+#elif defined(__GNUC__) && GCC_VERSION >= 29503
+#define HAVE_FFS
+#define ffs(x) __builtin_ffs(x)
+#else
+int ffs( int x );
+#endif
+#endif
You'd have to show benchmarks to prove that this complexity is necessary. Given that ffs() should already be inlined on all decent platforms, I doubt you'd be able to demonstrate a difference (if anything, your version would be slower because of the extra increment).
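For reference, the semantics being discussed can be sketched in portable C. ffs() returns a 1-based bit position and 0 for a zero argument, whereas `bsfl` produces a 0-based index and leaves its destination undefined for a zero source, which is why the patch needs both the zero test and the `incl`. The name `ffs_loop` below is hypothetical and not part of the patch; this is only an illustration of the expected results, not a proposed implementation.

```c
/* Hypothetical reference version (not from the patch): scan upward from bit 0.
   Returns 0 for x == 0, otherwise the 1-based index of the lowest set bit. */
static int ffs_loop(int x)
{
    unsigned int v = (unsigned int)x;
    int pos = 1;

    if (v == 0)
        return 0;
    while ((v & 1u) == 0)
    {
        v >>= 1;   /* drop the bit just examined */
        pos++;
    }
    return pos;
}
```

On GCC-compatible compilers the results should agree with `__builtin_ffs()` for every input, which makes a loop like this a convenient baseline when checking or timing a faster variant.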
On Fri, Mar 11, 2011 at 02:28:27PM +0100, Alexandre Julliard wrote:
Adam Martinson amartinson@codeweavers.com writes:
@@ -236,7 +241,40 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 #endif /* HAVE_GETOPT_LONG */
 #ifndef HAVE_FFS
-int ffs( int x );
+#if defined(__i386__) || defined(__x86_64__)
...
+        __asm__("bsfl %1, %0; incl %0" : "=r" (ret) : "r" (x));
+        return ret;
...
+#elif defined(__GNUC__) && GCC_VERSION >= 29503
+#define HAVE_FFS
+#define ffs(x) __builtin_ffs(x)
+#else
+int ffs( int x );
+#endif
+#endif
You'd have to show benchmarks to prove that this complexity is necessary. Given that ffs() should already be inlined on all decent platforms, I doubt you'd be able to demonstrate a difference (if anything, your version would be slower because of the extra increment).
Never mind the fact that it is possible to write code that is (probably) faster than the bsfl instruction on most x86 processors! You'd also get a gain from getting the 'if (x == 0)' statically predicted correctly (probably for not zero) - that is also likely to be large.
David
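A rough sketch of what the two suggestions above might look like, assuming GCC or Clang and a 32-bit int. The branch hint uses `__builtin_expect`, and the bit search uses the well-known de Bruijn multiply-and-lookup technique instead of `bsfl`. The name `ffs_debruijn` is hypothetical, and whether this actually beats `bsfl` on a given processor is not shown here; it would have to be measured.

```c
#include <stdint.h>

/* Bit positions indexed by the top 5 bits of (lowest_bit * 0x077CB531). */
static const unsigned char debruijn_pos[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Hypothetical bsf-free ffs(); assumes int is 32 bits wide. */
static inline int ffs_debruijn(int x)
{
    uint32_t v = (uint32_t)x;

    /* Tell the compiler that a zero argument is the unlikely case. */
    if (__builtin_expect(v == 0, 0))
        return 0;

    v &= 0u - v;   /* isolate the lowest set bit */
    return debruijn_pos[(uint32_t)(v * 0x077CB531u) >> 27] + 1;
}
```

The multiplication shifts the de Bruijn constant left by the bit index, so the top five bits of the product identify that index uniquely; the table maps them back to the position.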
On 03/11/2011 07:28 AM, Alexandre Julliard wrote:
Adam Martinson amartinson@codeweavers.com writes:
@@ -236,7 +241,40 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 #endif /* HAVE_GETOPT_LONG */
 #ifndef HAVE_FFS
-int ffs( int x );
+#if defined(__i386__) || defined(__x86_64__)
+#define HAVE_FFS
+static inline int ffs( int x )
+{
+    if (!x)
+    {
+        return 0;
+    }
+    else
+    {
+        int ret;
+        __asm__("bsfl %1, %0; incl %0" : "=r" (ret) : "r" (x));
+        return ret;
+    }
+}
+#elif defined(__GNUC__) && GCC_VERSION >= 29503
+#define HAVE_FFS
+#define ffs(x) __builtin_ffs(x)
+#else
+int ffs( int x );
+#endif
+#endif
You'd have to show benchmarks to prove that this complexity is necessary. Given that ffs() should already be inlined on all decent platforms, I doubt you'd be able to demonstrate a difference (if anything, your version would be slower because of the extra increment).
I did this because it was easy and I was doing ctz() anyhow; I don't actually need these versions of ffs() for anything. On any system with HAVE_FFS the system version takes precedence.
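The relationship between the two operations is simple enough to sketch, assuming a GCC-compatible compiler that provides `__builtin_ctz`. The helper name `ffs_from_ctz` is hypothetical and does not appear in the patch.

```c
/* Hypothetical helper: ffs() expressed through count-trailing-zeros.
   __builtin_ctz() is undefined for a zero argument, so zero is handled first. */
static inline int ffs_from_ctz(int x)
{
    return x ? __builtin_ctz((unsigned int)x) + 1 : 0;
}
```

For any non-zero x the number of trailing zero bits is the 0-based index of the lowest set bit, so adding one gives the ffs() result.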