On 03/16/2011 08:34 AM, Alexandre Julliard wrote:
Adam Martinson <amartinson(a)codeweavers.com> writes:
@@ -239,6 +243,19 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 int ffs( int x );
 #endif
+
+#if defined(__GNUC__) && (GCC_VERSION >= 30406)
+ #define ctz(x) __builtin_ctz(x)
+#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
+ static inline int ctz( unsigned int x )
+ {
+     int ret;
+     __asm__("bsfl %1, %0" : "=r" (ret) : "r" (x));
+     return ret;
+ }
+#else
+ #define ctz(x) (ffs(x)-1)
+#endif

There's no reason to add this.  Just use ffs().
If I thought ffs() was adequate, I would.  I need this for iterating
sparse bitsets.

__builtin_ctz() compiles to:

    mov    0x8(%ebp),%eax
    bsf    %eax,%eax

(ffs()-1) compiles to:

    mov    $0xffffffff,%edx
    bsf    0x8(%ebp),%eax
    cmove  %edx,%eax
    add    $0x1,%eax
    sub    $0x1,%eax

...Fortunately -O2 catches the add/sub.  So yes, there is a reason:
ctz() is at least 50% faster.