On 03/16/2011 08:34 AM, Alexandre Julliard wrote:
Adam Martinson <amartinson@codeweavers.com> writes:
@@ -239,6 +243,19 @@ extern int getopt_long_only (int ___argc, char *const *___argv,
 int ffs( int x );
 #endif
 
+#if defined(__GNUC__) && (GCC_VERSION >= 30406)
+#define ctz(x) __builtin_ctz(x)
+#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
+static inline int ctz( unsigned int x )
+{
+    int ret;
+    __asm__("bsfl %1, %0" : "=r" (ret) : "r" (x));
+    return ret;
+}
+#else
+#define ctz(x) (ffs(x)-1)
+#endif
There's no reason to add this. Just use ffs().
If I thought ffs() was adequate, I would. I need this for iterating sparse bitsets.
__builtin_ctz() compiles to:

    mov 0x8(%ebp),%eax
    bsf %eax,%eax
(ffs()-1) compiles to:

    mov $0xffffffff,%edx
    bsf 0x8(%ebp),%eax
    cmove %edx,%eax
    add $0x1,%eax
    sub $0x1,%eax
... Fortunately -O2 catches the redundant add/sub pair. So yes, there is a reason: ctz() is at least 50% faster.