Re: wineport: Add support for ctz().

17 Mar 2011


On Wed, Mar 16, 2011 at 01:26:31PM -0500, Adam Martinson wrote:
...
> __builtin_ctz() compiles to:
> mov    0x8(%ebp),%eax
> bsf    %eax,%eax
> (ffs()-1) compiles to:
> mov    $0xffffffff,%edx
> bsf    0x8(%ebp),%eax
> cmove  %edx,%eax
...
...
> So yes, there is a reason, ctz() is at least 50% faster.
I'm not sure where you get 50% from!
I've read both the Intel and AMD x86 instruction performance manuals
(but can't claim to remember all of it!).
The 'bsf' will be a slow instruction (with constraints on where it
executes, and what can execute in parallel).
The 'cmove' has even worse constraints since it can't execute until
the 'flags' from the previous instruction are known.
cmove is only slightly better than a mis-predicted branch!
In this case there will be a complete pipeline stall between the 'bsf'
and the 'cmove'.
ffs would probably execute faster with a forwards conditional branch
(predicted not taken) in the 'return -1' path.
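
For reference, a minimal C sketch of the three variants being compared.
The function names are made up for illustration and are not taken from
the wineport patch. __builtin_ctz() and __builtin_expect() are GCC
builtins; whether the compiler really emits a forward branch for the
last variant depends on the compiler version and flags, so treat it as
an assumption to check in the generated assembly.

```c
#include <strings.h>   /* ffs() (POSIX) */

/* Bare trailing-zero count. The result is undefined for x == 0,
 * which is why it can compile down to a lone bsf. */
static inline int ctz_builtin(unsigned int x)
{
    return __builtin_ctz(x);
}

/* ffs() returns 0 for a zero argument, so this yields -1 for x == 0.
 * That extra case is what the cmove in the quoted code handles. */
static inline int ctz_via_ffs(unsigned int x)
{
    return ffs((int)x) - 1;
}

/* Hypothetical branching variant: the zero case is hinted as unlikely,
 * so the common path need not wait on a conditional move. */
static inline int ctz_branch(unsigned int x)
{
    if (__builtin_expect(x == 0, 0))
        return -1;
    return __builtin_ctz(x);
}
```

The second and third functions return the same value for every input;
they differ only in how the x == 0 case is expressed to the compiler.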
David
-- 
David Laight: david@l8s.co.uk
