On Thu, Feb 10, 2005 at 08:12:39PM +0000, Mike Hearn wrote:
On Thu, 10 Feb 2005 18:59:21 +0100, Dietrich Teickner wrote:
I have a suggestion for a faster implementation of the zero_bit_scan in RtlFindClearBits [NTDLL.@] (rtlbitmap.c), which is used by e.g. TlsAlloc(). The main point is the use of the instruction 'bsf eax, eax'.
I have implemented this in the new experimental odinxp tree to find the first zero bit in the first 'bytecount' bytes of the bitmap at 'addr'.
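For readers following the thread, here is a rough sketch of the two approaches being compared. It is not the odinxp code and not Wine's rtlbitmap.c; the function names are made up for illustration. It assumes GCC or Clang (for __builtin_ctz, which on x86 typically compiles to bsf or tzcnt), a little-endian machine, and the usual bitmap layout where bit 0 is the least significant bit of the first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain C reference (hypothetical name): test one bit at a time.
 * Returns the index of the first clear bit in the first `bytecount`
 * bytes of the bitmap, or -1 if every bit is set. */
static long first_zero_bit_c(const unsigned char *bitmap, size_t bytecount)
{
    size_t i;
    int bit;

    assert(bitmap != NULL);
    for (i = 0; i < bytecount; i++) {
        if (bitmap[i] == 0xff)
            continue;                       /* no clear bit in this byte */
        for (bit = 0; bit < 8; bit++)
            if (!(bitmap[i] & (1u << bit)))
                return (long)(i * 8 + (size_t)bit);
    }
    return -1;
}

/* Bit-scan variant (hypothetical name): invert a 32-bit word so clear
 * bits become set, then let the compiler's count-trailing-zeros builtin
 * find the lowest one. Assumes little-endian byte order. */
static long first_zero_bit_scan(const unsigned char *bitmap, size_t bytecount)
{
    size_t i = 0;

    assert(bitmap != NULL);
    for (; i + 4 <= bytecount; i += 4) {
        uint32_t word;
        memcpy(&word, bitmap + i, sizeof(word)); /* safe for unaligned data */
        word = ~word;
        if (word)                           /* __builtin_ctz(0) is undefined */
            return (long)(i * 8 + (size_t)__builtin_ctz(word));
    }
    for (; i < bytecount; i++) {            /* up to 3 leftover bytes */
        unsigned int inv = ~(unsigned int)bitmap[i] & 0xffu;
        if (inv)
            return (long)(i * 8 + (size_t)__builtin_ctz(inv));
    }
    return -1;
}
```

Both functions should return the same index for any input, so the plain C version can serve as a check while timing the other. Whether the scan variant is actually faster on a given CPU is exactly what would need measuring.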
Does this actually make a noticeable difference? Rewriting stuff in assembly for theoretical performance improvements isn't so great, as far fewer people can read/write assembly than C.
I'd also add that you need to check that using 'bsf' is EVER a gain! An i386 might execute it faster than the corresponding C, but there is no guarantee that a P4/Athlon will. Oh, and you need to run any tests with the code out of the cache.
David