With gcc 3.2? That compiler optimizes the jmp away in an if statement, apparently assuming that the "if" clause is executed much more often. The "else" clause, OTOH, has to jmp to the end of the subroutine and jmp back afterwards. So if the compiler guesses wrong, the CPU has to do two more jmps per loop iteration, with a possible instruction cache miss. I find it hard to believe that would be a speed improvement on any CPU. If the compiler guesses right, one jmp is saved per iteration.
Actually 2.95.3, which doesn't seem to want to generate a loop with only one taken branch. Something must make the conditional forward jump faster than the unconditional one. And yes, I couldn't believe it either :-)
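For anyone following along, here is a minimal sketch of the kind of loop under discussion. It is not the code that was actually measured: the function name `sum_nonneg` and its arguments are made up for illustration. Which arm ends up as fall-through and which ends up out of line depends on the compiler version and flags, so the comments only describe what a compiler may do.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example: add up the non-negative values in v[0..n-1]
 * and count the negative ones.  The if/else inside the loop is the
 * construct whose branch layout is being discussed.
 */
long sum_nonneg(const int *v, size_t n, size_t *neg_count)
{
    long sum = 0;
    size_t neg = 0;
    size_t i;

    assert(v != NULL || n == 0);
    assert(neg_count != NULL);

    for (i = 0; i < n; i++) {
        if (v[i] >= 0) {
            /* "if" arm: a compiler that assumes this is the common
             * case may emit it as straight-line fall-through code. */
            sum += v[i];
        } else {
            /* "else" arm: may be moved out of line, costing a jump
             * there and a jump back whenever it is executed. */
            neg++;
        }
    }

    *neg_count = neg;
    return sum;
}
```

Compiling with something like `gcc -O2 -S` and reading the generated assembly shows which layout a given compiler version actually picked for the two arms.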
David