Something like the following may be faster:
static const int bit_pos[] =
{
0x00, 0x01, 0x0e, 0x02, 0x0f, 0x1b, 0x12, 0x03,
0x1e, 0x10, 0x1c, 0x16, 0x18, 0x13, 0x0a, 0x04,
0x1f, 0x0d, 0x1a, 0x11, 0x1d, 0x15, 0x17, 0x09,
0x0c, 0x19, 0x14, 0x08, 0x0b, 0x07, 0x06, 0x05,
};
const unsigned int c = 0x7dc4d65;
return bit_pos[((x & -x) * c) >> 27];
I'd expect that mostly due to eliminating the branches and the
implicit shift & mask operations in something like "y.c[1]".