http://bugs.winehq.org/show_bug.cgi?id=18916
--- Comment #15 from Henri Verbeet hverbeet@gmail.com 2009-06-16 08:44:28 ---
(In reply to comment #14)
How about this then (I take it WORD is unsigned, otherwise this won't quite work):
    WORD d15 = source[x] & 0xfffe;
    DWORD d24 = (d15 << 8) + (d15 >> 7);
It saves multiplies and, more importantly, divides, and is pretty accurate. Effectively I've rewritten
1/(2^15-1) ~~ 2^-15 + 2^-30
(the 2^-45 term won't interest us) and so
(2^24-1)/(2^15-1) * d15/2 ~~ 2^8 d15 + 2^-7 d15 .
This certainly works properly for d15 = 0xfffe, which gives 0xfffe00 + 0x1ff = 0xffffff. I'm not worried about the dangling half bit. I can do a detailed error analysis if you're not convinced; the worst error is just under half a bit. If we really care, I can try to figure out how to get it back, but I'm sure we don't ;-)
That half bit does get rounded up to a whole bit on occasion of course, but I can live with that. As far as I'm concerned, feel free to submit that as a patch. You might as well just write "dest[x] = (((source[x] & 0xfffe) << 16) + ((source[x] & 0xff80) << 1)) | (source[x] & 0x1);" though.
I do this kind of nonsense a lot on the ARM, where divide is to be studiously avoided in inner loops!
I don't think it's quite as expensive on modern x86, but it doesn't hurt to avoid it.
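For reference, here is a minimal sketch of the two forms discussed above, written out as standalone C so they can be compared. The function names are made up for illustration, and uint16_t/uint32_t stand in for WORD/DWORD. The destination layout (24-bit depth in the upper bits, stencil in the low bit) is an assumption read off the one-line expression, not something taken from the actual wined3d code.

```c
#include <stdint.h>

/* Two-step form: expand the 15-bit depth to 24 bits, then pack.
 * Hypothetical helper name; not part of wined3d. */
static inline uint32_t d15s1_to_d24s8_two_step(uint16_t src)
{
    uint32_t d15 = src & 0xfffeu;            /* depth, still shifted left by one */
    uint32_t d24 = (d15 << 8) + (d15 >> 7);  /* ~~ (2^24-1)/(2^15-1) * d15/2 */

    /* Assumed layout: depth in the top 24 bits, stencil bit at the bottom. */
    return (d24 << 8) | (src & 0x1u);
}

/* One-line form. The source is widened first so that the << 16 is done in
 * unsigned 32-bit arithmetic rather than in a promoted signed int. */
static inline uint32_t d15s1_to_d24s8_one_line(uint16_t src)
{
    uint32_t s = src;

    return (((s & 0xfffeu) << 16) + ((s & 0xff80u) << 1)) | (s & 0x1u);
}
```

Both functions should agree for every 16-bit input, since (d15 >> 7) << 8 is the same as (source & 0xff80) << 1. The endpoints come out as expected: a depth of 0 stays 0, and 0xfffe expands to a full 0xffffff.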