On Thu, Dec 01, 2005 at 11:09:29AM +0100, Alexandre Julliard wrote:
Robert Shearman <rob@codeweavers.com> writes:
- "shrl $2, %ecx\n\t" /* divide by 4 */
- "rep movsl\n\t" /* Copy dword blocks */
- "movl %eax, %ecx\n\t"
- "andl $3, %ecx\n\t" /* modulus 4 */
- "rep movsb\n\t" /* Copy remainder */
If the argument size is not a multiple of 4 you are in serious trouble...
Not only that, but the code above is not very efficient! The setup time for a 'rep movsX' instruction is significant on many modern CPUs, which makes the second 'rep movsb' (at most 3 bytes) particularly slow. I'm not even sure what the break-even length for the first one is!
A sequence like (give or take assembler syntax):
    mov %eax,(%esi+%ecx-4)
    mov (%edi+%ecx-4),%eax
    shrl $2, %ecx
    rep movsl
should be better, given a large enough %ecx.
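
In C terms the idea is roughly the following. This is a sketch only: the name copy_overlapping_tail is made up for illustration and is not anything in Wine, and it assumes the source and destination buffers do not overlap. The last 4 bytes are copied first as a single dword that may overlap the final whole dword, so no byte-wise remainder loop is needed once the length is at least 4.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Wine code: copy n bytes from src to dst
 * (buffers assumed not to overlap each other) using dword copies plus
 * one overlapping dword for the tail, instead of a second byte loop. */
void copy_overlapping_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    if (n < 4) {
        /* Too short for the trick: plain byte copy. */
        for (i = 0; i < n; i++)
            dst[i] = src[i];
        return;
    }

    /* Copy the last 4 bytes first; they may overlap the final whole dword. */
    memcpy(dst + n - 4, src + n - 4, 4);

    /* Then copy n / 4 whole dwords from the start (the 'rep movsl' part). */
    for (i = 0; i < n / 4; i++)
        memcpy(dst + 4 * i, src + 4 * i, 4);
}
```

The whole dwords cover bytes 0 up to 4*(n/4), and the tail dword covers n-4 up to n, so together they cover every byte for any n >= 4. Shorter lengths still need the fallback.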
David