Hi,

I did not feel like chasing bugs the other day, so I decided to have some fun with something I have been wondering about for a long time: the usefulness of inline x86 assembly in string functions.

This is the test program, as.c:

---------------------------------8<-------------------------------------
#include <malloc.h>
#include <string.h>

typedef unsigned short WCHAR, *PWCHAR;

static inline WCHAR *strcpyW( WCHAR *dst, const WCHAR *src )
{
#ifdef ASM
    int dummy1, dummy2, dummy3;
    /* lodsw/stosw copy one WCHAR per iteration; stop once the 0 has been stored */
    __asm__ __volatile__( "cld\n"
                          "1:\tlodsw\n\t"
                          "stosw\n\t"
                          "testw %%ax,%%ax\n\t"
                          "jne 1b"
                          : "=&S" (dummy1), "=&D" (dummy2), "=&a" (dummy3)
                          : "0" (src), "1" (dst)
                          : "memory" );
#else
    WCHAR *p = dst;
    while ((*p++ = *src++));
#endif
    return dst;
}

#define SZ 3000

int main(void)
{
    int i;
    PWCHAR s,d;
    s=malloc(SZ*sizeof(WCHAR));
    d=malloc(SZ*sizeof(WCHAR));
    memset(s,'x',SZ);   /* memset counts bytes: this sets the first SZ/2 WCHARs */
    s[SZ-1]=0;
    for(i=0;i<1000000;i++)
        strcpyW(d,s);
}
---------------------------------8<-------------------------------------

The function strcpyW is a copy from Wine with the #ifdef modified.

I used the following commands:

gcc-3.3 -O2 as.c -o as -DASM ; time ./as;time ./as; time ./as

and

gcc-3.3 -O2 as.c -o as ; time ./as;time ./as; time ./as

The resulting times are (all user time, in seconds):

test#    asm        C
-----------------------
1       15.970   15.899
2       15.966   15.943
3       15.959   15.941
        ------   ------
ave     15.964   15.928

Notes:
- tested on a PII 450 MHz;
- I tested with gcc 2.95 and 3.4.2 as well, the results are essentially the same;
- size of main() is 0x7a (assembly) vs 0x82 (C code) bytes;
- I experimented with longer strings to see if there were any memory cache hit/miss effects and found none.

Conclusions:
1. these routines are so fast that it is hard to imagine them becoming a bottleneck that would justify such optimization;
2. nothing here shows that inline assembly brings any advantage.

Rein.
-- 
Rein Klazes
rklazes(a)xs4all.nl