> Other observations:
> - my naive strcpy/strcat implementation seems more efficient than the
> one in the glibc! That's pretty weird.
Probably because the glibc routines are heavily optimised for long strings. I ran a load of experiments with different versions of memcpy: the setup cost of 'rep movsl' is high enough that, on my Athlon, it is faster not to use it for copies shorter than (IIRC) about 180 bytes. The 'rep movsb' used to copy the trailing bytes is definitely wasteful.
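
To make the short-copy point concrete, here is a rough sketch of a copy routine that skips the bulk path below a size threshold. The name copy_bytes and the constant SMALL_COPY_LIMIT are made up for illustration, the 180 is just my half-remembered figure rather than a measured constant, and memcpy() stands in for whatever 'rep movsl' based bulk copy the library uses.

```c
#include <stddef.h>
#include <string.h>

/* Assumed threshold: the (IIRC) figure quoted above, not a tuned value. */
#define SMALL_COPY_LIMIT 180

/* Hypothetical helper: plain byte loop for short copies, bulk copy otherwise. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n < SMALL_COPY_LIMIT) {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* No setup cost to amortise, so a simple loop is fine here. */
        while (n--)
            *d++ = *s++;
        return dst;
    }

    /* Long copy: let the library's bulk routine do the work. */
    return memcpy(dst, src, n);
}
```

Where the crossover actually sits depends on the CPU, so the threshold would need measuring on each target.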
> - cpycat is much more efficient in this type of scenario. That's not
> very surprising of course.
> Why does the C library have such braindead functions as strcpy and strcat?
Probably goes back into the annals of Unix history. My guess is that the return value wasn't defined, but happened to be the destination buffer address in one of the first implementations. Some code relied on that and no one dared change it.
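
For reference, the useful return value is a pointer to the end of what was just written. This is only a sketch of what I assume a cpycat-style routine looks like, not the version that was benchmarked; demo_path is a made-up caller. POSIX stpcpy() has the same contract.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of a cpycat-style copy (assumed semantics): copy src to dst and
 * return a pointer to the terminating NUL in dst, so the next call can
 * append without rescanning the string the way strcat() has to.
 */
char *cpycat(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    while ((*dst = *src++) != '\0')
        dst++;
    return dst;
}

/* Hypothetical caller: builds "foo/bar" in buf, returns a pointer to its end. */
char *demo_path(char *buf)
{
    char *end = buf;

    end = cpycat(end, "foo");
    end = cpycat(end, "/");
    end = cpycat(end, "bar");
    return end;
}
```

Each call starts where the previous one stopped, so building a string from n pieces costs one pass over the output instead of strcat's repeated scans from the start.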
Probably similar to asking why the precedence of | and == is backwards. (K&R didn't want to break any existing code when they invented ||.)
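
A small illustration of the precedence trap, with made-up function names. Because == binds tighter than |, the unparenthesised form compares first and ORs afterwards.

```c
/* Parsed as a | (b == c): the comparison happens first. */
int as_written(int a, int b, int c)
{
    return a | b == c;
}

/* What people usually mean: OR the bits, then compare. */
int as_intended(int a, int b, int c)
{
    return (a | b) == c;
}
```

With a = 4, b = 2, c = 6 the first returns 4 (4 | 0) and the second returns 1. Most compilers will warn about the first form if you ask them to (gcc has -Wparentheses).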
David