Francois Gouget fgouget@free.fr writes:
I don't like pieces of code that go:
    strcpy(foo, bar1);
    strcat(foo, bar2);
    strcat(foo, bar3);
    strcat(foo, bar4);
    strcat(foo, bar5);
    strcat(foo, bar6);
It's really inefficient: the cost increases quadratically with the size of the resulting string.
Well, no, the cost is linear. It would only be quadratic if the number of strcat calls depended on the length of the string.
It's more efficient to do:
sprintf(foo, "%s%s%s%s%s%s", bar1,bar2,bar3,bar4,bar5,bar6);
I seriously doubt that sprintf would be faster than a couple of strcats. And I don't think we need to worry about this kind of micro-optimization right now...
On Sat, Nov 30, 2002 at 10:49:34AM -0800, Alexandre Julliard wrote:
It's more efficient to do:
sprintf(foo, "%s%s%s%s%s%s", bar1,bar2,bar3,bar4,bar5,bar6);
In case you cannot be 100% sure of the lengths, it might still be worth it with snprintf, but otherwise, it's a matter of taste.
Ciao
     Jörg
--
Joerg Mayer jmayer@loplof.de
I found out that "pro" means "instead of" (as in proconsul). Now I know what proactive means.
It's really inefficient: the cost increases quadratically with the size of the resulting string.
Well, no, the cost is linear. It would only be quadratic if the number of strcat calls depended on the length of the string.
It's more efficient to do:
sprintf(foo, "%s%s%s%s%s%s", bar1,bar2,bar3,bar4,bar5,bar6);
While I agree with Alexandre that the argument for this change based on efficiency is not compelling, I find the resulting clarity of code refreshing. And perhaps it's not our top priority, but I think if we can encourage folks to tighten the code, that would be a Good Thing (TM).
Alexandre Julliard wrote:
Francois Gouget fgouget@free.fr writes:
Well, no, the cost is linear. It would only be quadratic if the number of strcat calls depended on the length of the string.
Well - still snprintf is more efficient.
It's more efficient to do:
sprintf(foo, "%s%s%s%s%s%s", bar1,bar2,bar3,bar4,bar5,bar6);
I seriously doubt that sprintf would be faster than a couple of strcats. And I don't think we need to worry about this kind of micro-optimization right now...
But there is also no reason not to welcome these submissions if someone already took the time to submit them.
Shachar
Shachar Shemesh wine-devel@sun.consumer.org.il writes:
Well - still snprintf is more efficient.
I don't think so, but feel free to provide benchmarks.
But there is also no reason not to welcome these submissions if someone already took the time to submit them.
There's no objective reason why sprintf is better than strcat in that case, it's purely a matter of personal taste. As such, whoever writes the code in the first place gets to choose the way it's done. What if I apply that patch and someone sends a patch tomorrow changing the sprintfs back into strcats because he prefers that? Should I apply it? After all he took the time to submit it too...
Alexandre Julliard wrote:
Shachar Shemesh wine-devel@sun.consumer.org.il writes:
Well - still snprintf is more efficient.
I don't think so, but feel free to provide benchmarks.
Benchmark will follow soon. In the meantime, think about the fact that, compared to linear copying of the strings, these are the overheads (neglecting function call overhead, which is not negligible but is fair):

n - number of strings in the final string
m(i) - length of string i (0 < i <= n)
sm(i) - sigma of all lengths up to i (0 < i <= n)
sm(n) - total length of all strings

With sprintf - (parsing the format string)*n + sm(n).
With strcpy+strcat - for each strcat we have to find the end of the string so far (sm(i-1)), and then write our own string (m(i)).
But there is also no reason not to welcome these submissions if someone already took the time to submit them.
There's no objective reason why sprintf is better than strcat in that case, it's purely a matter of personal taste.
Not if you accept my performance claim. Also, there is the security thingamy.
The str* method is a bitch to make secure. You have to keep subtracting the already processed strings from the remaining buffer length, and then you run the risk of subtracting more than you have, resulting in a negative number = buffer overflow despite your best efforts. There is the question of whether the terminating null gets written or not. There is a horrible performance hit for strncpy filling the entire buffer while not promising null termination. In short, it is one major headache.
snprintf, on the other hand, is simple, to the point, and clean. You still have to make sure the buffer is terminated, but that's all.
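To make the contrast concrete, here is a minimal sketch (the function names and variables are invented for illustration, not taken from any Wine source): the strn* version has to re-terminate by hand and keep the count arithmetic straight, while the snprintf version only needs the total buffer size. Both assume size > 0.

    #include <stdio.h>
    #include <string.h>

    /* Error-prone style: track the remaining space by hand. */
    void concat_strn(char *buf, size_t size, const char *a, const char *b)
    {
        strncpy(buf, a, size - 1);
        buf[size - 1] = '\0';                     /* strncpy may not terminate */
        strncat(buf, b, size - strlen(buf) - 1);  /* count excludes the '\0' strncat adds */
    }

    /* Simpler style: one size, one call. */
    void concat_snprintf(char *buf, size_t size, const char *a, const char *b)
    {
        snprintf(buf, size, "%s%s", a, b);
        buf[size - 1] = '\0';   /* some pre-C99 snprintf implementations do not guarantee termination */
    }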
As such, whoever writes the code in the first place gets to choose the way it's done.
Agreed, assuming ALL of my previous arguments are rejected.
What if I apply that patch and someone sends a patch tomorrow changing the sprintfs back into strcats because he prefers that? Should I apply it? After all he took the time to submit it too...
I said that due to the wall to wall agreement over the superiority of sprintf/snprintf. If there is a consensus, we should stick to it.
Shachar
I said that due to the wall to wall agreement over the superiority of sprintf/snprintf. If there is a consensus, we should stick to it.
But the performance of it will suck badly.....
Something like:
char *
str_add(char *s, char *lim, const char *a)
{
    int c;

    do {
        if (s >= lim) {
            s = lim - 1;
            c = 0;
        } else
            c = *a++;
        *s++ = c;
    } while (c);

    return s;
}
So you can do:

    lim = buf + len;
    buf = str_add(buf, lim, "abc");
    buf = str_add(buf, lim, "123");
    ...

might be safer.
David
When I'm wrong, I'm wrong.
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way1 -DWAY1
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way2 -DWAY2
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way3 -DWAY3
sun@sun:~/sources/wine/test$ ./way1
Operation took 0 seconds 450763 usec
sun@sun:~/sources/wine/test$ ./way2
Operation took 0 seconds 598408 usec
sun@sun:~/sources/wine/test$ ./way3
Operation took 0 seconds 427037 usec
With higher O values, the difference becomes bigger, but I'm not sure then that some of the operations are not optimized out of the process, which makes the entire benchmark useless.
So, what are our conclusions? Do we just implement strlcpy and strlcat so that everyone has a function that is both efficient AND secure?
Do we go for David's suggestion, which is more efficient, but is also more cumbersome and requires two extra vars to implement right?
Shachar
David Laight wrote:
I said that due to the wall to wall agreement over the superiority of sprintf/snprintf. If there is a consensus, we should stick to it.
But the performance of it will suck badly.....
Something like:
char *
str_add(char *s, char *lim, const char *a)
{
    int c;

    do {
        if (s >= lim) {
            s = lim - 1;
            c = 0;
        } else
            c = *a++;
        *s++ = c;
    } while (c);

    return s;
}
So you can do:

    lim = buf + len;
    buf = str_add(buf, lim, "abc");
    buf = str_add(buf, lim, "123");
    ...

might be safer.
David
Shachar Shemesh wrote:
Benchmark will follow soon. In the meantime, think about the fact that, compared to linear copying of the strings, these are the overheads (neglecting function call overhead, which is not negligible but is fair):

n - number of strings in the final string
m(i) - length of string i (0 < i <= n)
sm(i) - sigma of all lengths up to i (0 < i <= n)
sm(n) - total length of all strings

With sprintf - (parsing the format string)*n + sm(n).
With strcpy+strcat - for each strcat we have to find the end of the string so far (sm(i-1)), and then write our own string (m(i)).
Shachar Shemesh wrote:
When I'm wrong, I'm wrong.
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way1 -DWAY1
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way2 -DWAY2
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way3 -DWAY3
sun@sun:~/sources/wine/test$ ./way1
Operation took 0 seconds 450763 usec
sun@sun:~/sources/wine/test$ ./way2
Operation took 0 seconds 598408 usec
sun@sun:~/sources/wine/test$ ./way3
Operation took 0 seconds 427037 usec
Was going to say earlier, but didn't get to it ... the reason is probably that you were underestimating the time required to parse the format string, which is probably greater than anything else. Everything else is simple searching and copying whereas the parsing is probably at least a quadratic-order function. Anyway you've now demonstrated that so this isn't that relevant anymore...
David
probably that you were underestimating the time required to parse the format string, which is probably greater than anything else. Everything else is simple searching and copying whereas the parsing is probably at least a quadratic-order function.
No, it will be linear (in the length of the format string). But just rather more expensive than you probably expect. There is a lot of red tape lurking.
David
David Laight wrote:
probably that you were underestimating the time required to parse the format string, which is probably greater than anything else. Everything else is simple searching and copying whereas the parsing is probably at least a quadratic-order function.
No, it will be linear (in the length of the format string). But just rather more expensive than you probably expect.
That's the reason I thought it would be faster. However, since the test program copied 40 bytes 10 times, which is a rather extreme case, and sprintf still was not faster, we can only conclude that sprintf is not worthwhile in almost all standard cases.
There is a lot of red tape lurking.
David
This, however, does not answer the usability issue. Obviously, we won't put in patches that convert to anything else just because someone didn't like it (yes, I agree with Alexandre on this point). There is, however, the question of buffer overruns.
I suggest implementing strlcat and strlcpy, as in OpenBSD. I can write them, but I'm not sure where to place them. They should either be inlined (as in - implemented in an include file as a static func), or in some library that will be linked (statically, I hope). Ideas?
For those who don't know the strl* functions - they are just like the strn* functions, except that the parameters are sane. In strn*:

* The resulting string is not always NULL terminated.
* In strncat, the terminating null is always written, but is not counted in the size!!!
* The size parameter is sometimes the total size of the buffer, and sometimes the size of the unused portion of the buffer.
* On strncpy, the entire buffer is written into, even if very little of it is needed (performance).

All of the above make for awkward and error prone coding. In order to use strncat properly, you have to keep subtracting the number of characters already copied, plus 1 for the terminating nul. As a result of these subtractions, an unsigned underflow (i.e. wrapping from 0 to maxint) may happen, which in turn causes buffer overruns itself. Unlike them, the strl* functions work like this:

* The resulting string is ALWAYS NULL terminated, whether there was enough room or not.
* The buffer is not padded with NULLs beyond the terminating one (performance).
* The number is ALWAYS the total buffer size. No arithmetic is necessary.

A sketch of what they might look like follows below.
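Here is a minimal sketch of functions with those semantics (OpenBSD-compatible behavior is the intent, but this is not the OpenBSD source, and the exact placement in Wine is still open):

    #include <string.h>

    /* Copy src into dst, where size is the TOTAL size of dst.
     * Always NUL-terminate when size > 0, never pad, and return
     * strlen(src) so the caller can detect truncation (result >= size). */
    size_t strlcpy(char *dst, const char *src, size_t size)
    {
        size_t srclen = strlen(src);

        if (size > 0)
        {
            size_t copy = (srclen >= size) ? size - 1 : srclen;
            memcpy(dst, src, copy);
            dst[copy] = '\0';
        }
        return srclen;
    }

    /* Append src to the NUL-terminated string in dst; size is again
     * the TOTAL buffer size, not the remaining space. */
    size_t strlcat(char *dst, const char *src, size_t size)
    {
        size_t dstlen = strlen(dst);

        if (dstlen >= size)          /* dst already fills the buffer */
            return dstlen + strlen(src);
        return dstlen + strlcpy(dst + dstlen, src, size - dstlen);
    }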
Shachar
I suggest implementing strlcat and strlcpy, as in OpenBSD. I can write them, but I'm not sure where to place them. They should either be inlined (as in - implemented in an include file as a static func), or in some library that will be linked (statically, I hope). Ideas?
Inlining them would (probably) be bad news. They are not a completely trivial size (i.e. not smaller than the call sequence) and they are more likely to be resident in the Icache if they are functions.
They are useful though, and do what the programmer wanted (unlike strncpy whose only point is that it doesn't overrun the target buffer).
However in this case they aren't quite right!
    len = strlcpy(buf, s0, buflen);
    len += strlcpy(buf + len, s1, buflen - len);
    len += strlcpy(buf + len, s2, buflen - len);

is slightly more complex than using my str_add().
The source for the functions can be grabbed from netbsd or freebsd.
David
David Laight wrote:
Inlining them would (probably) be bad news. They are not a completely trivial size (ie smaller than the call sequence) and they are more likely to be resident in the Icache if they are functions.
Yes, but they are much more likely to benefit from optimization in the scope of the calling function if they are inlined.
They are useful though, and do what the programmer wanted (unlike strncpy whose only point is that it doesn't overrun the target buffer).
However in this case they aren't quite right!
    len = strlcpy(buf, s0, buflen);
    len += strlcpy(buf + len, s1, buflen - len);
    len += strlcpy(buf + len, s2, buflen - len);
ahem. The code should be:
    strlcpy(buf, s0, buflen);
    strlcat(buf, s1, buflen);
    strlcat(buf, s2, buflen);
If you are going to be sliding the buffer with each call, then you can just as well use your functions.
is slightly more complex than using my str_add().
The source for the functions can be grabbed from netbsd or freebsd.
David
Shachar Shemesh winehebhaim@sun.consumer.org.il writes:
I suggest implementing strlcat and strlcpy, as in OpenBSD. I can write them, but I'm not sure where to place them. They should either be inlined (as in - implemented in an include file as a static func), or in some library that will be linked (statically, I hope). Ideas?
We don't need that, there are Windows API functions like lstrcpyn that can be used for that. And in any case the right approach to writing correct and secure code is not to truncate every string in sight to some fixed buffer size; it's to make sure you allocate buffers of the right size, and then you can use standard strcpy/strcat/sprintf/etc. without worrying about lengths.
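A minimal sketch of that approach (the helper name and the use of malloc are illustrative only, not Wine API):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Compute the exact result length first, then plain sprintf
     * (or strcpy/strcat) cannot overflow the buffer. */
    char *build_path(const char *dir, const char *file)
    {
        char *buf = malloc(strlen(dir) + 1 + strlen(file) + 1);  /* '/' + '\0' */

        if (buf) sprintf(buf, "%s/%s", dir, file);
        return buf;  /* caller frees */
    }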
On Mon, 2 Dec 2002, Shachar Shemesh wrote:
When I'm wrong, I'm wrong.
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way1 -DWAY1
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way2 -DWAY2
sun@sun:~/sources/wine/test$ gcc -O0 strcpy.c -o way3 -DWAY3
sun@sun:~/sources/wine/test$ ./way1
Operation took 0 seconds 450763 usec
sun@sun:~/sources/wine/test$ ./way2
Operation took 0 seconds 598408 usec
sun@sun:~/sources/wine/test$ ./way3
Operation took 0 seconds 427037 usec
With higher O values, the difference becomes bigger, but I'm not sure then that some of the operations are not optimized out of the process, which makes the entire benchmark useless.
So, what are our conclusions? Do we just implement strlcpy and strlcat so that everyone has a function that is both efficient AND secure?
Do we go for David's suggestion, which is more efficient, but is also more cumbersome and requires two extra vars to implement right?
Well, I tested this further. Results are attached (test.log). Columns are: WAY1 (strcat), WAY2 (sprintf), WAY2_5 (snprintf), WAY3 (str_add), WAY4 (stpcpy). WAY4 is something I remembered from old DOS days, but it's nonstandard and probably available only in glibc.
Since I used a glibc compiled for i686 (Pentium 2), we probably should not consider the entries with '-mcpu=pentium4' in the comparison.
Test platform: Pentium4 Northwood; linux-2.4.20; glibc-2.2.4 compiled with optimizations for Pentium2.
My vote is for sprintf, as it can easily be converted to a safer version and is easier to read.
My two cents only. :)
Michal Miroslaw
optimizations: -O2
strcpy/cat   sprintf    snprintf   str_add    stpcpy
2.926220     1.427674   1.408928   3.405458   0.892890   1000
2.622081     1.308464   1.292384   3.062085   0.867009    900
2.363464     1.169335   1.152487   2.727685   0.740690    800
These do not look right, in particular the first column is too fast. For instance my system gives (NUM_ITER=1000):
0.174988   0.042060   0.042097   0.057267   1000
0.155118   0.038678   0.038592   0.051500    900
0.138036   0.034586   0.034597   0.045832    800
...
0.005573   0.006449   0.006452   0.002103     30
0.003858   0.005962   0.005964   0.001536     20
0.002163   0.005783   0.005802   0.000967     10
It is as if the code from your program is executing far slower than anything from libc!
David
On Mon, 2002-12-02 at 13:57, David Laight wrote:
It is as if the code from your program is executing far slower than anything from libc!
That is very well possible if he has the right libc. Take a look at the optimizations in the string functions in glibc, and you'll have an idea why. Unless you are an assembler programming guru for a certain architecture, you'll have a hard time beating them.
Martin
It is as if the code from your program is executing far slower than anything from libc!
That is very well possible if he has the right libc. Take a look at the optimizations in the string functions in glibc, and you'll have an idea why. Unless you are an assembler programming guru for a certain architecture, you'll have a hard time beating them.
I'm a fairly good assembler programmer....
Actually I don't have glibc - I'm running netbsd not linux. Netbsd might benefit from faster strxxx routines.
OTOH the times are very dependent on the cpu model! My slotA athlon 700 executes my str_add() faster the way I coded it - I tried the other order and it sucked. Similarly, escaping with 'lim[-1] = 0; return lim;' didn't help.
Of course you can use the same tricks as glibc does to speed up your own variant of the copy routine.
David
On Mon, 2002-12-02 at 20:15, David Laight wrote:
Actually I don't have glibc - I'm running netbsd not linux. Netbsd might benefit from faster strxxx routines.
Can't you use an optimized glibc on NetBSD?
OTOH the times are very dependent on the cpu model! My slotA athlon 700 executes my str_add() faster the way I coded it - I tried the other order and it sucked.
With gcc 3.2? That compiler optimizes the jmp away in an if statement, apparently assuming that the "if" clause is executed much more often. The "else" clause, OTOH, jmp's to the end of the subroutine and jumps back afterwards. Thus if the compiler guesses wrongly, the CPU'll have to do two more jmp's per loop, with possible instruction cache miss. Hard for me to believe that'll be a speed improvement on any CPU. If the compiler guesses right, 1 jmp will be saved per loop.
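As an aside (not something from the thread's benchmark): gcc 3.x also lets you state the expected branch direction explicitly with __builtin_expect, instead of relying on the if/else ordering. A hedged sketch, applied to a str_add-style loop (the function name and macros are made up for illustration):

    /* Tell gcc which way the branch usually goes. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    char *str_add_hinted(char *s, char *lim, const char *a)
    {
        int c;

        do {
            if (unlikely(s >= lim)) {   /* hitting the end of the buffer is the rare case */
                s = lim - 1;
                c = 0;
            } else
                c = *a++;
            *s++ = c;
        } while (c);
        return s - 1;   /* point at the terminating '\0' so calls can be chained */
    }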
Of course you can use the same tricks as glibc does to speed up your own variant of the copy routine.
I am not talking what you or I could or couldn't do. I am just saying Wine should rely on glibc for these things, and not try to outwit those who're doing these things all the time.
Martin
With gcc 3.2? That compiler optimizes the jmp away in an if statement, apparently assuming that the "if" clause is executed much more often. The "else" clause, OTOH, jmp's to the end of the subroutine and jumps back afterwards. Thus if the compiler guesses wrongly, the CPU'll have to do two more jmp's per loop, with possible instruction cache miss. Hard for me to believe that'll be a speed improvement on any CPU. If the compiler guesses right, 1 jmp will be saved per loop.
Actually 2.95.3, which doesn't seem to want to generate a loop with only one taken branch. Something must make the conditional forward jump faster than the unconditional one. And yes, I couldn't believe it either :-)
David
With higher O values, the difference becomes bigger, but I'm not sure then that some of the operations are not optimized out of the process, which makes the entire benchmark useless.
I don't think they would be, and it is easy enough to check (objdump -d way1). Benchmarking is meaningless if you don't use the optimisation level that will be used in real life. Also, the compiler is really designed to be used with -O; the code generated without it is only 'half cooked'.
Do we go for David's suggestion, which is more efficient, but is also more cumbersome and requires two extra vars to implement right?
The extra variables are likely to be well optimised by the compiler. strlcpy/cat will be even slower than the strcpy/cat versions.
way1:  900 ms
way1a: 1321 ms (strlcpy/cat)
way2:  721 ms
way3:  322 ms
way3a: 223 ms (without bound check)
David
On Sun, 2002-12-01 at 23:07, Shachar Shemesh wrote:
When I'm wrong, I'm wrong.
sun@sun:~/sources/wine/test$ ./way1
Operation took 0 seconds 450763 usec
sun@sun:~/sources/wine/test$ ./way2
Operation took 0 seconds 598408 usec
sun@sun:~/sources/wine/test$ ./way3
Operation took 0 seconds 427037 usec
First off, the program is wrong in that "way3" doesn't do what it's supposed to (concatenate). Below is a patch for your benchmark with a "WAY4" taking a similar path to WAY3, but using strlen() and memcpy(), and a "WAY5" that is pretty similar to David's, but better with gcc 3.2.
My conclusion is: Don't try to outwit the C library. Assume that the user has installed a good one, and use library functions.
(Btw ever looked at the section about "strcat" in the glibc info pages?)
WAY4 below is very fast with a good C library. On i686, it is best for all optimizations except -O1 where gcc inlines self-generated i386-optimized code which is more than a factor of 2 slower than the library routine. It also requires the "-march=i686" option to gcc, otherwise gcc wrongly assumes that i386 optimizations are the best, and uses them always.
Of course, if the lengths of the strings are known, you can speed this up a lot by doing without strlen().
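As a hedged illustration of that remark (not part of the posted patch): with known lengths the whole thing reduces to a memcpy and a running pointer, with no scanning at all. The helper name is invented, and this tiny version does no bounds checking.

    #include <string.h>

    /* Append a piece whose length is already known and return the next
     * write position (pointing at the terminating '\0'). */
    char *append_known(char *dst, const char *src, size_t srclen)
    {
        memcpy(dst, src, srclen);
        dst[srclen] = '\0';
        return dst + srclen;
    }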
WAY2 depends on the "strcat" implementation in the library. It may be slower than the snprintf solution if strcat is badly optimized. (And it's unsafe)!
David's WAY3, not being library dependent, is quite good except for -O0. With GCC 3.2 and optimization -O2 and above, it is necessary to reverse the "if" and "else" clauses in str_add(), otherwise performance degrades because gcc assumes the "if" branch is taken more often. (That reversed str_add is called "WAY5" below.)
These are my results, obtained on a PIII system with RedHat 8.0, using "gcc -Ox -march=i686" as optimization (I took gcc 3.2 -O0 as the reference value 1.0):
GCC 3.2    -O0    -O1    -O2    -O3
WAY1       1.00   0.98   0.97   0.97
WAY2       0.93   0.98   0.92   0.93
WAY3       2.28   0.67   0.95   1.05
WAY4       0.49   1.00   0.46   0.41
WAY5       2.53   0.66   0.66   0.76

GCC 2.96   -O0    -O1    -O2    -O3
WAY1       1.00   0.98   1.02   1.02
WAY2       0.93   0.93   0.93   0.93
WAY3       2.21   0.67   0.69   0.63
WAY4       0.49   1.02   0.48   0.42
WAY5       2.42   0.67   0.69   0.63
Martin
--- strorg.c	Mon Dec  2 13:29:43 2002
+++ strcpy.c	Mon Dec  2 17:35:00 2002
@@ -12,15 +12,38 @@
     int c;
 
     do {
+#if WAY5
+        if ( s < lim ) c = *a++;
+        else {
+            s = lim - 1;
+            c = 0;
+        }
+#else
         if (s >= lim) {
             s = lim - 1;
             c = 0;
         } else
             c = *a++;
+#endif
         *s++ = c;
     } while (c);
 
-    return s;
+    return --s;
+}
+
+char*
+str1_add(char*s, int *len, const char *a)
+{
+
+    int l = strlen(a);
+    if ( l >= *len )
+        l = *len - 1;
+
+    memcpy ( s, a, l ) ;
+    *len -= l;
+    s += l;
+    *s = 0;
+    return s;
 }
 
 int main()
@@ -32,7 +55,7 @@
     gettimeofday(&before, NULL);
     for( i=0; i<NUM_ITER; ++i )
     {
-#ifdef WAY1
+#if WAY1
         strcpy( buffer, STRING );
         strcat( buffer, STRING );
         strcat( buffer, STRING );
@@ -46,21 +69,33 @@
 #elif WAY2
         sprintf( buffer, "%s%s%s%s%s%s%s%s%s%s", STRING, STRING, STRING,
                  STRING, STRING, STRING, STRING, STRING, STRING, STRING );
-#elif WAY3
+#elif ( WAY3 || WAY5 )
         char *pointer=buffer;
         char *const limit=buffer+sizeof(buffer);
 
-        buffer[0]='\0';
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
-        pointer=str_add( buffer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+        pointer=str_add( pointer, limit, STRING );
+#elif WAY4
+        char *p = buffer;
+        int len = sizeof(buffer);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
+        p = str1_add (p, &len, STRING);
 #endif
     }
     gettimeofday(&after, NULL);
@@ -70,13 +105,7 @@
     diff_sec=after.tv_sec-before.tv_sec;
     diff_usec=after.tv_usec-before.tv_usec;
 
-    if( diff_usec<0 )
-    {
-        diff_usec+=1000000;
-        diff_sec--;
-    }
-
-    printf("Operation took %ld seconds %ld usec\n", diff_sec, diff_usec );
+    printf("Operation took %07ld usec\n", diff_sec * 1000000 + diff_usec );
     }
 
     return 0;
On 30 Nov 2002, Alexandre Julliard wrote:
Francois Gouget fgouget@free.fr writes:
I don't like pieces of code that go:
    strcpy(foo, bar1);
    strcat(foo, bar2);
    strcat(foo, bar3);
    strcat(foo, bar4);
    strcat(foo, bar5);
    strcat(foo, bar6);
It's really inefficient: the cost increases quadratically with the size of the resulting string.
Well, no, the cost is linear. It would only be quadratic if the number of strcat calls depended on the length of the string.
True.
It's more efficient to do:
sprintf(foo, "%s%s%s%s%s%s", bar1,bar2,bar3,bar4,bar5,bar6);
I seriously doubt that sprintf would be faster than a couple of strcats. And I don't think we need to worry about this kind of micro-optimization right now...
Which is why I'm not going through the source looking for them. It's just that sometimes I find one and cringe when I think of the inefficiency of reading the first string 3 or more times, and having even more stuff to read with each strcat.
This seems like a popular subject and everyone goes with their benchmark. So I had to do one too<g>.
I took what I think is a quite common case where one concatenates three strings, typically a path with a filename:
sprintf(buf,"%s/%s",path,filename);
The goal of my benchmark (attached) is to determine for what path length an sprintf becomes as efficient as an strcpy+2*strcat combination. I implemented the above operation using:
* sprintf
* strcpy+strcat
* my own naive strcpy+strcat, just to see how much difference it makes compared to the optimized glibc implementation
* a cpycat implementation which is how strcpy and strcat should have been implemented in the first place. cpycat returns a pointer to the trailing '\0' which means you can chain calls to cpycat (a sketch follows below).
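For reference, a minimal sketch of what such a cpycat could look like (the real implementation is in the attached benchmark, so treat this as illustrative only):

    /* Copy src to dst and return a pointer to the terminating '\0',
     * so successive pieces can be appended without rescanning dst. */
    char *cpycat(char *dst, const char *src)
    {
        while ((*dst = *src++) != '\0')
            dst++;
        return dst;
    }

which allows chaining like cpycat(cpycat(cpycat(buf, path), "/"), filename);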
The tests have been run on Debian (so assume the glibc may not be optimized much), on an Athlon processor, and compiled with a simple '-O2 -fPIC', i.e. with the same options as Wine. The cutoff point turns out to be a path length of about 220 characters:
$ ./sprintf 100
len=100
sprintf:  elapsed=45791
strcat:   elapsed=27811
mystrcat: elapsed=27830
cpycat:   elapsed=13205

$ ./sprintf 220
len=220
sprintf:  elapsed=54540
strcat:   elapsed=55507
mystrcat: elapsed=53610
cpycat:   elapsed=21876

$ ./sprintf 300
len=300
sprintf:  elapsed=63352
strcat:   elapsed=76068
mystrcat: elapsed=73931
cpycat:   elapsed=30482
So yes, most of the time sprintf is probably not more efficient than strcpy+strcat. Just for fun I added a 5-string case (adding '.' + extension) and the cutoff goes down to 120 characters, which is still quite long. That being said, I still prefer the sprintf approach because it has a much better worst-case behavior.
Other observations:
* my naive strcpy/strcat implementation seems more efficient than the one in the glibc! That's pretty weird.
* cpycat is much more efficient in this type of scenario. That's not very surprising of course. Why does the C library have such braindead functions as strcpy and strcat?
Oh, and I'll side with Alexandre to say that in most cases, a better alternative to using snprintf/strncpy/strncat is to carefully measure the length of your arguments, compute the exact length of the concatenated string (or at least an upper bound, e.g. in the case of widechar to multi-byte conversions), and allocate a buffer of the required size. Do anything else and you will end up with truncated data in your buffer, which means your function probably did not perform up to spec.
Attached is the source of my benchmark program in case anyone wants to have a look at it.
Other observations:
- my naive strcpy/strcat implementation seems more efficient than the
one in the glibc! That's pretty weird.
Probably because the glibc routines are heavily optimised for long strings. I did a load of experiments with different versions of memcpy; the instruction setup cost of 'rep movsl' is such that it is faster not to use it for copies of less than (IIRC) 180 bytes on my athlon. The 'rep movsb' used to copy the trailing bytes is definitely wasteful.
- cpycat is much more efficient in this type of scenario. That's not
very surprising of course. Why does the C library have such braindead functions as strcpy and strcat?
Probably goes back into the annals of Unix history. My guess is that the return value wasn't defined, but happened to be the destination buffer address on one of the first implementations. Some code used the fact and no one dared change it....
Probably similar to asking why the precedence of | and == is backwards. (K&R didn't want to change any code when they invented ||.)
David
On Mon, 2 Dec 2002, Francois Gouget wrote:
Other observations:
- my naive strcpy/strcat implementation seems more efficient than the
one in the glibc! That's pretty weird.
If you have a glibc from a binary package compiled with optimizations for a different processor than the one you have, it is possible.
- cpycat is much more efficient in this type of scenario. That's not
very surprising of course. Why does the C library have such braindead functions as strcpy and strcat?
Since we are talking about concatenating strings only (no %d and family), I would suggest combining the speed of cpycat (in glibc there's stpcpy) with the ease of use of sprintf, and using something like this:
#include <stdarg.h>
#include <string.h>   /* for stpcpy (a GNU extension in glibc) */

char *strpcpymore(char *buf, ...)
{
    const char *p;
    va_list ap;

    va_start(ap, buf);
    while ((p = va_arg(ap, const char *)))
        buf = stpcpy(buf, p);
    va_end(ap);
    return buf;
}
Then we could write:
strpcpymore(buffer, "path", "/", "file", ".", "ext", NULL);
And make everybody happy.
Michal Miroslaw
On Tue, 3 Dec 2002, Michal Janusz Miroslaw wrote: [...]
Since we are talking about catenating strings only (no %d and family), then I would suggest combining speed of cpycat (in glibc there's stpcpy) and ease of use of sprintf and use something like that:
[...]
Then we could write:
strpcpymore(buffer, "path", "/", "file", ".", "ext", NULL);
Yep, that looks good. I guess it would return a pointer to the terminating '\0' too :-) However I have to note that it is a bit less flexible than the basic cpycat as you cannot do things like:
    buf = cpycat(buf, foo);
    if (condition1) buf = cpycat(buf, bar);
    if (condition2) buf = cpycat(buf, stuff);
    cpycat(buf, more);
But yes, any of these would have been better than the braindead strcpy and strcat implementations we have in C libraries.