With higher -O levels the difference becomes bigger, but then I'm not sure that some of the operations aren't being optimized out of the process, which would make the entire benchmark useless.
I don't think they would be, and it is easy enough to check (objdump -d way1). Benchmarking is meaningless if you don't use the optimisation level that will be used in real life. Also, the compiler is really designed to be used with -O; the code generated without it is only 'half cooked'.
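For what it's worth, one common way to reduce the risk of the work being optimised away is to feed each result into a volatile variable. The following is only a rough sketch, not the actual benchmark code: time_concat_ms and sink are made-up names, and clock() is just one possible timer. It only guarantees that the result is consumed, so the disassembly is still worth a look.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical sink: a volatile store cannot be dropped by the compiler,
 * so the result of every iteration has to be produced somehow. */
static volatile size_t sink;

/* Hypothetical timing loop for a strcpy/strcat style concatenation.
 * Assumes strlen(a) + strlen(b) < 64; the caller must guarantee that.
 * Returns the elapsed CPU time in milliseconds. */
static double time_concat_ms(const char *a, const char *b, long iterations)
{
    char buf[64];
    clock_t start = clock();

    for (long i = 0; i < iterations; i++) {
        strcpy(buf, a);
        strcat(buf, b);
        sink = strlen(buf);   /* consume the result on every iteration */
    }
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling something like time_concat_ms("hello, ", "world", 10000000L) at the same -O level as the real build gives a number that is at least comparable between variants.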
Do we go for David's suggestion, which is more efficient but also more cumbersome and requires two extra vars to implement right?
The extra variables are likely to be well optimised by the compiler. strlcpy/cat will be even slower than the strcpy/cat versions.
way1:   900 ms
way1a: 1321 ms (strlcpy/cat)
way2:   721 ms
way3:   322 ms
way3a:  223 ms (without bound check)
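To illustrate the kind of thing the "two extra vars" buy you, here is a minimal sketch of an append that tracks the end of the string and the space left, so nothing has to rescan the buffer the way strcat does. This is an assumption about the general approach, not the exact code that was timed above; append_tracked, end and left are names invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src at *end without writing past the buffer.
 * *end  points at the current terminating NUL.
 * *left is the number of bytes still free, not counting the NUL.
 * Returns 0 on success, -1 if src had to be truncated. */
static int append_tracked(char **end, size_t *left, const char *src)
{
    size_t n = strlen(src);
    int truncated = 0;

    if (n > *left) {          /* the bound check */
        n = *left;
        truncated = 1;
    }
    memcpy(*end, src, n);     /* copy only what fits */
    *end += n;                /* the two extra variables are updated here */
    *left -= n;
    **end = '\0';             /* keep the buffer terminated */
    return truncated ? -1 : 0;
}
```

Usage would start with buf[0] = '\0', end = buf and left = sizeof buf - 1, followed by one append_tracked(&end, &left, ...) call per piece. Dropping the if block gives the variant without the bound check, at the cost of overflow safety.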
David