Hello Piotr,
It would probably also be useful to benchmark the different glibc implementations, since for games even a 10% speedup would be welcome. I doubt a C implementation can compete with an AVX-based one.
> Your patch only affects a subset of memmove calls. It also slows down
> some cases a lot (by around 1.5-2x).
You mean the cases where we could use memcpy, i.e. where the buffers don't overlap?
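To make the distinction concrete, here is a minimal sketch of a C memmove that hands non-overlapping requests to memcpy and only falls back to a direction-aware byte loop when the ranges overlap. It is an illustration only, not code from either patch or from musl, and the name `sketch_memmove` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not the patch under discussion. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    uintptr_t di = (uintptr_t)d, si = (uintptr_t)s;

    if (d == s || n == 0)
        return dest;

    /* [s, s+n) and [d, d+n) are disjoint, so memcpy is safe here. */
    if (si + n <= di || di + n <= si)
        return memcpy(dest, src, n);

    if (di < si) {
        /* Destination starts below source: copy front to back. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts above source: copy back to front so
         * source bytes are read before they are overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dest;
}
```

Only the overlapping branch needs its own copy loop; everything else is delegated to whatever memcpy the libc provides.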
> I've also tested the full implementation from musl (which uses their memcpy
> implementation in some cases). It performs much better, but it's much slower than native if the buffers overlap (around 3 times slower).
musl is slower in a lot of cases. I'm attaching a quick test program. You can compile it normally with "gcc", or build it with "musl-gcc" to link musl statically. That should compare the best glibc implementation against the best musl implementation. Correct me if I'm wrong, though.
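The attached program isn't reproduced here, but a benchmark along these lines might look like the following sketch. The function `bench_memmove` and its parameters are hypothetical: it times repeated memmove calls within one buffer, where `dst_off >= n` gives disjoint ranges and a small `dst_off` (e.g. 1) gives overlapping ones. The empty `asm` statement is a GCC extension used as a compiler barrier so the copies aren't optimized away.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average nanoseconds per memmove(buf + dst_off, buf, n)
 * over `iters` calls, or -1.0 on bad input / allocation failure. */
double bench_memmove(size_t n, size_t dst_off, int iters)
{
    if (iters <= 0)
        return -1.0;
    unsigned char *buf = malloc(n + dst_off);
    if (!buf)
        return -1.0;
    memset(buf, 0xAB, n + dst_off);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memmove(buf + dst_off, buf, n);
        /* GCC compiler barrier: tells the compiler memory may be read,
         * so the stores done by memmove cannot be dropped. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Calling it from a small `main()` over a range of sizes, and building once with `gcc` and once with `musl-gcc -static`, runs the same loop against each libc's memmove. Since glibc selects its x86_64 memmove variant at run time, the gcc build should exercise whichever one the CPU supports best.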
Regards, Fabian Maurer