Hello,
since msvcrt isn't relying on the standard library memmove/memcpy anymore, there's been a pretty bad performance regression. See https://bugs.winehq.org/show_bug.cgi?id=49663.
For the best performance, and since those memory operations are pretty common, we'd presumably like to optimize them as much as possible. You might have seen my patch for an implementation from musl, although Zebediah rightfully pointed out we might want to opt for the best performance we can get... glibc currently offers the best performance, thanks to SSE/AVX implementations and runtime selection of the best supported path.
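For anyone unfamiliar with the "runtime selection" idea, here is a rough sketch of how it could look in C. This is not glibc's mechanism or anything from Wine; the names copy_bytes, copy_wide, select_copy and dispatch_copy are made up for illustration, copy_wide is only a placeholder for a real SSE/AVX routine, and the CPU check assumes the GCC/clang builtins __builtin_cpu_init and __builtin_cpu_supports on x86.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by all copy paths. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: plain byte loop. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--) *d++ = *s++;
    return dst;
}

/* Placeholder for a vectorised path. A real one would contain
 * SSE2/AVX code; here it just forwards to the library memcpy. */
static void *copy_wide(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick a path once, based on what the CPU reports at run time. */
static copy_fn select_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return copy_wide;
#endif
    return copy_bytes;
}

static copy_fn selected_copy;

/* Entry point: resolves the path on first use, then calls through it.
 * Not thread-safe as written; a real version needs to handle that. */
void *dispatch_copy(void *dst, const void *src, size_t n)
{
    if (!selected_copy) selected_copy = select_copy();
    return selected_copy(dst, src, n);
}
```

The point is only that the feature check happens once and every later call goes through a function pointer, so the per-call cost is one indirect call.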
First, would you have any objections to adding specialized paths written in assembly for x86? And if we were to add them, would we link against assembly files, or somehow transform them into inline assembly? AFAIK, Wine doesn't ship any pure assembly files yet...
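To make the inline assembly option concrete, here is a minimal sketch using GCC-style extended asm. It is not taken from musl, glibc or Wine; copy_forward is a hypothetical name, and the routine only copies forward, so it has memcpy semantics and does not handle the overlapping case that memmove must.

```c
#include <stddef.h>

/* Forward-only copy. On x86 with GCC/clang it uses "rep movsb";
 * elsewhere it falls back to a byte loop. */
void *copy_forward(void *dst, const void *src, size_t n)
{
    void *ret = dst;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* rep movsb copies (R/E)CX bytes from [(R/E)SI] to [(R/E)DI].
     * All three registers are modified, hence the "+" constraints;
     * the "memory" clobber tells the compiler memory was written.
     * The calling convention guarantees the direction flag is clear. */
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--) *d++ = *s++;
#endif
    return ret;
}
```

Whether something like this beats a tuned SSE/AVX loop depends on the CPU and the copy size, so it would need measuring.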
If you want, I could set up a few crude benchmarks to see how different versions compare.
Regards, Fabian Maurer
Regarding memcpy performance, I also recently came across suboptimal memcpy / memmove performance while doing perf analysis of the game Shadow of the Tomb Raider. While in that case I did not find memcpy to be responsible for any significant slowdown (maybe ~2-3 fps at most, together with the math function implementations), it drew my attention by consistently appearing in perf top and accounting for measurable CPU time by other estimates as well.
I am attaching a very short test program that runs in ~7.4s using builtin vcruntime140 here and in ~2s using native vcruntime140 under Wine (compiled with x86_64-w64-mingw32-gcc ./memcpyperf.c -o memcpyperf).
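The attachment is not reproduced here. For readers who want to try something similar, the following is a rough sketch of a timing helper, not the attached memcpyperf.c; the name time_copies, the fill pattern and the choice of clock() for timing are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the routine under test. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copies `size` bytes `iterations` times through `fn` and returns the
 * elapsed CPU time in seconds, or -1.0 on a bad size or failed malloc. */
double time_copies(copy_fn fn, size_t size, unsigned long iterations)
{
    unsigned char *src, *dst;
    volatile unsigned char sink = 0;
    unsigned long i;
    clock_t start, end;

    if (!size) return -1.0;
    src = malloc(size);
    dst = malloc(size);
    if (!src || !dst)
    {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size);

    start = clock();
    for (i = 0; i < iterations; i++)
    {
        fn(dst, src, size);
        /* Read one byte back so the copy cannot be discarded. */
        sink ^= dst[i % size];
    }
    end = clock();

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it as time_copies(memcpy, 4096, 1000000) for a few different sizes, once per runtime being compared, gives a crude comparison. clock() is coarse, so short runs will be noisy.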
On 8/14/20 11:27, piotr@codeweavers.com wrote:
Hi Fabian,
I'll be back from vacation on Monday (currently I have very limited internet access). I'll look into it then.
I'm not sure how complicated the assembly implementation is but I'm expecting that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient the Microsoft implementation is.
Musl also has platform-specific implementations of memmove (for i386 and x64) written in assembly. I bet it should be good enough for Wine.
Thanks, Piotr
On Aug 12, 2020 23:33, Fabian Maurer dark.shadow4@web.de wrote:
Hello, since msvcrt isn't relying on the standard library memmove/memcpy anymore, there's been a pretty bad performance regression. See https://bugs.winehq.org/show_bug.cgi?id=49663. For the best performance, and since those memory operations are pretty common, we'd presumably like to optimize them as much as possible. You might have seen my patch for an implementation from musl, although Zebediah rightfully pointed out we might want to opt for the best performance we can get... glibc currently offers the best performance, thanks to SSE/AVX implementations and runtime selection of the best supported path. First, would you have any objections to adding specialized paths written in assembly for x86? And if we were to add them, would we link against assembly files, or somehow transform them into inline assembly? AFAIK, Wine doesn't ship any pure assembly files yet... If you want, I could set up a few crude benchmarks to see how different versions compare. Regards, Fabian Maurer
On 8/14/20 3:27 AM, piotr@codeweavers.com wrote:
Hi Fabian,
I'll be back from vacation on Monday (currently I have very limited internet access). I'll look into it then.
I'm not sure how complicated the assembly implementation is but I'm expecting that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient the Microsoft implementation is.
I believe you are correct. I misread their licensing files and thought they used LGPL 2, not GPL 2.
On 8/14/20 9:18 AM, Zebediah Figura wrote:
I believe you are correct. I misread their licensing files and thought they used LGPL 2, not GPL 2.
As Henri points out, I am even more confused than that. The project uses several licenses, so one must truly read the header of the file in question.
All of the x86_64 implementations of memmove(), at least, seem to be under LGPL 2.1. Unless there's another, unrelated reason why we can't use them?
On Fri, 14 Aug 2020, piotr@codeweavers.com wrote:
I'm not sure how complicated the assembly implementation is but I'm expecting that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient the Microsoft implementation is.
Musl also has platform-specific implementations of memmove (for i386 and x64) written in assembly. I bet it should be good enough for Wine.
FWIW, I recently happened to benchmark various memcpy implementations (for a different use case in mingw-w64); see https://sourceforge.net/p/mingw-w64/mailman/message/37030146/ for the measurements and a link to the tool I used for testing.
My conclusion there was that the musl x86_64 assembly implementation looks really good, and the musl C implementation also behaved pretty well when compiled with GCC. When compiled with clang, though, the musl C implementation was rather slow.
// Martin