msvcrt - memmove/memcpy optimizations - wine-devel

Fabian Maurer

12 Aug 2020

9:33 p.m.

Hello,

since msvcrt no longer relies on the standard library memmove/memcpy, there has been a pretty bad performance regression. See https://bugs.winehq.org/show_bug.cgi?id=49663.

Since those memory operations are pretty common, we'd presumably like to optimize them as much as possible. You might have seen my patch for an implementation from musl, although Zebediah rightly pointed out that we might want to opt for the best performance we can get... glibc currently offers the best performance, thanks to SSE/AVX implementations and runtime selection of the best supported path.
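As a rough illustration of what "runtime selection of the best supported path" can look like, here is a minimal C sketch. It is not glibc's or Wine's code: the names demo_memcpy, select_copy, copy_sse2 and copy_generic are hypothetical, and it assumes GCC or Clang on x86_64 for __builtin_cpu_supports and the SSE2 intrinsics. Other targets fall back to the byte loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <emmintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: one byte at a time, forward only (memcpy semantics). */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

#ifdef HAVE_X86_DISPATCH
/* 16 bytes per iteration using unaligned SSE2 loads and stores. */
static void *copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 16; n -= 16, d += 16, s += 16)
        _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
    while (n--) *d++ = *s++;   /* remaining tail bytes */
    return dst;
}
#endif

/* Ask the CPU once which path it supports. */
static copy_fn select_copy(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return copy_sse2;
#endif
    return copy_generic;
}

/* Hypothetical entry point: resolves the implementation on first use. */
void *demo_memcpy(void *dst, const void *src, size_t n)
{
    static copy_fn impl;
    assert(n == 0 || (dst != NULL && src != NULL));
    if (!impl) impl = select_copy();
    return impl(dst, src, n);
}
```

The lazy initialisation here is not synchronised between threads. A real implementation would more likely resolve the pointer at load time and add further paths (AVX and so on) in the same way.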

First, would you have any objections to adding specialized paths written in assembly for x86? And if we were to add them, would we link against assembly files, or somehow transform them into inline assembly? AFAIK, Wine doesn't ship any pure assembly files yet...

If you want, I could set up a few crude benchmarks to see how different versions compare.

Regards, Fabian Maurer

piotr＠codeweavers.com

14 Aug

8:27 a.m.

Paul Gofman

11:46 a.m.

Regarding memcpy performance, I also recently came across suboptimal memcpy / memmove performance while doing perf analysis of the game Shadow of the Tomb Raider. While in that case I did not find memcpy to be responsible for any significant slowdown (maybe ~2-3 fps at most, together with the math function implementations), it drew attention by consistently appearing in perf top and taking some measurable CPU time, as estimated by other means.

I am attaching a very short test program. It runs in ~7.4s using builtin vcruntime140 here and in ~2s using native vcruntime140 under Wine (compiled with x86_64-w64-mingw32-gcc ./memcpyperf.c -o memcpyperf).
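The attachment itself is not reproduced in this archive. Purely as an illustration of the kind of measurement involved, and not as the attached memcpyperf.c, a crude timing helper could look like the following. The function name time_memcpy, the 0x5a fill pattern and the checksum are all hypothetical choices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copies `size` bytes `iterations` times and returns the
 * elapsed time reported by clock(), in seconds, or -1.0 if size is 0 or an
 * allocation fails. The low byte of a running checksum goes to *checksum.
 */
double time_memcpy(size_t size, unsigned long iterations, unsigned char *checksum)
{
    /* Calling through a volatile pointer keeps the compiler from inlining or
     * dropping the copy, so the C runtime's memcpy is what gets timed. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    unsigned char *src, *dst, sum = 0;
    clock_t start, end;
    unsigned long i;

    assert(checksum != NULL);
    if (!size) return -1.0;
    src = malloc(size);
    dst = malloc(size);
    if (!src || !dst) { free(src); free(dst); return -1.0; }
    memset(src, 0x5a, size);

    start = clock();
    for (i = 0; i < iterations; i++)
    {
        src[0] = (unsigned char)i;   /* vary the input on every pass */
        copy(dst, src, size);
        sum += dst[0];               /* consume the result */
    }
    end = clock();

    free(src);
    free(dst);
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with a few different sizes (say 16 bytes, 4 KiB and 1 MiB) and enough iterations to run for a second or more, once against builtin and once against native vcruntime140, would give numbers comparable in spirit to the ones quoted above.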

On 8/14/20 11:27, piotr@codeweavers.com wrote:

Hi Fabian,

I'll be back from vacation on Monday (currently I have very limited internet access). I'll look into it then.

I'm not sure how complicated the assembly implementation is, but I expect that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient Microsoft's implementation is.

Musl also has platform-specific implementations of memmove (for i386 and x64) written in assembly. I bet it should be good enough for Wine.
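On the question of keeping assembly inside a C file rather than in a separate assembly file, a minimal sketch using GCC-style extended inline assembly might look like this. It is neither musl's nor Wine's implementation. The name copy_rep_movsb is hypothetical, the code assumes GCC or Clang on i386 or x86_64 with the direction flag clear (as the usual ABIs require on function entry), and it only covers the non-overlapping memcpy case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical forward copy built on the x86 string-move instruction. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    /* "D", "S" and "c" pin the operands to the destination index, source
     * index and count registers that rep movsb uses. All three are modified
     * by the instruction, hence the "+" constraints. */
    __asm__ __volatile__("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory");
#else
    /* Other compilers or architectures: defer to the library. */
    memcpy(dst, src, n);
#endif
    return ret;
}
```

A tuned implementation would add alignment handling, wider moves and a backward path for overlapping memmove, but the same inline-assembly mechanism would still apply.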

Thanks, Piotr

Zebediah Figura

2:18 p.m.

On 8/14/20 3:27 AM, piotr@codeweavers.com wrote:

Hi Fabian,

I'll be back from vacation on Monday (currently I have very limited internet access). I'll look into it then.

I'm not sure how complicated the assembly implementation is, but I expect that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient Microsoft's implementation is.

I believe you are correct. I misread their licensing files and thought they used LGPL 2, not GPL 2.

Zebediah Figura

4:07 p.m.

On 8/14/20 9:18 AM, Zebediah Figura wrote:

On 8/14/20 3:27 AM, piotr@codeweavers.com wrote:

Hi Fabian,

I'll be back from vacation on Monday (currently I have very limited internet access). I'll look into it then.

I'm not sure how complicated the assembly implementation is, but I expect that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient Microsoft's implementation is.

I believe you are correct. I misread their licensing files and thought they used LGPL 2, not GPL 2.

As Henri points out, I am even more confused than that. The project uses several licenses, so one must truly read the header of the file in question.

All of the x86_64 implementations of memmove(), at least, seem to be under LGPL 2.1. Unless there's another, unrelated reason why we can't use them?

Martin Storsjö

7:29 p.m.

On Fri, 14 Aug 2020, piotr@codeweavers.com wrote:

I'm not sure how complicated the assembly implementation is, but I expect that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient Microsoft's implementation is.

Musl also has platform-specific implementations of memmove (for i386 and x64) written in assembly. I bet it should be good enough for Wine.

FWIW, I recently tried benchmarking various memcpy implementations (for a different use case in mingw-w64); see https://sourceforge.net/p/mingw-w64/mailman/message/37030146/ for the measurements and a link to the tool I used for testing.

My conclusion there was that the musl x86_64 assembly implementation looks really good, and the musl C implementation also behaved pretty well, if compiled with GCC. The musl C implementation if compiled with clang was rather slow though.

// Martin
