Regarding memcpy performance, I also recently came across suboptimal memcpy / memmove performance while doing perf analysis of the Shadow of the Tomb Raider game. In that case I did not find memcpy to be responsible for any significant slowdown (maybe ~2-3 fps at most, together with the math function implementations), but it drew attention by consistently appearing in perf top and taking a measurable amount of CPU time.

I am attaching a very short test program. It runs in ~7.4s with the builtin vcruntime140 here and in ~2s with the native vcruntime140 under Wine (compiled with x86_64-w64-mingw32-gcc ./memcpyperf.c -o memcpyperf).
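
For readers without the attachment, a timing helper for this kind of comparison might look roughly like the sketch below. It is not the attached memcpyperf.c: the name time_copies, the copy_fn typedef and the choice of clock() for timing are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and memmove. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: runs `iterations` copies of `size` bytes through
 * `copy` and returns the elapsed CPU time in seconds.
 * Returns -1.0 if size is 0 or the buffers cannot be allocated. */
double time_copies(copy_fn copy, size_t size, size_t iterations)
{
    unsigned char *src, *dst;
    volatile unsigned char sink = 0; /* keeps the copies observable */
    clock_t start, end;
    size_t i;

    if (!size) return -1.0;

    src = malloc(size);
    dst = malloc(size);
    if (!src || !dst)
    {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xa5, size);

    start = clock();
    for (i = 0; i < iterations; i++)
    {
        copy(dst, src, size);
        sink = dst[i % size]; /* read back one byte per iteration */
    }
    end = clock();

    (void)sink;
    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(memcpy, size, iterations) for a range of sizes, once with the builtin and once with the native DLL, should give numbers comparable in spirit to the ones above. Going through a function pointer keeps the compiler from replacing the call with an inlined builtin.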

On 8/14/20 11:27, piotr@codeweavers.com wrote:
Hi Fabian,

I'll be back from vacation on Monday (currently I have very limited internet access). I'll look into it then.

I'm not sure how complicated the assembly implementation is, but I expect that a separate assembly file will not be needed. Also, AFAIK, we can't take the implementation from glibc. It would also be useful to know how efficient the Microsoft implementation is.

Musl also has platform-specific implementations of memmove (for i386 and x64) written in assembly. I bet they should be good enough for Wine.

Thanks,
Piotr

On Aug 12, 2020 23:33, Fabian Maurer <dark.shadow4@web.de> wrote:

Hello,

since msvcrt isn't relying on the standard library memmove/memcpy anymore,
there's been a pretty bad performance regression. See
https://bugs.winehq.org/show_bug.cgi?id=49663.

For the best performance, and since those memory operations are pretty common,
we'd presumably like to optimize them as much as possible. You might have seen
my patch for an implementation from musl, although Zebediah rightfully pointed
out we might want to opt for the best performance we can get...
glibc currently offers the best performance, thanks to SSE/AVX implementations
and runtime selection of the best supported path.
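
To illustrate what runtime selection of a copy path means in general, here is a minimal sketch. It is not glibc's mechanism and not a proposed patch: the function names are hypothetical, the "specialized" variants are plain C placeholders, and the CPU checks assume GCC or Clang on x86, where __builtin_cpu_init and __builtin_cpu_supports are available.

```c
#include <stddef.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Portable byte-by-byte fallback. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Placeholders: real versions would use SSE2 / AVX2 loads and stores. */
static void *memcpy_sse2_placeholder(void *dst, const void *src, size_t n)
{
    return memcpy_generic(dst, src, n);
}

static void *memcpy_avx2_placeholder(void *dst, const void *src, size_t n)
{
    return memcpy_generic(dst, src, n);
}

/* Pick the best variant the running CPU supports. */
static memcpy_fn select_memcpy(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return memcpy_avx2_placeholder;
    if (__builtin_cpu_supports("sse2")) return memcpy_sse2_placeholder;
#endif
    return memcpy_generic;
}

/* Entry point: resolves the implementation on first use.
 * Assumption: a real version would need to consider thread safety. */
void *dispatch_memcpy(void *dst, const void *src, size_t n)
{
    static memcpy_fn impl;
    if (!impl) impl = select_memcpy();
    return impl(dst, src, n);
}
```

The feature check runs once; every later call only costs one indirect call.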

First, would you have any objections adding specialized paths written in
assembly for x86?
And if we were to add them, would we link against assembly files, or somehow
transform them into inline assembly? AFAIK, Wine doesn't ship any pure
assembly files yet...
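
As a rough idea of what the inline assembly option could look like, here is a minimal sketch using GCC-style extended asm. It only shows the mechanism with a single "rep movsb" instruction; it is not the musl or glibc code, the name memcpy_rep_movsb is hypothetical, and it assumes the direction flag is clear on entry, as the usual x86 calling conventions require.

```c
#include <stddef.h>

/* Hypothetical forward copy written as inline assembly.
 * On non-x86 or non-GNU compilers it falls back to a plain C loop. */
void *memcpy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* "+D", "+S", "+c" pin dst, src and n to (r/e)di, (r/e)si and
     * (r/e)cx, which rep movsb reads and updates. */
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
#endif
    return ret;
}
```

A longer routine ported from a standalone .s file would need the same kind of operand and clobber annotations, which is where most of the conversion effort would go.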

If you want, I could set up a few crude benchmarks to see how different
versions compare.

Regards,
Fabian Maurer