On Sat, Apr 2, 2022 at 11:09 PM Jin-oh Kang jinoh.kang.kr@gmail.com wrote:
It's not a real syscall per se; rather, it's more like a gate between the PE side (corresponding to Windows userspace) and the Unix side (Wine's pseudo kernel space, which interacts directly with the host OS). The PE/Unix separation is designed so that every interaction with the system goes through the syscall gate, just like on Windows (we're not there yet, but we'll get there eventually). This helps satisfy video game anti-cheat technologies and conceals the Unix (.so) code, which would otherwise confuse Win32 apps and debuggers tracing the execution path.
Ah. That makes sense. In this case I think Remi is correct that there's too much overhead.
I can't speak definitively, because it looks a little different for every function. But, overwhelmingly, my experience has been that nothing will run measurably faster than byte-by-byte functions without using vector instructions. That's because the bottleneck isn't CPU power; the bottleneck is memory access.
It should be.
It's a margin of ~25%, versus a margin of ~500%. Unless you're moving gigabytes, it's unlikely to be noticeable.
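For concreteness, the two scalar variants in question look roughly like this (a sketch for illustration only, not code from Wine's tree; the vectorized counterpart is sketched after the next paragraph):

#include <stddef.h>
#include <stdint.h>

/* Byte-by-byte copy: the baseline. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--) *dst++ = *src++;
}

/* Dword-at-a-time copy (assumes 4-byte alignment and a size that's a
 * multiple of 4). Per the margins above, this only buys on the order
 * of 25%: both loops spend their time waiting on memory, not on how
 * much work each instruction does. */
static void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    while (count--) *dst++ = *src++;
}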
That said, another confounding issue is that a large number of small movements will have very different performance characteristics from a small number of large movements. It's possible there are cases where using, say, dwords would be much faster than trying to vectorize. I haven't found them in testing, but this is another argument for using someone else's code rather than trying to roll our own: a library dedicated to this purpose has likely done all kinds of profiling to find exactly where that threshold lies.
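To make the threshold idea concrete, a dispatch might look roughly like the following; the cutoff value is invented purely for illustration, and finding the real one is exactly the profiling work a dedicated library has already done (SSE2 is used here just as the x86-64 example):

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical cutoff, purely for illustration; the right value can
 * only come from profiling on real workloads. */
#define SMALL_COPY_CUTOFF 64

static void copy_dispatch(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    if (n >= SMALL_COPY_CUTOFF)
    {
        /* Large moves: 16 bytes per iteration with unaligned SSE2
         * load/store; this is where the big (~500%) margin comes from. */
        for (; i + 16 <= n; i += 16)
            _mm_storeu_si128((__m128i *)(dst + i),
                             _mm_loadu_si128((const __m128i *)(src + i)));
    }

    /* Small moves, and the tail of large ones: plain scalar code avoids
     * the setup overhead that can make the vector path a loss here. */
    for (; i < n; i++)
        dst[i] = src[i];
}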
What you're thinking of is a SIMD abstraction library. I don't see it as strictly necessary, since we're okay with vendor-specific code blocks as long as they're justified. Note that we currently support only four architectures (IA-32, x86-64, ARM AArch32, and ARM AArch64).
Right. The reason I bring it up is that it would satisfy the portability requirement (as long as you stick to the abstraction library, you're writing regular C) and would get you close enough to the performance of real intrinsics that there should be no need for inline asm. So if we don't want to import another library, this may be the best compromise between speed and simplicity.
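As a sketch of what that buys us, here's the same 16-byte copy loop written against SIMDe's SSE2-compatible API; SIMDe is used here only as one example of such an abstraction layer, not as a concrete proposal:

#include <stddef.h>
#include <simde/x86/sse2.h>  /* SIMDe maps these calls to native SIMD (or scalar) code per target */

/* Same loop as the SSE2 version above, but written against the
 * abstraction layer: on x86 it compiles to real SSE2, on AArch64 to
 * NEON, and it stays plain, portable C everywhere. */
static void copy_abstracted(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    for (i = 0; i + 16 <= n; i += 16)
        simde_mm_storeu_si128((simde__m128i *)(dst + i),
                              simde_mm_loadu_si128((const simde__m128i *)(src + i)));

    for (; i < n; i++)  /* scalar tail */
        dst[i] = src[i];
}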