Re: [PATCH] msvcrt: SSE2 implementation of memcmp for x86_64.

3 Apr 2022


On 4/3/22 04:35, Elaine Lefler wrote:
...
...
On 4/2/22 13:19, Rémi Bernon wrote:
...
(I personally believe that the efficient C implementation should come
first, so that any unsupported hardware will at least benefit from it.)
I also think it would be good to add a more efficient C implementation
first (it will also show whether the SSE2 implementation is really needed).
Thanks,
Piotr
I can't speak definitively, because it looks a little different for
every function. But, overwhelmingly, my experience has been that
nothing will run measurably faster than byte-by-byte functions without
using vector instructions. That's because the bottleneck isn't CPU
power; it's memory access. Like I said, vectors were created
specifically to solve this problem, and in my experience you won't find
notable performance gains without using them.
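For reference, a minimal SSE2 memcmp of the kind being discussed might look like the sketch below. It is not the patch under review: the function name is hypothetical, it assumes x86/x86_64 with SSE2, and it uses the GCC/Clang builtin __builtin_ctz to locate the first mismatching byte.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical sketch, not the submitted patch. */
static int sse2_memcmp_sketch(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;

    /* Compare 16 bytes per iteration using unaligned loads. */
    while (n >= 16)
    {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        /* Bit i of mask is set when byte i of both blocks is equal. */
        unsigned int mask = (unsigned int)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (mask != 0xffff)
        {
            /* Lowest clear bit of mask = index of the first differing byte. */
            unsigned int i = (unsigned int)__builtin_ctz(~mask);
            return a[i] - b[i];
        }
        a += 16;
        b += 16;
        n -= 16;
    }

    /* Byte-by-byte tail for the remaining 0..15 bytes. */
    while (n--)
    {
        if (*a != *b) return *a - *b;
        a++;
        b++;
    }
    return 0;
}
```

Unaligned loads keep the sketch short; aligning one of the pointers first is a common refinement, at the cost of more border handling.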
Vectorized instructions and intrinsics are just an extension of the idea 
of using larger types to process more data at a time. You can already do 
that to some extent using standard C, and, if you write the code in a 
nice enough way, the compiler may even be able to understand the intent 
and extend it further with vectorized instructions when it believes it's 
useful.
Then it's always a trade-off between optimizing for the large data case 
and optimizing for the small data case. The larger the building blocks 
you use, the more you will penalize the small data case, as you will 
need to carefully handle data alignment and the leftover bytes at the 
borders.
For this specific memcmp case, I believe that by using larger data types 
and avoiding unnecessary branches, you can already improve the C code 
well enough.
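A minimal sketch of what the "larger data types" approach could look like in plain C, assuming 64-bit words. The function name is hypothetical and this is not existing Wine code; memcpy is used for the word loads to stay clear of alignment and strict-aliasing issues, which compilers typically turn into a single load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a word-at-a-time memcmp in standard C. */
static int word_memcmp_sketch(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;

    /* Compare 8 bytes per iteration while at least 8 bytes remain. */
    while (n >= sizeof(uint64_t))
    {
        uint64_t wa, wb;
        memcpy(&wa, a, sizeof(wa));
        memcpy(&wb, b, sizeof(wb));
        if (wa != wb) break; /* the first difference is within these 8 bytes */
        a += sizeof(uint64_t);
        b += sizeof(uint64_t);
        n -= sizeof(uint64_t);
    }

    /* Pinpoint the differing byte (at most 8 steps after a break),
     * or compare the final tail of fewer than 8 bytes. */
    while (n--)
    {
        if (*a != *b) return *a - *b;
        a++;
        b++;
    }
    return 0;
}
```

Both buffers are read only within the n bytes the caller passed, which is acceptable for memcmp since it requires both buffers to be fully valid.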
Note that, especially for functions which are supposed to stop their 
iteration early, you also need to consider whether the buffers are always 
entirely valid and whether you are allowed to read larger chunks of data 
at a time. That seems to be the case for memcmp, but not for memchr, for 
instance. [1]
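To illustrate the memchr problem: memchr may be given a length larger than the valid buffer as long as the match occurs first, yet a word-at-a-time loop reads whole words and can touch bytes after the match. Generic C libc implementations commonly align the pointer first so that a word load never crosses a page boundary. The sketch below is hypothetical and still reads past the match, which ISO C does not sanction; it only shows the constraint, not something to copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch. The aligned 8-byte loads may read bytes past the
 * match (never past an aligned 8-byte block, hence never into another
 * page), which is outside what ISO C guarantees. */
static void *word_memchr_sketch(const void *p, int c, size_t n)
{
    const unsigned char *s = p;
    unsigned char ch = (unsigned char)c;
    const uint64_t ones = 0x0101010101010101ull;
    const uint64_t highs = 0x8080808080808080ull;
    const uint64_t pattern = ones * ch; /* ch replicated in every byte */

    /* Byte-by-byte until s is 8-byte aligned. */
    while (n && ((uintptr_t)s & 7))
    {
        if (*s == ch) return (void *)s;
        s++;
        n--;
    }

    /* x has a zero byte exactly when some byte of the word equals ch. */
    while (n >= 8)
    {
        uint64_t w, x;
        memcpy(&w, s, sizeof(w));
        x = w ^ pattern;
        if ((x - ones) & ~x & highs) break; /* match somewhere in this word */
        s += 8;
        n -= 8;
    }

    /* Locate the match inside the word, or scan the remaining tail. */
    while (n--)
    {
        if (*s == ch) return (void *)s;
        s++;
    }
    return NULL;
}
```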
[1] 
https://trust-in-soft.com/blog/2015/12/21/memcmp-requires-pointers-to-fully-...
...
Personally I think Jinoh's suggestion to find a compatibly licensed
library and copy their code is best. Otherwise I sense this will
become an endless circle of "do we really need it?" (yes, but this
type of code is annoying to review), and Wine could benefit from using
an implementation that's already widely tested.
I personally don't like the idea at all. Copying code from other 
libraries is just the best way to get code with no history, whose 
characteristics and rationale no one really understands.
Like I said in another thread, the memcpy C code that's been adapted 
from glibc for msvcrt is IMHO a good example. It may very well be 
correct, but looking at it I'm simply unable to say that it is.
Maybe I'm unable to read code, but my first and only impression is that 
it's unnecessarily complex. I don't know why it is the way it is 
(probably for some obscure historical reason or a target-specific 
optimization), and if for some reason we needed to optimize it further, 
I would be unable to do so without rewriting it entirely.
Cheers,
-- 
Rémi Bernon rbernon@codeweavers.com

