Re: [PATCH] msvcrt: Faster memcmp().

19 Apr 2022


      On 4/19/22 10:39, Rémi Bernon wrote:
...
On 4/13/22 15:50, Piotr Caban wrote:
...
Hi Jan,
On 4/6/22 19:14, Jan Sikorski wrote:
...
-/*********************************************************************

*                  memcmp (MSVCRT.@)
*/

-int __cdecl memcmp(const void *ptr1, const void *ptr2, size_t n)
+static inline int memcmp_unaligned(const void *ptr1, const void 
*ptr2, size_t n)
  {
      const unsigned char *p1, *p2;
@@ -2690,6 +2687,64 @@ int __cdecl memcmp(const void *ptr1, const 
void *ptr2, size_t n)
      return 0;
  }
I think it would be good to optimize memcmp_unaligned a little. I'm 
thinking about something along these lines (untested):
static int memcmp_size_t(size_t s1, size_t s2)
{
     const uint8_t *p1 = (const uint8_t*)&s1, *p2 = (const uint8_t*)&s2;
     while (*p1 == *p2)
     {
         p1++;
         p2++;
     }
     return *p1 > *p2 ? 1 : -1;
}
You can make the small case branchless (and slightly faster) by 
converting to big endian then comparing the value at once.
...
static int memcmp_unaligned(const char *c1, const char *c2, size_t len)
{
     int sh1 = 8 * ((size_t)c2 % sizeof(size_t));
     int sh2 = 8 * sizeof(size_t) - sh1;
     const size_t *s1 = (const size_t*)c1;
     const size_t *s2 = (const size_t*)(c2 - sh1 / 8);
     size_t x, y, m;
x = s2[0];
     do
     {
         y = s2[1];
         m = MERGE(x, sh1, y, sh2);
         if (*s1 != m)
             return memcmp_size_t(*s1, m);
         s1++;
         s2++;
         len--;
         x = y;
     } while (len);
     return 0;
}
Where MERGE is already defined in string.c file, len is length in 
sizeof(size_t) blocks instead of bytes. It may be even better to 
switch to uint64_t instead of using size_t like in memset. You can 
also take a look on glibc platform independent implementation (it uses 
the MERGE trick + loop unrolling, according to some random benchmark 
it's on par with your implementation performance wise on i386/x86_64 
and is much faster on arm).
Thanks,
Piotr
FWIW for me and with a basic benchmark this version is 3x slower than 
Jan's version on 32bit x86, 2x slower on 64bit.
To make sure we're talking about the same thing - I've meant that 
glibc's (platform independent) version has similar performance. The code 
I have attached here was only added to show general idea while keeping 
the code as simple as possible.

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

Re: [PATCH] msvcrt: Faster memcmp().