Re: [PATCH] msvcrt: Import memmove from musl

26 Aug 2020


      On 26/08/2020 17:01, Gabriel Ivăncescu wrote:
...
On 25/08/2020 20:15, Piotr Caban wrote:
...
On 8/22/20 5:10 PM, Gabriel Ivăncescu wrote:
...
So `rep movsl` is faster than `rep movsb` even in the first test?
No, it was faster in the "Non-aligned", "Aligned overlap" and "Non-aligned 
overlap" tests. In the "Aligned" case the performance was identical 
regardless of whether movsb or movsl was used.
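For anyone following the test names, here is a minimal sketch (not the code from the patch) of the two decisions being discussed: which direction to copy in when the buffers overlap, and whether to move 4 bytes per step (the granularity of `rep movsl`) or 1 byte per step (the granularity of `rep movsb`). The name `sketch_memmove` is made up for illustration, and plain C loops stand in for the string instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; not the msvcrt or musl implementation. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint32_t word;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Unsigned wraparound: true when dst is below src, or at least
     * n bytes above it, so a forward copy cannot clobber unread bytes. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        /* 4 bytes per step; memcpy keeps unaligned access well defined. */
        while (n >= 4) {
            memcpy(&word, s, 4);
            memcpy(d, &word, 4);
            s += 4; d += 4; n -= 4;
        }
        /* Remaining 0..3 bytes, one at a time. */
        while (n--)
            *d++ = *s++;
    } else {
        /* dst overlaps the tail of src: copy from the end backwards. */
        d += n;
        s += n;
        while (n >= 4) {
            d -= 4; s -= 4;
            memcpy(&word, s, 4);
            memcpy(d, &word, 4);
            n -= 4;
        }
        while (n--)
            *--d = *--s;
    }
    return dst;
}
```

The four test names then correspond to whether `dst` and `src` are aligned and whether the backward branch is needed; the sketch says nothing about which variant is faster on a given CPU.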
I'm also attaching simple sse2 implementation for comparison. It's 
faster than the previous one on my machine. I'm also attaching results 
from running the test on Windows (in VM).
Thanks,
Piotr
In most cases the SSE version performs very well, in fact slightly 
better than the Windows implementation, and it does very well for small moves.
Unfortunately, for a reason I haven't identified, it is significantly slower 
(20% or more) in the "non-overlapped" case only. Results attached.
Thanks,
Gabriel
Also, sorry, I forgot to mention one small thing: is there a reason you're 
using movdq(a|u) instead of movaps/movups (which are also SSE1, not 
SSE2)? They have a smaller encoding, which should help the instruction 
cache very slightly, and no CPU cares about float vs. integer state when 
doing only moves. (Even if some broken CPU did have a false dependency on 
that state, most SSE operations tend to be on floats anyway, but I doubt 
any does.)
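To make the question concrete, here is a small sketch of the same 16-byte copy step written with both instruction families via compiler intrinsics. It is an illustration under assumptions, not the attached implementation: the function names `copy_fwd_ps` and `copy_fwd_si128` are hypothetical, only non-overlapping forward copies are handled, and a plain `memcpy` fallback is used when SSE2 is not available at compile time. In the legacy (non-VEX) encoding, movups/movaps lack the mandatory prefix byte that movdqu/movdqa carry, which is where the one-byte difference per instruction comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <xmmintrin.h>  /* SSE1: _mm_loadu_ps / _mm_storeu_ps */
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 / _mm_storeu_si128 */
#define SKETCH_HAVE_SSE2 1
#endif

/* Forward copy using float-domain moves (movups-style). Hypothetical name. */
void copy_fwd_ps(unsigned char *d, const unsigned char *s, size_t n)
{
    assert(n == 0 || (d != NULL && s != NULL));
#ifdef SKETCH_HAVE_SSE2
    while (n >= 16) {
        /* A plain move: the 128 bits are copied unchanged. */
        _mm_storeu_ps((float *)d, _mm_loadu_ps((const float *)s));
        d += 16; s += 16; n -= 16;
    }
#endif
    memcpy(d, s, n);  /* tail, or everything when SSE2 is unavailable */
}

/* Forward copy using integer-domain moves (movdqu-style). Hypothetical name. */
void copy_fwd_si128(unsigned char *d, const unsigned char *s, size_t n)
{
    assert(n == 0 || (d != NULL && s != NULL));
#ifdef SKETCH_HAVE_SSE2
    while (n >= 16) {
        _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
        d += 16; s += 16; n -= 16;
    }
#endif
    memcpy(d, s, n);
}
```

Both functions produce identical bytes in the destination. Note that a compiler is free to emit any equivalent move instruction for either intrinsic, so the actual encoding should be checked in the disassembly; hand-written assembly, as in the patch under discussion, does not have that ambiguity.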
