Hello,
You introduce a lot of complexity into the low-level helper routines. We don't want to put Win32 quirks, like -1 meaning "until the null terminator", into the low-level functions unless we have to. In this case, if this is the sole cause of the slowdown, then you would be better off caching the length at the MSI level, since the strings are bound to be accessed more than once.
My intention was to fix the performance of lstrcmpW, or of CompareStringW with null-terminated strings. Currently, if we compare "1st very long string of blah blah blah blah blah" with "2nd very long string of blah blah blah blah blah", we call the non-optimized strlenW (a simple inline loop with a null check) on both of them, which together takes around 100 iterations, while a single comparison (one iteration) is enough to tell that the result should be -1. So counting the length of both strings in CompareStringW, just because the low-level wine_compare_string expects strings to always have a specified length, is IMHO a CPU hog that hurts WINE MSI performance, and that of other components that do a lot of string comparing.
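To make the cost difference concrete, here is a minimal sketch in C. It is not the Wine code: wchar16, len16, compare_with_lengths and compare_terminated are hypothetical names, and the comparison is a plain ordinal one on 16-bit code units, whereas wine_compare_string does collation-aware work. It only contrasts "measure both strings first" with "stop at the first difference".

```c
#include <stddef.h>

/* Hypothetical stand-in for a 16-bit WCHAR; not the Wine typedef. */
typedef unsigned short wchar16;

/* Simple inline-style length loop with a null check. */
static size_t len16(const wchar16 *s)
{
    const wchar16 *p = s;
    while (*p) p++;
    return (size_t)(p - s);
}

/* Length-first variant: walks both strings fully before comparing. */
static int compare_with_lengths(const wchar16 *a, const wchar16 *b)
{
    size_t la = len16(a), lb = len16(b);
    size_t n = la < lb ? la : lb;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    if (la == lb) return 0;
    return la < lb ? -1 : 1;
}

/* Terminator-aware variant: stops at the first differing unit,
 * so "1st..." vs "2nd..." is decided after one iteration. */
static int compare_terminated(const wchar16 *a, const wchar16 *b)
{
    while (*a && *a == *b) { a++; b++; }
    if (*a == *b) return 0;
    return *a < *b ? -1 : 1;
}
```

Both variants return the same result; the second one simply never needs the lengths when the strings differ early.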
How do you know that the compiler isn't generating the same assembly for the functions as that used by msvcrt / glibc? Theories like these need numbers to back them up.
I know that because those msvcrt and glibc routines are written directly in assembler, not C, and are optimized for 32-bit (or 64-bit) data alignment, DWORD fetching, etc. Visual Studio ships with the complete sources of the C CRT, so you can have a look.
;***
;strlen.asm - contains strlen() routine
;
;       Copyright (c) Microsoft Corporation. All rights reserved.
;
;Purpose:
;       strlen returns the length of a null-terminated string,
;       not including the null byte itself.
;
;*******************************************************************************
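For readers unfamiliar with the "DWORD fetching" idea, here is a minimal sketch in C of the kind of check that makes it possible to examine four bytes per iteration. This is the widely known zero-byte bit trick, not a copy of the CRT source; has_zero_byte is a hypothetical name, and a real strlen would additionally have to align the pointer before reading whole words.

```c
#include <stdint.h>

/* Returns nonzero if any of the four bytes in v is 0x00.
 * Subtracting 0x01 from each byte borrows into bit 7 only for bytes
 * that were zero (or had bit 7 set already, which "& ~v" masks out). */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```

With such a check, a length routine can skip over four non-terminator bytes with one load and a few ALU operations, and fall back to a byte-wise scan only for the word that contains the terminator.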
Probably the same thing happens with CompareStringW and lstrlenW in kernel, but this time I can't be sure because I don't have the ASM sources.
So it has nothing to do with the compiler. It is about caring for the performance of some often-used functions, and about the programming habit of reusing existing, far better optimized low-level code rather than cooking up a generic version and believing that GCC will make the best of it.
Cheers,