Adam Strzelecki wrote:
While installing Visual Studio 2005 under WINE, I found that WINE's MSI spends a lot of time inside lstrcmpW. Using the Mac OS X process sampler, I found that wine_compare_string accounts for 60-70% of the time spent in CompareStringW. The rest goes to the remainder of the function body, which in this case means two (inlined) strlenW calls. I believe those calls are unnecessary. Moreover, most comparisons in MSI fail within the first few characters, yet the current implementation walks both strings to the end with strlenW regardless, which wastes CPU cycles. I modified both CompareStringW and wine_compare_string so that they handle -1 (null-terminated) strings without calling strlenW first. In my testing this saves about 20-30% of CPU cycles.
You introduce a lot of complexity into the low-level helper routines. We don't want Win32 quirks, like -1 meaning "until the null terminator", to be put into the low-level functions unless we have to. In this case, if this is the sole cause of the slowdown then you would be better off caching the length at the MSI level, since the strings are bound to be accessed more than once.
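A minimal sketch of what caching the length at the MSI level might look like, assuming each string is stored once and compared many times. `struct counted_str`, `counted_str_make`, `counted_str_cmp` and `counted_str_equal` are hypothetical names, not part of Wine's MSI code, and `wchar16` merely stands in for Wine's 16-bit WCHAR. The comparison here is a plain ordinal one, whereas lstrcmpW applies locale rules; in real code the cached lengths could instead be passed to CompareStringW in place of -1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t wchar16; /* stand-in for Wine's 16-bit WCHAR (assumption) */

/* Hypothetical MSI-side string record: the length is measured once, when
 * the string is stored, instead of on every comparison. */
struct counted_str {
    const wchar16 *chars;
    size_t len; /* in 16-bit units, excluding the terminator */
};

/* One-time length measurement (the only strlenW-style walk). */
static size_t wide_len(const wchar16 *s)
{
    size_t n = 0;
    while (s[n])
        n++;
    return n;
}

struct counted_str counted_str_make(const wchar16 *s)
{
    struct counted_str cs = { s, wide_len(s) };
    return cs;
}

/* Ordinal comparison using the cached lengths; returns -1, 0 or 1. */
int counted_str_cmp(const struct counted_str *a, const struct counted_str *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    for (size_t i = 0; i < n; i++)
        if (a->chars[i] != b->chars[i])
            return a->chars[i] < b->chars[i] ? -1 : 1;
    /* Common prefix is equal: the shorter string sorts first. */
    return (a->len > b->len) - (a->len < b->len);
}

/* Equality test: strings of different cached length are rejected at once. */
int counted_str_equal(const struct counted_str *a, const struct counted_str *b)
{
    return a->len == b->len &&
           memcmp(a->chars, b->chars, a->len * sizeof *a->chars) == 0;
}
```

Since many MSI lookups are equality tests, the length check in `counted_str_equal` rejects most mismatches without reading a single character.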
However, I think you'll find that this is an algorithm problem in MSI rather than a performance issue in the functions you mention. By that, I mean it is likely that one MSI query is being performed again and again and that the results should be cached.
Note also that WINE's strlenW and lstrcmpW functions are far from perfect. The standard msvcrt or glibc string functions are much better optimized at the machine-code level. This has a strong impact on the parts of WINE that rely heavily on string manipulation, which in this case is MSI.
How do you know that the compiler isn't generating the same assembly for the functions as that used by msvcrt / glibc? Theories like these need numbers to back them up.
On Fri, Feb 29, 2008 at 11:06 AM, Robert Shearman rob@codeweavers.com wrote:
However, I think you'll find that this is an algorithm problem in MSI rather than a performance issue in the functions you mention. By that, I mean it is likely that one MSI query is being performed again and again and that the results should be cached.
This is not the case at all. If you use native MSI, you'll see almost exactly the same number of string comparisons (give or take). I've rarely seen the same SQL query run multiple times with exactly the same parameters. Caching would add too much complexity and, in typical cases, would cost more processing time than it saves.
Hello,
You introduce a lot of complexity into the low-level helper routines. We don't want Win32 quirks, like -1 meaning "until the null terminator", to be put into the low-level functions unless we have to. In this case, if this is the sole cause of the slowdown then you would be better off caching the length at the MSI level, since the strings are bound to be accessed more than once.
My intention was to fix the performance of lstrcmpW, that is, CompareStringW called with null-terminated strings. Currently, if we compare "1st very long string of blah blah blah blah blah" with "2nd very long string of blah blah blah blah blah", we call the non-optimized strlenW (a simple inline loop that checks for the null terminator) on both of them, which together take around 100 iterations, while a single comparison (one iteration) is enough to tell that the result should be -1. So computing the length of both strings in CompareStringW, just because the low-level wine_compare_string expects strings to always have a specified length, is IMHO a CPU hog that hurts the performance of WINE's MSI and of other components that do a lot of string comparison.
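To make the iteration count concrete, here is a minimal sketch contrasting the two approaches, assuming a 16-bit `wchar16` type standing in for Wine's WCHAR. `compare_eager` and `compare_lazy` are hypothetical names, not Wine functions, and both compare code units ordinally, whereas the real lstrcmpW/CompareStringW apply locale-dependent collation. The `steps` counter exists only to show how much work each approach does.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t wchar16; /* stand-in for Wine's 16-bit WCHAR (assumption) */

/* Eager pattern: measure both strings first, then compare.
 * Every call walks both strings to their terminators, even when they
 * differ at index 0. Returns -1, 0 or 1. */
int compare_eager(const wchar16 *s1, const wchar16 *s2, size_t *steps)
{
    size_t len1 = 0, len2 = 0, i;
    while (s1[len1]) len1++;   /* strlenW #1: walks all of s1 */
    while (s2[len2]) len2++;   /* strlenW #2: walks all of s2 */
    *steps = len1 + len2;
    for (i = 0; i < len1 && i < len2; i++) {
        ++*steps;
        if (s1[i] != s2[i])
            return s1[i] < s2[i] ? -1 : 1;
    }
    return (len1 > len2) - (len1 < len2);
}

/* Lazy pattern: a length of -1 means "up to the null terminator", and the
 * end of each string is detected while comparing, so the loop stops at
 * the first differing character. Returns -1, 0 or 1. */
int compare_lazy(const wchar16 *s1, int len1, const wchar16 *s2, int len2,
                 size_t *steps)
{
    size_t i;
    *steps = 0;
    for (i = 0;; i++) {
        /* Only read s[i] for -1 strings; counted strings use their length. */
        int end1 = len1 < 0 ? s1[i] == 0 : i >= (size_t)len1;
        int end2 = len2 < 0 ? s2[i] == 0 : i >= (size_t)len2;
        ++*steps;
        if (end1 || end2)
            return end2 - end1; /* shorter string sorts first */
        if (s1[i] != s2[i])
            return s1[i] < s2[i] ? -1 : 1;
    }
}
```

Both functions agree on the result; only the amount of work differs. For the two 48-character example strings, the eager version counts 97 steps and the lazy one counts 1.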
How do you know that the compiler isn't generating the same assembly for the functions as that used by msvcrt / glibc? Theories like these need numbers to back them up.
I know because those msvcrt and glibc routines are written directly in assembly rather than C, and are optimized for 32-bit (or 64-bit) data alignment and DWORD-at-a-time fetching. Visual Studio ships with the complete source code of the C runtime, so you can have a look yourself. For example, the header of strlen.asm:
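The DWORD-fetching idea can be sketched in portable C. This is a hypothetical bounded strlenW (the name `wordwise_strnlenW` and its `maxlen` parameter are assumptions, not a Wine or CRT API) that tests four 16-bit characters per 64-bit load using the well-known "has zero" bit trick, widened from bytes to 16-bit lanes. The bound keeps the sketch from reading past the caller's buffer; hand-written assembly versions typically rely instead on aligned loads never crossing a page boundary, which plain ISO C does not permit. Whether this actually beats a simple loop is exactly the kind of claim that needs measuring.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff any of the four 16-bit lanes of v is zero. With no zero lane
 * there is no borrow between lanes, and (x - 1) & ~x never has bit 15 set
 * for x != 0, so there are no false positives. */
static int has_zero_u16_lane(uint64_t v)
{
    return ((v - 0x0001000100010001ULL) & ~v & 0x8000800080008000ULL) != 0;
}

/* Hypothetical bounded strlenW: examines at most maxlen 16-bit units of s. */
size_t wordwise_strnlenW(const uint16_t *s, size_t maxlen)
{
    size_t i = 0;

    /* Fast path: test four characters per 64-bit load while a whole
     * word still fits inside the caller's bound. */
    while (i + 4 <= maxlen) {
        uint64_t v;
        memcpy(&v, s + i, sizeof v); /* usually a single load instruction */
        if (has_zero_u16_lane(v))
            break;                   /* terminator is in this word */
        i += 4;
    }

    /* Slow path: locate the terminator within the flagged word, or scan
     * the short tail, one character at a time (endianness-independent). */
    while (i < maxlen && s[i] != 0)
        i++;
    return i;
}
```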
;***
;strlen.asm - contains strlen() routine
;
;       Copyright (c) Microsoft Corporation. All rights reserved.
;
;Purpose:
;       strlen returns the length of a null-terminated string,
;       not including the null byte itself.
;
;*******************************************************************************
The same is probably true of kernel32's CompareStringW and lstrlenW, but this time I can't be sure because I don't have their assembly sources.
So it has nothing to do with the compiler. It's about caring for the performance of frequently used functions, and about the habit of reusing existing, far better optimized low-level code rather than cooking up a generic version in the belief that GCC will make the best of it.
Cheers,