On 11/23/20 2:16 PM, Alexandre Julliard wrote:
Rémi Bernon <rbernon@codeweavers.com> writes:
Signed-off-by: Rémi Bernon <rbernon@codeweavers.com>
This brings another ~200ms prefix startup time improvement, from ~1.2s to ~1s for "wine cmd /c exit" execution time on average, as well as a ~50ms process startup time improvement, from ~0.25s to ~0.2s execution time when the prefix is already started.
The test shows that using wcsicmp is incorrect for face name comparison, or at least that we should not rely on the current locale, and perf also reports a high number of CPU cache misses coming from the locale refcounting, which is the main source of improvement here.
IIUC RtlDowncaseUnicodeChar also does locale-dependent case folding, but I'm not sure I see how it's controlled (it's the system default locale that defines the loaded tables, right?), and we should probably check if case matching depends on it. If not, is there any canonical normalized Unicode case folding that should be used instead?
Unicode case mapping in ntdll is not locale-dependent. You can use CompareStringOrdinal() for that sort of thing.
I also wanted to avoid iterating over the whole strings as much as possible. RtlCompareUnicodeStrings (and so CompareStringOrdinal too) needs the actual string lengths, or at least the length of one of the two strings, to stop the iteration. This starts to add up as these comparisons are done quite a lot, especially when there are a lot of fonts.
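For illustration only, and not the code from the patch, here is a minimal sketch of a single-pass comparison that needs no length up front. The names wchar16, downcase_ascii and facename_cmp_nocase are hypothetical, and the ASCII-only folding is just a stand-in for a locale-independent mapping such as RtlDowncaseUnicodeChar.

```c
#include <stddef.h>

typedef unsigned short wchar16; /* stand-in for a 16-bit WCHAR */

/* Hypothetical locale-independent folding: ASCII only, for illustration. */
static wchar16 downcase_ascii(wchar16 ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (wchar16)(ch + ('a' - 'A')) : ch;
}

/* Compare two NUL-terminated 16-bit strings, ignoring ASCII case.
 * Returns <0, 0 or >0 and stops at the first differing character,
 * so neither string length has to be computed beforehand. */
static int facename_cmp_nocase(const wchar16 *a, const wchar16 *b)
{
    for (;; a++, b++)
    {
        wchar16 ca = downcase_ascii(*a), cb = downcase_ascii(*b);
        if (ca != cb) return (int)ca - (int)cb;
        if (!ca) return 0; /* both strings ended at the same position */
    }
}
```

When two face names differ early, which should be the common case when scanning a long font list, the loop exits after only a few characters.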
Using CompareStringOrdinal instead, with strlenW calls to get the actual lengths, adds ~50ms to the results above when the prefix is not started, and ~15ms to the process execution time when the prefix is already started.
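To show where the extra time can come from, here is a rough portable sketch of the length-first pattern. It does not call the real CompareStringOrdinal or strlenW: counted_strlen, cmp_counted, cmp_with_lengths, fold_ascii and the reads counter are all hypothetical stand-ins that only count how many characters get touched.

```c
#include <stddef.h>

typedef unsigned short wchar16; /* stand-in for a 16-bit WCHAR */

static size_t reads; /* characters touched so far, for illustration only */

/* Hypothetical ASCII-only folding, standing in for the real case mapping. */
static wchar16 fold_ascii(wchar16 ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (wchar16)(ch + ('a' - 'A')) : ch;
}

/* strlenW-like pass: walks the whole string, terminator included. */
static size_t counted_strlen(const wchar16 *s)
{
    size_t n = 0;
    for (;;)
    {
        reads++;
        if (!s[n]) return n;
        n++;
    }
}

/* Length-based case-insensitive compare over explicit lengths. */
static int cmp_counted(const wchar16 *a, size_t la, const wchar16 *b, size_t lb)
{
    size_t i, n = la < lb ? la : lb;
    for (i = 0; i < n; i++)
    {
        wchar16 ca = fold_ascii(a[i]), cb = fold_ascii(b[i]);
        reads += 2;
        if (ca != cb) return (int)ca - (int)cb;
    }
    return (la > lb) - (la < lb);
}

/* Two full length passes, then the comparison itself. */
static int cmp_with_lengths(const wchar16 *a, const wchar16 *b)
{
    size_t la = counted_strlen(a);
    size_t lb = counted_strlen(b);
    return cmp_counted(a, la, b, lb);
}
```

Even when the first characters already differ, the two length passes still walk both names to the end before the comparison can bail out.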
Of course the results depend on how many system fonts are installed, and I'm testing with the Debian fonts-noto package, which brings a lot, but I don't think it's so unusual (I didn't even install it for the test; for some reason I've had it installed for a while).