On 12/03/18 06:03, Nikolay Sivov wrote:

On 3/12/2018 12:25 PM, Huw Davies wrote:

        
+                           LPARAM sort_handle)
+{
+
+    DWORD mask = flags;
+
+    TRACE("%s %x %s %d %s %d %p %p %p %ld\n", wine_dbgstr_w(localename), flags,
+          wine_dbgstr_w(src), src_size, wine_dbgstr_w(value), value_size, found,
+          version_info, reserved, sort_handle);
+    FIXME("strings should be normalized once NormalizeString() is implemented\n");
I don't think we want the noise that this FIXME would generate.  Just add a comment.
Actually it might be possible that CompareString() handles the decomposed
case on its own; I haven't tested that.


Yeah, you are right, Nikolay; I just tested on Windows, and it seems that CompareString() shares the same comparison semantics as FindNLSStringEx(). On Wine it fails, however, so I guess I'll code FindNLSStringEx() assuming a working CompareString() and then see what is missing there.
I actually had it like this in my first patch, relying on CompareString() and assuming the shared semantics. For this v2 patch I wanted to normalize first, so that the substring search would be worst-case O(n) instead of O(n·m). However, reading the Unicode standard, it seems I can make some assumptions about the maximum expansion factor of canonical decomposition.
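
For illustration, a rough sketch of that v1-style approach (the function name and bounds are made up, not the actual patch). Each starting position is handed to CompareStringEx(), which is roughly O(n·m) overall; note also that comparing fixed-size windows misses matches whose canonically equivalent form has a different length, which is what pushed me toward normalizing first:

#include <windows.h>

/* Hypothetical sketch of the v1 approach: scan each starting
 * position and let CompareStringEx() decide equality. */
static int find_substring(const WCHAR *locale, DWORD flags,
                          const WCHAR *src, int src_len,
                          const WCHAR *value, int value_len)
{
    int i;

    for (i = 0; i + value_len <= src_len; i++)
    {
        /* A fixed-size window is only correct when the match has the
         * same length as the value; canonically equivalent spans of a
         * different length would need extra window sizes to be tried. */
        if (CompareStringEx(locale, flags, src + i, value_len,
                            value, value_len, NULL, NULL, 0) == CSTR_EQUAL)
            return i;
    }
    return -1;
}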

"There is also a Unicode Consortium stability policy that canonical mappings are always limited in all versions of Unicode, so that no string when decomposed with NFC expands to more than 3× in length (measured in code units). This is true whether the text is in UTF-8, UTF-16, or UTF-32. This guarantee also allows for certain optimizations in processing, especially in determining buffer sizes"

Although it seems that the worst possible case is an 18× expansion factor when using normalization form NFKC, it looks like these functions only test for canonical equivalence, so I guess it would be OK to assume a worst case of 3× for the length and keep things O(n).
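Concretely, assuming that 3× bound holds for the canonical case, the destination buffer could be sized in one shot along these lines (a sketch only; the helper name is made up, and it presumes a working NormalizeString()):

#include <windows.h>

/* Hypothetical helper: decompose src into a buffer sized with the
 * 3x canonical expansion bound quoted above.  Error handling is
 * minimal on purpose; returns NULL on failure. */
static WCHAR *decompose_for_search(const WCHAR *src, int src_len, int *dst_len)
{
    int capacity = src_len * 3;  /* assumed worst-case canonical expansion */
    WCHAR *dst = HeapAlloc(GetProcessHeap(), 0, capacity * sizeof(WCHAR));

    if (!dst) return NULL;
    *dst_len = NormalizeString(NormalizationD, src, src_len, dst, capacity);
    if (*dst_len <= 0)  /* normalization failed, or the bound did not hold */
    {
        HeapFree(GetProcessHeap(), 0, dst);
        return NULL;
    }
    return dst;
}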

Does this sound right to you?