Hello,
On 7/25/17 4:33 PM, Artur Świgoń wrote:
Dear All,
My name is Artur and I'm participating in Google Summer of Code 2017 for Wine. Under Nikolay's supervision, I'm working on implementation of Unicode normalization. I probably should have introduced myself some time ago to share results of my research and my ideas, but I also wanted to wait until I could illustrate my points with some code.
Very cool! I ran into this problem with Japanese Unicode string comparisons a while ago, so it is great that it will be addressed! After that, we will have to investigate the behavior of CompareStringW and its family.
- Mappings for characters above 0xFFFF are encoded as UTF-16 (using surrogate pairs), but a single codepoint (UTF-32 if you like) is used for table indexing. Setting $utflim in make_unicode to 65536 is the simplest way to disable support for such characters, but supporting surrogate pairs should not affect any text-related Wine component in a negative way.
There is some very basic work on non-BMP Unicode glyphs and surrogate pairs in Uniscribe (usp10). I wrote a quick decode_surrogate_pair() function that returns the DWORD Unicode value for a surrogate pair, so you can look at that if you are interested!
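For reference, here is a rough sketch of the kind of decoding involved. This is not the actual usp10 code; the name decode_utf16_at() and its signature are made up for illustration. It assumes a UTF-16 buffer of known length and returns the single codepoint that could then be used for table indexing, as described above. An unpaired surrogate is returned unchanged as a single unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the usp10 decode_surrogate_pair()):
 * reads one codepoint from a UTF-16 buffer starting at *index and
 * advances *index by the number of 16-bit units consumed (1 or 2). */
static uint32_t decode_utf16_at(const uint16_t *str, size_t len, size_t *index)
{
    uint16_t hi = str[*index];

    /* A high surrogate (0xD800..0xDBFF) must be followed by a low
     * surrogate (0xDC00..0xDFFF) to form a valid pair. */
    if (hi >= 0xD800 && hi <= 0xDBFF && *index + 1 < len)
    {
        uint16_t lo = str[*index + 1];
        if (lo >= 0xDC00 && lo <= 0xDFFF)
        {
            *index += 2;
            /* Each surrogate carries 10 bits of the value above 0xFFFF. */
            return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
                              (uint32_t)(lo - 0xDC00));
        }
    }

    /* BMP character or unpaired surrogate: pass it through as-is. */
    *index += 1;
    return hi;
}
```

The caller is expected to check that *index < len before each call; the function only guards the read of the second unit.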
Thanks! -aric