Rolf Kalbermatter wrote:
Hello,
In trying to get shell32 a little bit more Unicodified I came across this function ParseFieldA which is taken from shellord.c. I'm quite unfamiliar with Unicode so I still have to learn a lot.
I have finally found most of the string manipulation functions which work for Unicode but when it comes down to simple character comparison I'm a little bit in the dark here.
Some code snippets elsewhere in Wine make me believe that for the English charset WCHAR == char is actually mostly true. However, I wonder whether this can be relied on in code. For instance, the Unicode version of ParseField would in that case look like this, and I would really like someone else's opinion on whether the code
if (*src++ == ',') nField--;
actually works as expected on all systems, independent of the charsets used for the local languages.
It's OK to compare a WCHAR with a known character constant ('A'), but not to compare two arbitrary WCHARs with each other.
Explanation - We (as well as Windows) use UTF-16 to represent characters, the successor of UCS-2 in which each WCHAR is one 16-bit unit. The most common Unicode characters in Europe, Africa, America, Australia and the Middle East each fit nicely into a single WCHAR, and there are no problems. Some characters, mostly rarer East Asian ones, don't.
The characters that don't fit are represented using surrogates - i.e. each such character takes two WCHARs to represent (a surrogate pair). The Unicode standard has been very wise in selecting the surrogates, however. Both the first and the second WCHAR of any given surrogate pair are taken from a range (0xD800-0xDFFF) that is not allocated to any other character of Unicode. This means that if you are looking for a Hebrew "Aleph", scanning with a piece of code that looks something like: while (*str++ != 0x5d0) is guaranteed not to match anything except "Aleph". So if it's a specific character you are looking for, and you know it is not represented by surrogates, your code will work.
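To make the safe case concrete, here is a minimal sketch of the kind of scan discussed above. It is not the actual shell32 code: utf16_unit is a hypothetical stand-in for WCHAR, and find_field is a name invented for this illustration. It assumes a 0-terminated buffer of 16-bit units and compares each unit only against the known constant ','.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WCHAR: one 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/* Illustration only: return a pointer to the start of field number
 * nField (1-based) in a comma-separated, 0-terminated UTF-16 string,
 * or NULL if the string has fewer fields. */
static const utf16_unit *find_field(const utf16_unit *src, unsigned nField)
{
    if (nField == 0)
        return NULL;

    while (nField > 1) {
        if (*src == 0)
            return NULL;        /* ran out of string before the field */
        /* Safe: ',' (0x002C) is not in the surrogate range, so this
         * can never match half of a surrogate pair. */
        if (*src++ == ',')
            nField--;
    }
    return src;
}
```

The surrogate units in the middle of a string are simply stepped over, because neither half of a pair can ever compare equal to ','.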
However! If you are trying to look for an occurrence of one character inside a string, and neither the string nor the character is known to you at the time of writing the code, this technique may fail miserably. The reason is that if the character you are looking for is represented by a surrogate pair, both its first and its second WCHAR may appear, separately, in other characters (all of them surrogate pairs themselves, but still).
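One way around this, sketched here under assumptions rather than taken from any existing Wine function, is to search for the whole character instead of a single WCHAR. The names utf16_unit and find_code_point are hypothetical; the sketch assumes a well-formed, 0-terminated UTF-16 string and takes the character as a 32-bit code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WCHAR: one 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/* Illustration only: find the first occurrence of code point cp in a
 * 0-terminated UTF-16 string. Returns a pointer to the first unit of
 * the match, or NULL if cp does not occur or is not searchable. */
static const utf16_unit *find_code_point(const utf16_unit *str, uint32_t cp)
{
    /* Reject the terminator, out-of-range values and lone surrogates. */
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return NULL;

    if (cp < 0x10000) {
        /* One unit per character: a plain unit comparison is enough. */
        for (; *str; str++)
            if (*str == (utf16_unit)cp)
                return str;
        return NULL;
    }

    /* Two units per character: split cp into its surrogate pair. */
    uint32_t v = cp - 0x10000;
    utf16_unit hi = (utf16_unit)(0xD800 + (v >> 10));
    utf16_unit lo = (utf16_unit)(0xDC00 + (v & 0x3FF));

    /* Match both halves together; str[1] is at worst the terminator. */
    for (; *str; str++)
        if (str[0] == hi && str[1] == lo)
            return str;
    return NULL;
}
```

For example, in a string holding U+1F601, U+1F200 and U+1F600, the first WCHAR of U+1F600 (0xD83D) also starts U+1F601 and its second WCHAR (0xDE00) also ends U+1F200, so a single-WCHAR scan would stop too early, while matching the pair as a whole only hits the real occurrence.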
Bear that in mind, and everything will be ok.
Shachar