Hello,
looking at LCMapStringW, I think we need some table like the LCM_Unicode_LUT[] table. However - I don't understand where the values come from. Odd values seem to be a collation of flags, even values to be some character weight and LCM_Diacritic_LUT[] is some weight for the diacritic. - Do the tables in ../wine/unicode somehow contain enough information to generate these tables?
Help and pointers appreciated.
Bye
"Uwe Bonnes" bon@elektron.ikp.physik.tu-darmstadt.de wrote:
> looking at LCMapStringW, I think we need some table like the LCM_Unicode_LUT[] table. However
> - I don't understand where the values come from. Odd values seem to be a collation of flags, even values to be some character weight and LCM_Diacritic_LUT[] is some weight for the diacritic.
> - Do the tables in ../wine/unicode somehow contain enough information to generate these tables?
That has been a TODO on my list for a long time. Unfortunately my time is very limited now. I believe that unicode should resolve all issues regarding LCMapStringA/W functionality. If you do not find enough time to reimplement LCMapStringA via LCMapStringW, don't worry too much: eventually I'll do it myself.
Thanks.
"Dmitry" == Dmitry Timoshkov dmitry@baikal.ru writes:
Dmitry> "Uwe Bonnes" bon@elektron.ikp.physik.tu-darmstadt.de wrote:
>> looking at LCMapStringW, I think we need some table like the
>> LCM_Unicode_LUT[] table. However - I don't understand where the
>> values come from. Odd values seem to be a collation of flags, even
>> values to be some character weight and LCM_Diacritic_LUT[] is some
>> weight for the diacritic. - Do the tables in ../wine/unicode somehow
>> contain enough information to generate these tables?

Dmitry> That was a TODO in my list of things for a long
Dmitry> time. Unfortunately my time is very limited now. I believe that
Dmitry> unicode should resolve all issues regarding LCMapStringA/W
Dmitry> functionality. If you will not find enough time to reimplement
Dmitry> LCMapStringA via LCMapStringW don't worry too much: eventually
Dmitry> I'll do it myself.
Even though I have some time over the next few days, without some hints about my question I see no starting point yet...
Bye
On Sun, 5 May 2002, Uwe Bonnes wrote:
> looking at LCMapStringW, I think we need some table like the LCM_Unicode_LUT[] table. However
> - I don't understand where the values come from. Odd values seem to be a collation of flags, even values to be some character weight and LCM_Diacritic_LUT[] is some weight for the diacritic.
Pretty much. The first value in each Unicode_LUT pair seems to be what I've previously identified (in my reverse engineering of cp_xxx.nls files) as the sort class, the second is the sort weight, and the Diacritic_LUT holds the diacritic weight. (A case weight also exists; it is not in those tables, but the case weight is pretty much isupper(x) ? 18 : 2, so no table is needed there.)
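To make the layout concrete, here is a minimal sketch of how the four values for one character could be gathered. The struct, the function name and the tiny demo tables (covering only A..C, with made-up weights) are hypothetical and are not Wine's actual LCM_Unicode_LUT[] data; only the class/weight pairing and the isupper(x) ? 18 : 2 rule come from the description above.

```c
#include <ctype.h>

/* All four per-character values described above (names are hypothetical). */
struct char_weights {
    unsigned char sort_class;
    unsigned char sort_weight;
    unsigned char diacritic_weight;
    unsigned char case_weight;
};

/* Made-up demo data for 'A'..'C' only: interleaved (class, weight) pairs. */
static const unsigned char demo_unicode_lut[] = {
    14, 2,   /* A: class 14 (letter), weight 2 */
    14, 3,   /* B */
    14, 4    /* C */
};

/* Made-up diacritic weights: 2 stands for "no diacritic" in this demo. */
static const unsigned char demo_diacritic_lut[] = { 2, 2, 2 };

/* Fill *out for ch; returns 1 on success, 0 if ch is outside the demo range. */
static int lookup_weights(int ch, struct char_weights *out)
{
    int idx = toupper((unsigned char)ch) - 'A';

    if (idx < 0 || idx >= 3)
        return 0;

    out->sort_class       = demo_unicode_lut[2 * idx];      /* first of the pair  */
    out->sort_weight      = demo_unicode_lut[2 * idx + 1];  /* second of the pair */
    out->diacritic_weight = demo_diacritic_lut[idx];
    out->case_weight      = isupper((unsigned char)ch) ? 18 : 2; /* no table needed */
    return 1;
}
```

In this sketch 'B' and 'b' get the same sort weight and differ only in the case weight, which is what lets case act as a tie-breaker rather than a primary difference.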
Sort classes I've identified before:
  2 = decomposed sort (e.g. "ß" is sorted as "ss", "þ" is sorted as "th") (sort weight is used as index into decomposition table in cp_xxx.nls)
  6 = control characters, hyphens (stuff that's ignored if SORT_STRINGSORT is not specified)
  7 = separators
  8 = math symbols
  10 = symbols
  12 = numbers
  14 = letters
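The class numbers above could be written down as an enum, roughly like this. The enum and function names are my own invention for illustration; the numeric values are the ones listed above, and SORT_STRINGSORT is defined here only so that the sketch compiles without the Windows headers.

```c
/* Sort classes as listed above; the identifiers are hypothetical. */
enum sort_class {
    SORT_CLASS_DECOMPOSED = 2,   /* sort weight indexes a decomposition table */
    SORT_CLASS_IGNORABLE  = 6,   /* control characters, hyphens */
    SORT_CLASS_SEPARATOR  = 7,
    SORT_CLASS_MATH       = 8,
    SORT_CLASS_SYMBOL     = 10,
    SORT_CLASS_NUMBER     = 12,
    SORT_CLASS_LETTER     = 14
};

/* Fallback definition for builds without winnls.h. */
#ifndef SORT_STRINGSORT
#define SORT_STRINGSORT 0x00001000
#endif

/* Class 6 characters are skipped unless the caller passed SORT_STRINGSORT. */
static int is_ignored(enum sort_class cls, unsigned long flags)
{
    return cls == SORT_CLASS_IGNORABLE && !(flags & SORT_STRINGSORT);
}
```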
All weights and classes start at 2 simply because they're used in the sort keys generated by LCMapString, and a sort key is a byte string in which 0 is the null terminator and 1 is the field separator.
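A rough sketch of why that matters: if every weight is at least 2, the fields can be concatenated with 1 between them and 0 at the end, and two keys can then be compared with plain strcmp(). The function name and the field order (sort weights, then diacritic weights, then case weights) are assumptions made for this example and not a statement about the real key layout.

```c
#include <stddef.h>
#include <string.h>

#define KEY_TERMINATOR 0x00  /* ends the whole key */
#define KEY_SEPARATOR  0x01  /* ends one field     */

/*
 * Build a key from three weight arrays of n entries each (all weights >= 2).
 * Returns the number of bytes written including the terminator,
 * or 0 if the buffer is too small.
 */
static size_t build_sort_key(unsigned char *key, size_t cap,
                             const unsigned char *sort_w,
                             const unsigned char *dia_w,
                             const unsigned char *case_w,
                             size_t n)
{
    const unsigned char *fields[3];
    size_t pos = 0, f;

    fields[0] = sort_w;
    fields[1] = dia_w;
    fields[2] = case_w;

    /* three fields of n bytes, two separators, one terminator */
    if (cap < 3 * n + 3)
        return 0;

    for (f = 0; f < 3; f++) {
        memcpy(key + pos, fields[f], n);
        pos += n;
        key[pos++] = (f < 2) ? KEY_SEPARATOR : KEY_TERMINATOR;
    }
    return pos;
}
```

Because the separator is smaller than any weight, a shorter field sorts before a longer one with the same prefix, and differences in an earlier field always win over differences in a later one.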
> - Do the tables in ../wine/unicode somehow contain enough information to generate these tables?
The UnicodeData.txt you can get from ftp.unicode.org contains data that you can use for the sort class, case weight, and maybe diacritic weight, but not sort weight, since that's locale-dependent; you need a sort table for each locale. (I think Windows deals with it by having a big table of default sort weights, then each locale has a table of "exceptions" that's patched into the big table at run-time...)
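As a starting point for the part that UnicodeData.txt can supply, here is a minimal sketch that reads one line of that file and derives the case weight from the general category (the third semicolon-separated field, "Lu" for upper-case letters). The function name is hypothetical, and treating "Lu" as the only upper-case category is a simplifying assumption; the sort class and diacritic weight would need their own mappings, which are not attempted here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse one UnicodeData.txt line of the form
 *   "0041;LATIN CAPITAL LETTER A;Lu;..."
 * Stores the code point and a case weight (18 for category Lu, else 2).
 * Returns 1 on success, 0 if the line does not look like a data line.
 */
static int parse_unicode_data_line(const char *line,
                                   unsigned long *code_point,
                                   unsigned char *case_weight)
{
    char *end;
    const char *category;

    /* field 0: code point in hex */
    *code_point = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return 0;

    /* field 1: character name, skipped */
    category = strchr(end + 1, ';');
    if (!category)
        return 0;
    category++;

    /* field 2: two-letter general category followed by ';' */
    if (category[0] == '\0' || category[1] == '\0' || category[2] != ';')
        return 0;

    *case_weight = (category[0] == 'L' && category[1] == 'u') ? 18 : 2;
    return 1;
}
```

A generator script would run this over every line and emit the per-character values, leaving the locale-dependent sort weight to come from some other source.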
Unfortunately, I'm not aware of a source for such sort weight data.