On Fri, 3 Oct 2003 06:00, Shachar Shemesh wrote:
Dmitry Timoshkov wrote:
Exactly. I have something like that here; the only difference is that I'm dumping the full Unicode range 0-0xffff, not only the first 96 characters.
Isn't the full Unicode range significantly larger than 0-0xffff? What about aggregates? CJK, etc.?
The full Unicode range (UCS-4) is represented by a 32-bit number. Windows uses UTF-16 (not UCS-2, as I think the documentation suggests), in which code units in the range d800-dfff are used in two-word sequences (a high surrogate from d800-dbff followed by a low surrogate from dc00-dfff) to represent the UCS-4 characters 0x10000 to 0x10ffff. Thus, to deal with the full range of characters Windows can theoretically represent, you'd have to have a table with 0x110000-0x800 = 0x10f800 entries.
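For illustration, here is a minimal Python sketch of the standard UTF-16 surrogate-pair arithmetic described above, plus the table-size calculation. It is not Wine or Windows code; the function and constant names (`to_utf16_units`, `from_utf16_units`, `TABLE_SIZE`) are hypothetical and chosen only for this example.

```python
# Sketch of standard UTF-16 surrogate-pair encoding/decoding.
# Names are hypothetical; this is not taken from Wine or Windows.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or HIGH_START <= cp <= LOW_END:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]                      # BMP character: a single unit
    v = cp - 0x10000                     # 20-bit value split into two halves
    return [HIGH_START + (v >> 10), LOW_START + (v & 0x3FF)]


def from_utf16_units(hi: int, lo: int) -> int:
    """Decode a high/low surrogate pair back into a code point."""
    if not (HIGH_START <= hi <= HIGH_END and LOW_START <= lo <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((hi - HIGH_START) << 10) + (lo - LOW_START)


# Every code point UTF-16 can represent: 0..0x10FFFF minus the
# 0x800 surrogate code points (d800-dfff) themselves.
TABLE_SIZE = 0x110000 - (LOW_END - HIGH_START + 1)

if __name__ == "__main__":
    print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
    print(hex(TABLE_SIZE))                            # 0x10f800
```

Both surrogate ranges are excluded from the count because neither half of a pair is a character on its own; that is why 0x800 rather than 0x400 code points are subtracted.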