I am not sure about the specific case, but I do have some experience with handling DBCS in general.
When using TCHAR with _MBCS defined (which is the default with VCC - MS doing something nice for a change), TCHAR is (if my memory serves me correctly) a plain char. This means that each element is the same size as a regular char.
The thing to understand when working with MBCS is that a single byte does not necessarily mean a single character. You get a stream of bytes, some will be 1 byte/character, and some 2.
You are guaranteed that NUL and newline are never misrepresented: neither can show up as the second byte of a double-byte character. For that reason alone, most byte-by-byte processing will work on MBCS without a problem. If you are doing no string processing at all, you can simply ignore the MBCS possibility.
Things do become messy if you want to do character-based calculations (e.g. "I have 7 characters in the string, despite it being 10 bytes long"), if you are looking for a particular character ('\' is a nasty example), or if you want to traverse the string backwards.
Traversing an MBCS string is akin to using a forward iterator in STL. You have a macro (isleadbyte, IIRC) that tells you whether the current byte stands alone or is the first byte of a double-byte character. You are allowed to save a pointer and return to it, but when traversing the string backwards it is very difficult to know whether the previous byte is a single character or the second half of a double-byte one.
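To make the traversal point concrete, here is a minimal sketch in portable C. It is only an illustration: the lead-byte ranges are hard-coded for Shift-JIS (code page 932) as an assumption, and the helper names is_lead_byte, next_char, prev_char and count_chars are made up for this mail. On Windows you would rather ask the system (IsDBCSLeadByte, CharNextA, CharPrevA) than hard-code ranges.

```c
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges, hard-coded so the
 * sketch stays portable. Real code should query the active code page. */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical helper: step forward over one character (1 or 2 bytes).
 * A lead byte directly followed by the terminator counts as 1 byte. */
static const char *next_char(const char *p)
{
    if (is_lead_byte((unsigned char)p[0]) && p[1] != '\0')
        return p + 2;
    return p + 1;
}

/* Hypothetical helper: step backwards by rescanning from the start of the
 * string, because a byte on its own does not tell us whether it is a
 * single character or the second half of a double-byte one. */
static const char *prev_char(const char *start, const char *p)
{
    const char *cur = start;
    const char *prev = start;

    while (cur < p) {
        prev = cur;
        cur = next_char(cur);
    }
    return prev;
}

/* Count characters rather than bytes. */
static size_t count_chars(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        s = next_char(s);
        n++;
    }
    return n;
}
```

With the 10-byte string "ab\x83\x5C" "c\x95\x5C" "d\x94\x5C" (four single-byte and three double-byte characters), strlen reports 10 while count_chars reports 7. Note that stepping backwards costs a scan from the beginning of the string, which is the price of only having a forward iterator.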
Another problem is that the second byte of an MBCS character may be a value you would find interesting on its own. As I said before, one nasty example is parsing a path and looking for '\' separators. There are some Japanese characters that, when encoded in MBCS, result in two bytes, the second one being '\'. When the proper locale is loaded, Windows knows not to treat this '\' as a directory separator, but your programs may fail to do so (does Wine?).
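A rough sketch of a separator search that avoids this trap, again in portable C. The Shift-JIS lead-byte ranges are an assumption made for illustration, and find_last_separator is a hypothetical name, not an existing Wine or Windows function.

```c
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges, hard-coded so the
 * sketch stays portable. Real code should query the active code page. */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical helper: return a pointer to the last real '\' separator in
 * a DBCS path, or NULL if there is none. Trail bytes are skipped together
 * with their lead byte, so a trail byte equal to 0x5C is never matched. */
static const char *find_last_separator(const char *path)
{
    const char *last = NULL;
    const char *p = path;

    while (*p != '\0') {
        if (is_lead_byte((unsigned char)p[0]) && p[1] != '\0') {
            p += 2;            /* double-byte character: skip both bytes */
            continue;
        }
        if (*p == '\\')
            last = p;          /* a genuine single-byte backslash */
        p++;
    }
    return last;
}
```

For the path "C:\\dir\x83\x5C" (the last two bytes form one Japanese character whose second byte is 0x5C), a plain strrchr(path, '\\') points into the middle of that character, while find_last_separator returns the backslash after "C:".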
These are the main issues when working with MBCS. I hope I have managed to help.
Shachar
Andriy Palamarchuk wrote:
This happens in the code which unmaps a message that was mapped from ASCII to Unicode. See windows/winproc.c, function WINPROC_UnmapMsg32ATo32W:
    case WM_GETTEXTLENGTH:
    case CB_GETLBTEXTLEN:
    case LB_GETTEXTLEN:
        /* there may be one DBCS char for each Unicode char */
        return result * 2;
What is the correct way to handle double-byte characters in this situation? How does Windows handle this? At the least, can we return the doubled value only when the system metric SM_DBCSENABLED is true? We could have a switch in the config file for this system metric.
I came across this issue when using the default combo box control implementation in Delphi 6. I assume the same issue also exists for edit controls. The returned length is correct if I comment out the code above.
The existing behavior is a possible cause of a bug when entering serial numbers: the cursor jumps to the next edit field when only half of the text has been entered.
Thanks, Andriy