Re: WM_GETTEXTLENGTH returns double size

12 Jun 2002

I am not sure about this specific case, but I do have some experience 
with handling DBCS in general.
When using TCHAR and defining MBCS (which is the default with VCC - MS 
doing something nice for a change) the result (if my memory serves me 
correctly) is a plain one-byte char type. This means that it is the same 
size as a regular char.
The thing to understand when working with MBCS is that a single byte 
does not necessarily mean a single character. You get a stream of bytes, 
some will be 1 byte/character, and some 2.
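To make the byte/character distinction concrete, here is a minimal sketch 
of counting characters rather than bytes. It assumes the Shift-JIS (code 
page 932) lead-byte ranges, hard-coded so that it does not depend on the 
current locale; is_lead_byte and dbcs_char_count are hypothetical names 
for illustration, and real code would ask the system which bytes are lead 
bytes.

```c
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges, hard-coded for
   the sake of the example. Other DBCS code pages use different ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters (not bytes) in a NUL-terminated DBCS string. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        /* A lead byte followed by a non-NUL byte is one 2-byte character;
           anything else is treated as a single-byte character. */
        if (is_lead_byte(*p) && p[1] != '\0')
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For the 4-byte string "A\x95\x5C" "B" (an 'A', one double-byte character, 
then a 'B'), strlen reports 4 while dbcs_char_count reports 3.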
You are guaranteed that NUL and newline are never misrepresented: 
neither value can turn up as the second byte of a double-byte character. 
For that reason alone most byte by byte processing will work on MBCS 
without a problem. If you are doing no string processing at all, you can 
simply ignore the MBCS possibility.
Things do become messy if you want to do character based calculations 
(i.e. - I have 7 characters in the string, despite it being 10 bytes 
long), if you are looking for a particular character ('\' is a nasty 
example), or if you want to traverse the string backwards.
Traversing an MBCS string is akin to using a forward iterator in STL. You 
have a macro (isleadbyte, IIRC) that lets you know whether the byte you 
are looking at stands alone or is the first byte of a double-byte 
character. You are allowed to save the pointer and return to it, but when 
traversing the string backwards, it is very difficult for you to know 
whether the previous byte is a single character or the second half of a 
double-byte one.
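One common way around the backward-traversal problem is to rescan from 
the start of the string, which is safe because the start is always a 
character boundary. The sketch below does that. It again assumes the 
Shift-JIS lead-byte ranges, and is_lead_byte and dbcs_prev are 
hypothetical names; on Windows, CharPrevA serves the same purpose.

```c
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges, hard-coded for
   the sake of the example. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Return a pointer to the character that precedes `cur`, found by walking
   forward from `start`. Returns `start` if there is no earlier character. */
const char *dbcs_prev(const char *start, const char *cur)
{
    const char *p = start;
    const char *prev = start;

    while (p < cur && *p) {
        prev = p;  /* remember where the current character begins */
        p += (is_lead_byte((unsigned char)*p) && p[1] != '\0') ? 2 : 1;
    }
    return prev;
}
```

The cost is linear in the distance from the start of the string, so code 
that walks backwards repeatedly may prefer to record character boundaries 
during a single forward pass.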
Another problem is that the second byte of an MBCS character may be 
something you will find interesting on its own. Like I said before, one 
nasty example is when parsing a path and looking for '\' separators. 
There are some Japanese characters that, when coded in MBCS, result in 
two bytes, the second one being '\'. When the proper locale is loaded, 
Windows knows not to treat this '\' as a directory separator, but your 
programs may fail to do so (does wine?).
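As an illustration of the separator problem, here is a minimal sketch of 
a search for the last '\' that skips over double-byte characters instead 
of inspecting their second byte. It assumes the Shift-JIS lead-byte 
ranges; is_lead_byte and dbcs_find_last_sep are hypothetical names, not 
existing Wine or Windows functions.

```c
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) lead-byte ranges, hard-coded for
   the sake of the example. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Return a pointer to the last real '\' separator in `path`, or NULL if
   there is none. A 0x5C that is the trail byte of a double-byte character
   is never examined, so it cannot be mistaken for a separator. */
const char *dbcs_find_last_sep(const char *path)
{
    const char *p = path;
    const char *last = NULL;

    while (*p) {
        if (is_lead_byte((unsigned char)*p) && p[1] != '\0') {
            p += 2;  /* skip lead and trail byte together */
            continue;
        }
        if (*p == '\\')
            last = p;
        p++;
    }
    return last;
}
```

For the path "C:\\\x95\x5C" (drive, separator, then one double-byte 
character whose trail byte is 0x5C), a plain strrchr(path, '\\') points 
at the trail byte, while dbcs_find_last_sep points at the separator after 
the colon.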
These are the main issues when working with MBCS. I hope I have managed 
to help.
Shachar

Andriy Palamarchuk wrote:
...
This happens in the code which unmaps a message that was mapped
from ASCII to Unicode.
See windows/winproc.c, function
WINPROC_UnmapMsg32ATo32W:
    case WM_GETTEXTLENGTH:
    case CB_GETLBTEXTLEN:
    case LB_GETTEXTLEN:
        /* there may be one DBCS char for each Unicode char */
        return result * 2;
What is the correct way to handle double-byte
characters in this situation?
How does Windows handle this?
At the least, can we return doubled values only when the
system metric SM_DBCSENABLED is true? We could have a switch
in the config file for this system metric.
I came across this issue when using the default combo box
control implementation in Delphi 6.
I assume the same issue also exists for edit controls.
The returned length is correct if I comment out the
code above.
The existing behavior is a possible cause of a bug when
entering serial numbers, where the
cursor jumps to the next edit field when only half of
the text has been entered.
Thanks,
Andriy

