"Greg Turner" gmturner007@ameritech.net wrote:
+int _ismbcalpha(unsigned int ch)
+{
+    if (ch < 0x100)
+        return ((0x41 <= ch && ch <= 0x5a) ||
+                (0x61 <= ch && ch <= 0x7a) ||
+                /* Japanese/Katakana, CP 932 */
+                (0xa6 <= ch && ch <= 0xdf));
+    else
+    {
+        FIXME("Handle MBC chars\n");
+        return 0;
+    }
+}
It would be better to build a proper multibyte character pair and then call GetStringType.
On Tuesday 12 November 2002 06:43 pm, Dmitry Timoshkov wrote:
It would be better to build a proper multibyte character pair and then call GetStringType.
You mean this beast?
BOOL GetStringTypeW(
    DWORD   dwInfoType,  // information-type options
    LPCWSTR lpSrcStr,    // source string
    int     cchSrc,      // number of characters in string
    LPWORD  lpCharType   // output buffer
);
You are probably far better informed on these issues than I, but this seems slightly excessive to me -- or perhaps I'm suffering from sour grapes syndrome, since I'm somewhat out of my element here (I seem to be saying that a lot lately, don't I? When, I wonder, will I start to feel that I am /in/ my element? Probably not until Wine is written in Kylix ... man, that's embarrassing!)
My patch was intended to be a clone of the similar implementations I saw in the same unit, not a full implementation.
But... since I seem to be volunteering to implement the function, I guess I should do it right...
So, on second thought, merged or not, I'll try to whip up a better version of _ismbc* and maybe, if I'm feeling especially cool, some others in the vicinity, using GetStringType as you recommend.
thanks for your advice,
"Greg Turner" gmturner007@ameritech.net wrote:
So, on second thought, merged or not, I'll try to whip up a better version of _ismbc* and maybe, if I'm feeling especially cool, some others in the vicinity, using GetStringType as you recommend.
Try something like this (completely untested):
int _ismbcalpha(unsigned int ch)
{
    char mbch[2];
    WCHAR chW;
    WORD ctype;

    mbch[0] = ch & 0xff;
    mbch[1] = (ch >> 8) & 0xff;
    MultiByteToWideChar(CP_ACP, 0, mbch, 2, &chW, 1);
    GetStringTypeW(CT_CTYPE1, &chW, 1, &ctype);
    return (ctype & C1_ALPHA) != 0;
}
I'm not sure whether mbch[0] and mbch[1] should actually be swapped in the case where a multibyte character is passed in.
On Tuesday 12 November 2002 10:42 pm, Dmitry Timoshkov wrote:
I'm not sure whether mbch[0] and mbch[1] should actually be swapped in the case where a multibyte character is passed in.
OK, I'll figure out the byte-ordering thing for 'ya. Thanks again,
On Tuesday 12 November 2002 10:42 pm, Dmitry Timoshkov wrote:
I'm not sure whether mbch[0] and mbch[1] should actually be swapped in the case where a multibyte character is passed in.
OK... after much pondering, I think I have this byte-ordering thing figured out. But it's late, and I'm frazzled, so I'm hoping to get a sanity check on this before I go and implement the wrong thing.
It seems pretty clear from the examples already implemented in this unit that, regardless of the platform endianness, the low order byte will be the trailing byte, and the high-order byte will be the leading byte. MultiByteToWideChar (and _mbtowc, which might be more appropriate here? More on this below...) would seem to expect the bytes in the following order: [leading byte, trailing byte], regardless of the endianness of the platform. So, I think, I should #ifdef the byte-swapping based on the endianness of the target platform (little-endian hosts byte-swap, big-endian hosts don't).... does that sound right?
Another issue that I haven't quite figured out: in this function, I'm supposed to respect the current multibyte code page, which can be read and set with the _getmbcp and _setmbcp functions. The proposed implementation above uses the ANSI code page, but are those the same thing? To me it seems like they aren't, but I haven't really looked into it, because I've been focusing on the byte-ordering issue so far.
/sheesh/ what a mess. so much for international standards making everything easy.... Advice, comments, tips, flames, mailbombs, reporting me to TIPS, etc. are appreciated...
"Greg Turner" gmturner007@ameritech.net wrote:
It seems pretty clear from the examples already implemented in this unit that, regardless of the platform endianness, the low order byte will be the trailing byte, and the high-order byte will be the leading byte. MultiByteToWideChar (and _mbtowc, which might be more appropriate here? More on this below...) would seem to expect the bytes in the following order: [leading byte, trailing byte], regardless of the endianness of the platform. So, I think, I should #ifdef the byte-swapping based on the endianness of the target platform (little-endian hosts byte-swap, big-endian hosts don't).... does that sound right?
I think there is no need for #ifdef's. Something like this should work:
int _ismbcalpha(unsigned int ch)
{
    char mbch[2];
    WCHAR chW;
    WORD ctype;
    int n_chars;

    if (ch < 256)
    {
        mbch[0] = ch & 0xff;
        n_chars = 1;
    }
    else /* multibyte character */
    {
        mbch[0] = (ch >> 8) & 0xff;
        mbch[1] = ch & 0xff;
        n_chars = 2;
    }
    MultiByteToWideChar(CP_ACP, 0, mbch, n_chars, &chW, 1);
    GetStringTypeW(CT_CTYPE1, &chW, 1, &ctype);
    return (ctype & C1_ALPHA) != 0;
}
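To illustrate why no #ifdef should be needed: shifts and masks operate on the integer's value, not on its layout in memory, so the same lead/trail bytes come out on little-endian and big-endian hosts alike. Below is a minimal sketch of just the byte-splitting step, with no Win32 calls. The helper name split_mbc is hypothetical and only used here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of any existing unit): split an
 * _ismbc*-style argument into the byte sequence that a conversion
 * routine such as MultiByteToWideChar expects, lead byte first.
 * Returns the number of bytes written (1 or 2). */
static int split_mbc(unsigned int ch, unsigned char out[2])
{
    assert(out != NULL);

    if (ch < 0x100)
    {
        /* single-byte character */
        out[0] = (unsigned char)ch;
        return 1;
    }

    /* Value-based extraction: the result does not depend on how the
     * host stores the integer in memory. */
    out[0] = (unsigned char)((ch >> 8) & 0xff); /* lead byte  */
    out[1] = (unsigned char)(ch & 0xff);        /* trail byte */
    return 2;
}
```

Calling split_mbc(0x82a0, buf) should fill buf with 0x82 followed by 0xa0 and return 2. Byte swapping would only become an issue if the integer were reinterpreted through a char pointer, which this sketch avoids.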
Another issue that I haven't quite figured out: in this function, I'm supposed to respect the current multibyte code page, which can be read and set with the _getmbcp and _setmbcp functions. The proposed implementation above uses the ANSI code page, but are those the same thing? To me it seems like they aren't, but I haven't really looked into it, because I've been focusing on the byte-ordering issue so far.
Frankly speaking, I don't know.
On Saturday 16 November 2002 03:52 am, Dmitry Timoshkov wrote:
"Greg Turner" gmturner007@ameritech.net wrote:
So, I think, I should #ifdef the byte-swapping based on the endianness of the target platform (little-endian hosts byte-swap, big-endian hosts don't).... does that sound right?
I think there is no need for #ifdef's. Something like this should work:
if (ch < 256)
{
    mbch[0] = ch & 0xff;
    n_chars = 1;
}
else /* multibyte character */
{
    mbch[0] = (ch >> 8) & 0xff;
    mbch[1] = ch & 0xff;
    n_chars = 2;
}
Of course, you are right -- that looks great. Now I just have to figure out that codepage thing, and I should be set.
An interesting side note: Microsoft claims that their implementations of these functions are blindingly fast (they said something along the lines that calling these is faster than "if (((0xXX <= ch) && (ch <= 0xYY)) || ((0xAA <= ch) && (ch <= 0xBB)))"). I wonder if this means they are using a lookup table in their implementation? I'm not planning to implement any such thing myself, of course, but I did find it an intriguing statement.
A lookup table is likely to be slower; after all, the required entry is unlikely to be in the data cache (except when running a benchmark).
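For reference, a lookup-table version of the single-byte check might look roughly like the sketch below. This is purely illustrative and is not a claim about how Microsoft implements these functions. The names alpha_table, init_alpha_table and is_sb_alpha_lookup are made up, and the ranges are simply the ones from the original patch.

```c
#include <assert.h>

/* Hypothetical 256-entry table: nonzero where the single-byte value
 * falls in one of the ranges used by the original patch. */
static unsigned char alpha_table[256];
static int alpha_table_ready;

static void init_alpha_table(void)
{
    unsigned int ch;

    for (ch = 0; ch < 256; ch++)
        alpha_table[ch] = (0x41 <= ch && ch <= 0x5a) ||
                          (0x61 <= ch && ch <= 0x7a) ||
                          /* Japanese/Katakana, CP 932 */
                          (0xa6 <= ch && ch <= 0xdf);
    alpha_table_ready = 1;
}

/* One memory load per call once the table is built; whether that
 * beats a few compares depends on the table being in the cache. */
static int is_sb_alpha_lookup(unsigned int ch)
{
    assert(ch < 256);
    if (!alpha_table_ready)
        init_alpha_table();
    return alpha_table[ch];
}
```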
Clearly the comparison if ( (unsigned)(ch - 0xXX) <= 0xYY - 0xXX ... is faster than the one quoted, but the compiler is likely to generate that anyway.
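Spelled out, the single-comparison trick relies on unsigned wraparound: when ch is below the lower bound, ch - lo wraps to a huge value and fails the test. A small sketch comparing the two forms follows. The function names are made up for illustration, and the inclusive bounds match the "lo <= ch && ch <= hi" form quoted above.

```c
#include <assert.h>

/* Straightforward form: two comparisons. */
static int in_range_two_cmp(unsigned int ch, unsigned int lo, unsigned int hi)
{
    return lo <= ch && ch <= hi;
}

/* Single-comparison form: for ch < lo the subtraction wraps around
 * to a value larger than hi - lo, so one unsigned compare suffices. */
static int in_range_one_cmp(unsigned int ch, unsigned int lo, unsigned int hi)
{
    assert(lo <= hi);
    return (ch - lo) <= (hi - lo);
}
```

Both functions should agree for every ch as long as lo <= hi.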
David