Re: Resend: implement _ismbcalpha, _ismbcalnum


Dmitry Timoshkov

13 Nov 2002

12:43 a.m.

"Greg Turner" gmturner007@ameritech.net wrote:

...

+int _ismbcalpha(unsigned int ch)
+{
+  if (ch < 0x100)
+    return ((0x41 <= ch && ch <= 0x5a) ||
+            (0x61 <= ch && ch <= 0x7a) ||
+            /* Japanese/Katakana, CP 932 */
+            (0xa6 <= ch && ch <= 0xdf));
+  else
+  {
+    FIXME("Handle MBC chars\n");
+    return 0;
+  }
+}

It's better to create a correct multibyte character pair and call GetStringType then.

-- Dmitry.


Greg Turner

13 Nov 2002

3:58 a.m.

New subject: Resend: implement _ismbcalpha, _ismbcalnum

On Tuesday 12 November 2002 06:43 pm, Dmitry Timoshkov wrote:

...

"Greg Turner" gmturner007@ameritech.net wrote:

...
+int _ismbcalpha(unsigned int ch)
+{
+  if (ch < 0x100)
+    return ((0x41 <= ch && ch <= 0x5a) ||
+            (0x61 <= ch && ch <= 0x7a) ||
+            /* Japanese/Katakana, CP 932 */
+            (0xa6 <= ch && ch <= 0xdf));
+  else
+  {
+    FIXME("Handle MBC chars\n");
+    return 0;
+  }
+}
It's better to create a correct multibyte character pair and call GetStringType then.

you mean this beast?

BOOL GetStringTypeW(
    DWORD dwInfoType,    // information-type options
    LPCWSTR lpSrcStr,    // source string
    int cchSrc,          // number of characters in string
    LPWORD lpCharType    // output buffer
);

You are probably far better informed on these issues than I, but this seems slightly excessive to me -- or, perhaps I'm suffering from sour grapes syndrome, since I'm somewhat out of my element here (I seem to be saying that a lot lately, don't I? When, I wonder, will I start to feel that I am /in/ my element?? (Probably not until wine is written in Kylix ... man, that's embarrassing!!))

My patch was intended to be a clone of the similar implementations I saw in the same unit, not a full implementation.

But... since I seem to be volunteering to implement the function, I guess I should do it right...

So, on second thought, merged or no, I'll try to whip up a better version of _ismbc* and maybe, if I'm feeling especially cool, some others in the vicinity, using GetStringType as you recommend.

thanks for your advice,

-- gmt "War is an ugly thing, but not the ugliest of things; the decayed and degraded state of moral and patriotic feeling which thinks that nothing is worth war is much worse. A man who has nothing for which he is willing to fight; nothing he cares about more than his own personal safety; is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better persons than himself." -- John Stuart Mill

Dmitry Timoshkov

4:42 a.m.


"Greg Turner" gmturner007@ameritech.net wrote:

...

...
It's better to create a correct multibyte character pair and call GetStringType then.

you mean this beast?

BOOL GetStringTypeW(
    DWORD dwInfoType,    // information-type options
    LPCWSTR lpSrcStr,    // source string
    int cchSrc,          // number of characters in string
    LPWORD lpCharType    // output buffer
);

You are probably far better informed on these issues than I, but this seems slightly excessive to me -- or, perhaps I'm suffering from sour grapes syndrome, since I'm somewhat out of my element here (I seem to be saying that a lot lately, don't I? When, I wonder, will I start to feel that I am /in/ my element?? (Probably not until wine is written in Kylix ... man, that's embarrassing!!))

My patch was intended to be a clone of the similar implementations I saw in the same unit, not a full implementation.

But... since I seem to be volunteering to implement the function, I guess I should do it right...

So, on second thought, merged or no, I'll try to whip up a better version of _ismbc* and maybe, if I'm feeling especially cool, some others in the vicinity, using GetStringType as you recommend.

try something like this (completely not tested):

int _ismbcalpha(unsigned int ch)
{
    char mbch[2];
    WCHAR chW;
    WORD ctype;

    mbch[0] = ch & 0xff;
    mbch[1] = (ch >> 8) & 0xff;
    MultiByteToWideChar(CP_ACP, 0, mbch, 2, &chW, 1);
    GetStringTypeW(CT_CTYPE1, &chW, 1, &ctype);
    return (ctype & C1_ALPHA) != 0;
}

I'm not sure whether mbch[0] and mbch[1] should actually be swapped in the case where a multibyte character is passed in.

-- Dmitry.

Greg Turner

4:55 a.m.


On Tuesday 12 November 2002 10:42 pm, Dmitry Timoshkov wrote:

...

"Greg Turner" gmturner007@ameritech.net wrote:

...
But... since I seem to be volunteering to implement the function, I guess I should do it right...

try something like this (completely not tested):

int _ismbcalpha(unsigned int ch)
{
    char mbch[2];
    WCHAR chW;
    WORD ctype;

    mbch[0] = ch & 0xff;
    mbch[1] = (ch >> 8) & 0xff;
    MultiByteToWideChar(CP_ACP, 0, mbch, 2, &chW, 1);
    GetStringTypeW(CT_CTYPE1, &chW, 1, &ctype);
    return (ctype & C1_ALPHA) != 0;
}

I'm not sure whether mbch[0] and mbch[1] should actually be swapped in the case where a multibyte character is passed in.

OK, I'll figure out the byte-ordering thing for 'ya. Thanks again,

Greg Turner

16 Nov 2002

8:41 a.m.


On Tuesday 12 November 2002 10:42 pm, Dmitry Timoshkov wrote:

...

"Greg Turner" gmturner007@ameritech.net wrote:

...
So, on second thought, merged or no, I'll try to whip up a better version of _ismbc* and maybe, if I'm feeling especially cool, some others in the vicinity, using GetStringType as you recommend.

try something like this (completely not tested):

int _ismbcalpha(unsigned int ch)
{
    char mbch[2];
    WCHAR chW;
    WORD ctype;

    mbch[0] = ch & 0xff;
    mbch[1] = (ch >> 8) & 0xff;
    MultiByteToWideChar(CP_ACP, 0, mbch, 2, &chW, 1);
    GetStringTypeW(CT_CTYPE1, &chW, 1, &ctype);
    return (ctype & C1_ALPHA) != 0;
}

I'm not sure whether mbch[0] and mbch[1] should actually be swapped in the case where a multibyte character is passed in.

ok... after much pondering, I think I have this byte-ordering thing figured out. But it's late, and I'm frazzled, so I'm hoping to get a sanity check on this before I go and implement the wrong thing.

It seems pretty clear from the examples already implemented in this unit that, regardless of the platform endianness, the low order byte will be the trailing byte, and the high-order byte will be the leading byte. MultiByteToWideChar (and _mbtowc, which might be more appropriate here? More on this below...) would seem to expect the bytes in the following order: [leading byte, trailing byte], regardless of the endianness of the platform. So, I think, I should #ifdef the byte-swapping based on the endianness of the target platform (little-endian hosts byte-swap, big-endian hosts don't).... does that sound right?

Another issue that I haven't quite figured out: In this function, I'm supposed to respect the current multibyte code page, which can be read or set with the _{get,set}mbcp functions. The proposed implementation above uses the ANSI codepage, but are those the same thing? To me, it seems like they aren't, but I haven't really looked into it, because I've been focusing on the byte ordering issue so far.

/sheesh/ what a mess. so much for international standards making everything easy.... Advice, comments, tips, flames, mailbombs, reporting me to TIPS, etc. are appreciated...

Dmitry Timoshkov

9:52 a.m.


"Greg Turner" gmturner007@ameritech.net wrote:

...

It seems pretty clear from the examples already implemented in this unit that, regardless of the platform endianness, the low order byte will be the trailing byte, and the high-order byte will be the leading byte. MultiByteToWideChar (and _mbtowc, which might be more appropriate here? More on this below...) would seem to expect the bytes in the following order: [leading byte, trailing byte], regardless of the endianness of the platform. So, I think, I should #ifdef the byte-swapping based on the endianness of the target platform (little-endian hosts byte-swap, big-endian hosts don't).... does that sound right?

I think there is no need for #ifdef's. Something like this should work:

int _ismbcalpha(unsigned int ch)
{
    char mbch[2];
    WCHAR chW;
    WORD ctype;
    int n_chars;

    if (ch < 256)
    {
        mbch[0] = ch & 0xff;
        n_chars = 1;
    }
    else /* multibyte character */
    {
        mbch[0] = (ch >> 8) & 0xff;
        mbch[1] = ch & 0xff;
        n_chars = 2;
    }
    MultiByteToWideChar(CP_ACP, 0, mbch, n_chars, &chW, 1);
    GetStringTypeW(CT_CTYPE1, &chW, 1, &ctype);
    return (ctype & C1_ALPHA) != 0;
}
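[Editor's sketch, not part of the original message.] The reason no #ifdef is needed can be shown in portable C: shifts and masks operate on the numeric value of ch, not on its layout in memory, so the lead byte always comes out of the high-order bits on any host. The helper name mbc_to_bytes is hypothetical and exists only for illustration; it covers just the byte-splitting step and makes no Win32 calls.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): split an _ismbc*-style
 * character code into bytes, lead byte first.
 * Returns the number of bytes written to out (1 or 2). */
size_t mbc_to_bytes(unsigned int ch, unsigned char out[2])
{
    if (ch < 0x100) {
        /* single-byte character: the value is the byte itself */
        out[0] = (unsigned char)(ch & 0xff);
        return 1;
    }
    /* double-byte character: the high-order byte is the lead byte.
     * The shift works on the value, so host endianness does not matter. */
    out[0] = (unsigned char)((ch >> 8) & 0xff); /* lead byte  */
    out[1] = (unsigned char)(ch & 0xff);        /* trail byte */
    return 2;
}
```

For example, passing 0x8341 would be expected to fill the buffer with 0x83 followed by 0x41, which is the [leading byte, trailing byte] order discussed above.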

...

Another issue that I haven't quite figured out: In this function, I'm supposed to respect the current multibyte code page, which can be read or set with the _{get,set}mbcp functions. The proposed implementation above uses the ANSI codepage, but are those the same thing? To me, it seems like they aren't, but I haven't really looked into it, because I've been focusing on the byte ordering issue so far.

Frankly speaking, I don't know.

-- Dmitry.

Greg Turner

3:21 p.m.


On Saturday 16 November 2002 03:52 am, Dmitry Timoshkov wrote:

...

"Greg Turner" gmturner007@ameritech.net wrote:

...
So, I think, I should #ifdef the byte-swapping based on the endianness of the target platform (little-endian hosts byte-swap, big-endian hosts don't).... does that sound right?

I think there is no need for #ifdef's. Something like this should work:
if (ch < 256)
{
    mbch[0] = ch & 0xff;
    n_chars = 1;
}
else /* multibyte character */
{
    mbch[0] = (ch >> 8) & 0xff;
    mbch[1] = ch & 0xff;
    n_chars = 2;
}

Of course, you are right -- that looks great. Now I just have to figure out that codepage thing, and I should be set.

An interesting side note: Microsoft claims that their implementations of these functions are blindingly fast (they said something along the lines that calling these is faster than "if (((0xXX <= ch) && (ch <= 0xYY)) || ((0xAA <= ch) && (ch <= 0xBB)))"). I wonder if this means they are using a lookup table in their implementation? Not planning to implement any such thing myself, of course, but I did find it to be an intriguing statement.

David Laight

4:20 p.m.


...

An interesting side note: Microsoft claims that their implementations of these functions are blindingly fast (they said something along the lines that calling these is faster than "if (((0xXX <= ch) && (ch <= 0xYY)) || ((0xAA <= ch) && (ch <= 0xBB)))"). I wonder if this means they are using a lookup table in their implementation? Not planning to implement any such thing myself, of course, but I did find it to be an intriguing statement.

A lookup table is likely to be slower - after all the required entry is unlikely to be in the data cache (except when running a benchmark).
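[Editor's sketch, not part of the original message.] For readers unfamiliar with the idea being debated, this is roughly what a lookup-table classifier for single-byte values could look like. It is only an illustration: the function names are hypothetical, the ranges are copied from the original patch, and nothing here is known to match Microsoft's actual implementation or to be faster than the range test.

```c
/* Range-compare version, using the single-byte ranges from the
 * original patch. */
int is_alpha_ranges(unsigned int ch)
{
    return (0x41 <= ch && ch <= 0x5a) ||
           (0x61 <= ch && ch <= 0x7a) ||
           (0xa6 <= ch && ch <= 0xdf);
}

/* Table version: one byte per possible single-byte character,
 * filled lazily on first use from the range test above. */
static unsigned char alpha_table[256];
static int alpha_table_ready;

int is_alpha_table(unsigned int ch)
{
    if (!alpha_table_ready) {
        unsigned int i;
        for (i = 0; i < 256; i++)
            alpha_table[i] = (unsigned char)is_alpha_ranges(i);
        alpha_table_ready = 1;
    }
    /* values outside the table are treated as non-alphabetic here */
    return ch < 256 ? alpha_table[ch] : 0;
}
```

Both functions give the same answer for single-byte input; which one is faster in practice depends on cache behaviour, as noted above, and would have to be measured.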

Clearly the comparison if ( (unsigned)(ch - 0xXX) < 0xYY - 0xXX ... is faster than the one quoted - but the compiler is likely to generate that anyway.
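[Editor's sketch, not part of the original message.] The single-comparison form relies on unsigned wraparound: when ch is below the lower bound, ch - lo wraps to a very large value and the test fails. The sketch below assumes lo <= hi and uses hypothetical function names. Note that for an inclusive range the comparison needs <=, whereas the abbreviated expression quoted above is written with <.

```c
/* Two-comparison form of an inclusive range test. */
int in_range_two_compares(unsigned int ch, unsigned int lo, unsigned int hi)
{
    return lo <= ch && ch <= hi;
}

/* One-comparison form (assumes lo <= hi). If ch < lo, the unsigned
 * subtraction wraps around to a huge value, which is greater than
 * hi - lo, so the test correctly fails. */
int in_range_one_compare(unsigned int ch, unsigned int lo, unsigned int hi)
{
    return (ch - lo) <= (hi - lo);
}
```

With lo = 0x41 and hi = 0x5a, both forms should accept 0x41 through 0x5a and reject everything else.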

David

-- David Laight: david@l8s.co.uk


wine-devel@winehq.org
