Attached is a version of the C testing framework, implemented using the TCHAR.H macros, so it is portable between ASCII and Unicode platforms. Also implemented a test which can be used to test both the ASCII and Unicode APIs.
As I found out, creating ASCII/Unicode portable code requires much more attention than creating plain ASCII-only code, probably the same level of difficulty as creating Unicode-only code.
Andriy Palamarchuk
__________________________________________________ Do You Yahoo!? Send FREE video emails in Yahoo! Mail! http://promo.yahoo.com/videomail/
"Andriy Palamarchuk" apa3a@yahoo.com wrote:
Attached is a version of the C testing framework, implemented using the TCHAR.H macros, so it is portable between ASCII and Unicode platforms. Also implemented a test which can be used to test both the ASCII and Unicode APIs.
You probably want to know that in order to use TCHAR.H you need either gcc >= 2.96, or to mess with converting 4-byte unicode strings to 2-byte and vice versa. I would prefer to avoid the whole bunch of confusion with all that mess and use straight CHAR and WCHAR types. Yes, it will mean having separate tests for ANSI and Unicode and will require slightly more work to type WCHAR str[] = {'u','n','i','c','o','d','e',0}; instead of a simple WCHAR str[] = L"unicode";
Please take into account the fact that everyone who uses TCHAR actually needs to think twice in any case. TCHAR only makes it harder to understand what really happens.
--- Dmitry Timoshkov dmitry@baikal.ru wrote:
"Andriy Palamarchuk" apa3a@yahoo.com wrote:
Attached is a version of the C testing framework, implemented using the TCHAR.H macros, so it is portable between ASCII and Unicode platforms. Also implemented a test which can be used to test both the ASCII and Unicode APIs.
You probably want to know that in order to use TCHAR.H you need either gcc >= 2.96,
I know this :-( I strongly prefer to use straight TCHAR programming, without any conversions. This is why I started to work with it in the first place.
There is no such issue for Windows compilers. Can we make a compiler with 2-byte Unicode characters a requirement for the tests? In the worst case, people who do not have such a compiler will be able to run the tests in ANSI mode only.
or mess with converting 4-byte unicode strings to 2-byte and vice versa.
Do you mean using -fwritable-strings parameter or performing explicit conversion? Can we hide explicit conversion in the _T/_TEXT macros?
I would prefer to avoid the whole bunch of confusion with all that mess and use straight CHAR and WCHAR types. Yes, it will mean having separate tests for ANSI and Unicode and will require slightly more work to type WCHAR str[] = {'u','n','i','c','o','d','e',0}; instead of a simple WCHAR str[] = L"unicode";
IMHO this is much bigger mess than with compilers.
Please take into account the fact that everyone who uses TCHAR actually needs to think twice in any case. TCHAR only makes it harder to understand what really happens.
Here I don't agree with you. Programming with TCHAR is *exactly* the same as programming with WCHAR, but with different names. You think once - in WCHAR terms - and get the ANSI version as a bonus!
In fact, the code became cleaner after I removed "if (has_unicode())" blocks, A<->W conversions, and the letters "A", "W" from API names. I feel that in this case I don't need to "think twice" - once in terms of ANSI, the second time in terms of Unicode. I moved the test completely to TCHAR (attached). Compare it with the version which uses explicit "W"/"A" calls.
Andriy Palamarchuk
Andriy Palamarchuk apa3a@yahoo.com writes:
In the worst case, people who do not have such a compiler will be able to run the tests in ANSI mode only.
That's clearly not an option.
Here I don't agree with you. Programming with TCHAR is *exactly* the same as programming with WCHAR, but with different names. You think once - in WCHAR terms - and get the ANSI version as a bonus!
You don't think in WCHAR terms. There isn't a single real Unicode character in your test, it's all purely ASCII. That's the main problem with TCHAR: you have to stick to the lowest common denominator, which means no Unicode chars, no multi-byte support, etc.
In fact, the code became cleaner after I removed "if (has_unicode())" blocks, A<->W conversions, and the letters "A", "W" from API names. I feel that in this case I don't need to "think twice" - once in terms of ANSI, the second time in terms of Unicode.
But we want people to think twice, and write a test adapted to the function they are testing; you don't test ASCII and Unicode the same way, except superficially.
This TCHAR crap is just a marketing tool to let Microsoft pretend that converting to Unicode is easy; but it's not usable in real life (how many apps do you know that ship both an A and a W exe?) We don't use it in Wine, and we must not use it in Wine tests either.
On Tue, 22 Jan 2002, Alexandre Julliard wrote:
But we want people to think twice, and write a test adapted to the function they are testing; you don't test ASCII and Unicode the same way, except superficially.
With all due respect Alexandre, I can't understand your point. When does the _semantics_ of the function differ based on the string encoding??? If it does, it most likely is a bug, simple as that. In fact, one can view the Unicode case as just a different encoding, and if every string pointer passed in the Win32 API had an encoding associated with it, we would never have needed the A/W pairs of functions.
The point is, the encoding/decoding from/to A/W is orthogonal to the semantics of the function in 99.99% of the cases. As such, we should not fundamentally mix them in the tests.
-- Dimi.
"Dimitrie O. Paun" dimi@cs.toronto.edu writes:
With all due respect Alexandre, I can't understand your point. When does the _semantics_ of the function differ based on the string encoding???
Functions that take strings usually do something with them, so this is part of the function semantics, and it differs between ASCII and Unicode. A fundamental part of that is making sure that all characters are preserved correctly (no W->A->W round-trip) and it's precisely the thing that will never get tested with the TCHAR stuff.
If all you want is to call the function with some random string, then you don't really need to call both A and W since they usually use the same code anyway. So just call the A function and you will have tested as much (or as little) of the W function as you would have by recompiling the TCHAR code as Unicode, while avoiding the whole macro mess.
--- Alexandre Julliard julliard@winehq.com wrote:
If all you want is to call the function with some random string, then you don't really need to call both A and W since they usually use the same code anyway. So just call the A function and you will have tested as much (or as little) of the W function as you would have by recompiling the TCHAR code as Unicode, while avoiding the whole macro mess.
Alexandre, at one point you were saying that the W and A functions can't be tested by the same code because they are completely different. Now you are saying that we don't need to test the W functions because they are no more than a conversion to ASCII, a call to the A function, and a conversion of the results back to Unicode, and that it is sufficient to test the A function and check the conversions.
Using TCHAR ensures that the A and W functions have the same behaviour. Why are you against using the same code for this and using encoding-specific code to check the differences?
Andriy Palamarchuk
Andriy Palamarchuk apa3a@yahoo.com writes:
Alexandre, at one point you were saying that the W and A functions can't be tested by the same code because they are completely different. Now you are saying that we don't need to test the W functions because they are no more than a conversion to ASCII, a call to the A function, and a conversion of the results back to Unicode, and that it is sufficient to test the A function and check the conversions.
No, I'm not saying this is what we should do, I'm saying this is what you are doing with your TCHAR stuff. If your test does:
foo(_T("abc"))
it essentially does:
fooA("abc")
fooW(L"abc")
but in most cases since fooA converts the string and calls fooW already you are not really testing anything more than by simply calling fooA. This doesn't mean that you shouldn't test fooW at all, but the interesting thing to do is to test fooW with Unicode input. So you should do something like:
fooA("abc")
fooW(L"some unicode chars here")
Now with your solution:
foo(_T("abc"))
#ifdef UNICODE
fooW(L"some unicode chars here")
#endif
the test is more complex to write, it takes twice as long to compile and run, and it doesn't test more than my version.
And this doesn't even get into testing behavior with mixed A/W calls. To take a random example, to test CreateEvent you'll want to use OpenEvent to verify that the event was created. With TCHAR all you will ever test is that OpenEventA finds an event created by CreateEventA, and OpenEventW finds one created by CreateEventW. You will never test interesting combinations like CreateEventA/OpenEventW because it's a big pain to write.
Using TCHAR ensures that the A and W functions have the same behaviour. Why are you against using the same code for this and using encoding-specific code to check the differences?
Because it makes writing a small subset of tests slightly easier at the cost of making the really interesting tests much more painful to write. Not a good trade-off.
On Tue, 22 Jan 2002, Alexandre Julliard wrote:
but the interesting thing to do is to test fooW with Unicode input. So you should do something like:
fooA("abc")
fooW(L"some unicode chars here")
But notice that this is a problem independent of whether or not we write the tests encoding-independent. Even if we go with your version written above, it is still not going to work, because people will not write "interesting" Unicode strings. Period. If I were to write a test now, I would have no idea how to write non-pseudo-ASCII Unicode strings anyway.
This is why we _need_ to abstract constant strings out of the tests with something like the teststr() function I was advocating. This is the only way to get things like surrogates, etc. reliably used in tests.
The only hardcoded strings that can reside in tests are set-in-stone paths in the registry (or what have you), which we might as well code using the (ugly) _T macro, but these should be just a minority of cases anyway.
-- Dimi.
"Dimitrie O. Paun" dimi@cs.toronto.edu writes:
This is why we _need_ to abstract constant strings out of the tests with something like the teststr() function I was advocating. This is the only way to get things like surrogates, etc. reliably used in tests.
Absolutely; but this has nothing to do with TCHAR at all, we need a convenient way to specify Unicode strings in any case. And it's easier to do if you don't have to mess with TCHAR at the same time, you can simply provide a bunch of Unicode constants.
There were a few objections to having ugly conditionals for Unicode-specific tests, like:
#ifdef _UNICODE
... unicode-specific tests ...
#endif
This problem can be resolved by using a function or macro as a Unicode indicator instead. The value of this indicator will be defined based on the presence of the _UNICODE macro.
Now we'll have a better-looking check, like:
if (TEST_UNICODE) {
    ... unicode-specific tests ...
}
Hope now we'll have one less thing to argue about ;-)
Andriy Palamarchuk
"Andriy Palamarchuk" apa3a@yahoo.com wrote:
There were a few objections to having ugly conditionals for Unicode-specific tests, like:
#ifdef _UNICODE
... unicode-specific tests ...
#endif
This problem can be resolved by using a function or macro as a Unicode indicator instead. The value of this indicator will be defined based on the presence of the _UNICODE macro.
Now we'll have a better-looking check, like:
if (TEST_UNICODE) {
    ... unicode-specific tests ...
}
IMO unicode tests should be executed unconditionally, they just should be marked as "expected to fail". If you are going to conditionally run unicode tests, you will eventually come to the conclusion that you are conditionally running too many tests (native APIs, etc.)
--- Dmitry Timoshkov dmitry@baikal.ru wrote:
IMO unicode tests should be executed unconditionally, they just should be marked as "expected to fail".
I prefer all tests to succeed on all MS platforms. The goal of the tests is to document Windows behavior and by definition they should not fail on Windows.
Unicode tests failing on Win95 may or may not succeed in other Win32 API implementations working in win95 mode. This does not matter for these implementations. You'll have some Unicode tests which fail under Wine, others which succeed under Wine. We are not going to change Wine because of this, but will have to mark them differently, like FAIL_MS_UNICODE(FAIL_WINE_UNICODE(test)). Now add to this mess the ODIN Unicode failed tests.
Besides, it is more convenient to evaluate the conditional once and go through a block of tests than to think about a conditional for each test.
If you are going to conditionally run unicode tests, you will eventually come to the conclusion that you are conditionally running too many tests (native APIs, etc.)
Yes, we document a few Windows platforms together and conditionals indicate differences between them. This is not my fault they are so different :-)
Could you explain the problem of "conditionally running too many tests"? What is bad about it?
Andriy Palamarchuk
"Andriy Palamarchuk" apa3a@yahoo.com wrote:
IMO unicode tests should be executed unconditionally, they just should be marked as "expected to fail".
I prefer all tests to succeed on all MS platforms. The goal of the tests is to document Windows behavior and by definition they should not fail on Windows.
In that case you are going to test only a very small subset of the win32 API.
Unicode tests failing on Win95 may or may not succeed in other Win32 API implementations working in win95 mode.
There are no "Win32 API implementations working in win95 mode". At all.
This does not matter for these implementations. You'll have some Unicode tests which fail under Wine, others which succeed under Wine. We are not going to change Wine because of this, but will have to mark them differently, like FAIL_MS_UNICODE(FAIL_WINE_UNICODE(test)). Now add to this mess the ODIN Unicode failed tests.
Besides, it is more convenient to evaluate the conditional once and go through a block of tests than to think about a conditional for each test.
It's more convenient, IMO, not to limit ourselves by any borders at all. Just run all available tests and analyze the results. If something fails, find out why, and if needed mark it as "expected to fail".
If you are going to conditionally run unicode tests, you will eventually come to the conclusion that you are conditionally running too many tests (native APIs, etc.)
Yes, we document a few Windows platforms together and conditionals indicate differences between them. This is not my fault they are so different :-)
I said it before: there are no different windows platforms from the Wine point of view.
Could you explain the problem of "conditionally running too much tests"? What is bad about it?
Same comment as above. There are no code branches in the Wine code like:
if(win95) do something;
else if(nt40) do something else;
else do something else;
Conditional tests should not exist either.
If tomorrow microsoft invents "yet another win32 platform", are you going to add a new conditional test suite?
Andriy Palamarchuk apa3a@yahoo.com writes:
I prefer all tests to succeed on all MS platforms. The goal of the tests is to document Windows behavior and by definition they should not fail on Windows.
By definition they won't fail, if they do your test is broken. If calling some function on Win95 returns ERROR_CALL_NOT_IMPLEMENTED, the right thing is not to avoid calling the function, it's to call it and check for this error code.
--- Alexandre Julliard julliard@winehq.com wrote:
By definition they won't fail, if they do your test is broken. If calling some function on Win95 returns ERROR_CALL_NOT_IMPLEMENTED, the right thing is not to avoid calling the function, it's to call it and check for this error code.
Are we going to make Wine conform to the MS implementations up to this level?
Andriy Palamarchuk
Andriy Palamarchuk apa3a@yahoo.com writes:
By definition they won't fail, if they do your test is broken. If calling some function on Win95 returns ERROR_CALL_NOT_IMPLEMENTED, the right thing is not to avoid calling the function, it's to call it and check for this error code.
Are we going to make Wine conform to the MS implementations up to this level?
No but we don't have to if the test is done right. Something like:
ret = fooW(...);
if (!ret) /* failed */
    ok( GetLastError() == ERROR_CALL_NOT_IMPLEMENTED );
else /* success */
    ok( check for successful result );
This will work on all platforms, no matter whether they support Unicode or not.
--- Alexandre Julliard julliard@winehq.com wrote:
Andriy Palamarchuk apa3a@yahoo.com writes:
Are we going to make Wine conform to the MS implementations up to this level?
No but we don't have to if the test is done right. Something like:
ret = fooW(...);
if (!ret) /* failed */
    ok( GetLastError() == ERROR_CALL_NOT_IMPLEMENTED );
else /* success */
    ok( check for successful result );
This will work on all platforms, no matter whether they support Unicode or not.
Do you suggest doing this for each W call? Isn't it too difficult, even if we wrap the call in some macro, like:
CHECK_UNICODE( fooW(...) ): ok( check for successful result );
Andriy Palamarchuk
On Tue, 22 Jan 2002, Alexandre Julliard wrote:
Absolutely; but this has nothing to do with TCHAR at all, we need a convenient way to specify Unicode strings in any case.
Indeed, the two issues are orthogonal.
Since my Linux box is down for a few days now, here is some untested code trying to implement the teststr() idea.
wine/tests.h:
/*-----------------------------------------------------------*/
/* The teststr() function iterates through a set of strings  */
/* with a certain set of characteristics. When the set is    */
/* exhausted, it returns NULL. The sets are identified by    */
/* constants defined later in the header, with the prefix    */
/* TSC_. Typical usage of the function is as follows:        */
/*     TCSTR tstr = TSC_XXX;                                 */
/*     while ( tstr = teststr(tstr) )                        */
/*         <code which uses tstr>                            */
/* Simply replace T with A/W for encoding-specific code.     */
/*-----------------------------------------------------------*/
LPCWSTR teststrW(LPCWSTR wstr);
LPCSTR teststrA(LPCSTR astr);
/* the TESTSTR.class field defines the class of strings */
/* in the future, we can have a union which holds additional */
/* parameters for the given class of strings. */
typedef struct tagTESTSTRW {
    int class;
    LPCWSTR str;
} TESTSTRW, *LPTESTSTRW;
typedef struct tagTESTSTRA {
    int class;
    LPCSTR str;
} TESTSTRA, *LPTESTSTRA;
#define _TSCT(cls, strtype, t) \
    extern TESTSTR##t tsc_##cls##t; \
    LPC##strtype const TSC_##cls##t = &(tsc_##cls##t.str)
#define _TSC(cls) \
    _TSCT(cls, WSTR, W); \
    _TSCT(cls, STR, A)
#define STR2TESTSTR(str) (LPTESTSTR)( ((void*)str) - \
    ((void*)&tsc_ANYA.str) - ((void*)&tsc_ANYA) )
/* constants defining the test string sets */
#define TSC_ANY _TSC(ANY)
#define TSC_XXX .....
teststr.c:
#define BEGINA (LPCSTR)(-1)
#define BEGINW (LPCWSTR)(-1)
TESTSTRA tsc_ANYA = { 0, BEGINA };
TESTSTRW tsc_ANYW = { 0, BEGINW };
....
LPCSTR teststrA(LPCSTR str)
{
    LPTESTSTR tstr = STR2TESTSTR(str);
    if (tstr->str == BEGINA)
        ...
    <here we can do a lot of things>
}
<similarly for teststrW>
Now, should I go more into the details of the implementation? Is everyone convinced that the interface is (1) simple, (2) useful, (3) implementable?
-- Dimi.
--- Alexandre Julliard julliard@winehq.com wrote:
If your test does:
foo(_T("abc"))
it essentially does:
fooA("abc")
fooW(L"abc")
but in most cases since fooA converts the string and calls fooW already you are not really testing anything more than by simply calling fooA. This doesn't mean that you shouldn't test fooW at all, but the interesting thing to do is to test fooW with Unicode input. So you should do something like:
fooA("abc")
fooW(L"some unicode chars here")
Now with your solution:
foo(_T("abc"))
#ifdef UNICODE
fooW(L"some unicode chars here")
#endif
the test is more complex to write, it takes twice as long to compile and run, and it doesn't test more than my version.
Alexandre, you forgot the conditional in your test example and mentioned it in mine ;-) With your permission I'll rewrite your test:
fooA("abc")
#ifdef UNICODE
fooW(L"some unicode chars here")
#endif
The only differences are that with the TCHAR version I have an additional fooW(L"abc") test and have to use different names. Having at least a simple test for the Unicode API is better than having nothing at all.
I don't see how using TCHAR will prevent you from writing Unicode-specific tests.
You will never test interesting combinations like CreateEventA/OpenEventW because it's a big pain to write.
Exactly the same for your tests. See above.
Using TCHAR ensures that the A and W functions have the same behaviour. Why are you against using the same code for this and using encoding-specific code to check the differences?
Because it makes writing a small subset of tests slightly easier at the cost of making the really interesting tests much more painful to write. Not a good trade-off.
IMHO we don't have this issue. As you agreed, we have advantages with the simple boring tests, but the complexity of writing "really interesting tests" is exactly the same.
However, I agree that we have a different trade-off with TCHAR - a more complex build process - but this is not such a big problem.
Andriy Palamarchuk
Andriy Palamarchuk apa3a@yahoo.com writes:
Alexandre, you forgot the conditional in your test example and mentioned it in mine ;-) With your permission I'll rewrite your test:
fooA("abc")
#ifdef UNICODE
fooW(L"some unicode chars here")
#endif
No I didn't forget anything. There cannot be a #ifdef UNICODE in my version because we won't compile things twice, and UNICODE will never be defined, just like it's not defined when building Wine.
In your version you need the #ifdefs, maybe not in this simple example but in general as soon as you mix TCHAR and WCHAR, because otherwise it won't compile in ASCII mode.
On Wed, 23 Jan 2002, Alexandre Julliard wrote:
In your version you need the #ifdefs, maybe not in this simple example but in general as soon as you mix TCHAR and WCHAR, because otherwise it won't compile in ASCII mode.
Sorry, maybe I'm missing something, but I don't understand why it wouldn't:
BOOL WINAPI CopyFileA(LPCSTR,LPCSTR,BOOL);
BOOL WINAPI CopyFileW(LPCWSTR,LPCWSTR,BOOL);
#define CopyFile WINELIB_NAME_AW(CopyFile)
which means that if we use the explicit A or W versions, it will compile no matter what mode we're using.
-- Dimi.
"Dimitrie O. Paun" dimi@cs.toronto.edu writes:
Sorry, maybe I'm missing something, but I don't understand why it wouldn't:
BOOL WINAPI CopyFileA(LPCSTR,LPCSTR,BOOL);
BOOL WINAPI CopyFileW(LPCWSTR,LPCWSTR,BOOL);
#define CopyFile WINELIB_NAME_AW(CopyFile)
which means that if we use the explicit A or W versions, it will compile no matter what mode we're using.
Sure, in isolation it will work, but try integrating that in a test using TCHARs. For instance:
TSTR filename = _T("some_file");
handle = CreateFile( filename, ... );
WriteFile( handle, ... );
etc.
CopyFile( filename, _T("newfile"), FALSE );
/* now test some Unicode chars */
CopyFileW( filename, L"unicode_file_name", FALSE );
This won't work because you can't pass a TSTR to a W function. So you either need #ifdefs, or you need to use W functions and WCHAR throughout, which means you don't need TCHAR at all.
On Wed, 23 Jan 2002, Alexandre Julliard wrote:
[snip]
CopyFile( filename, _T("newfile"), FALSE );
/* now test some Unicode chars */
CopyFileW( filename, L"unicode_file_name", FALSE );
This won't work because you can't pass a TSTR to a W function. So you either need #ifdefs, or you need to use W functions and WCHAR throughout, which means you don't need TCHAR at all.
Ah, I see. We certainly need a way to get from generic to specific. So we can have a bunch of functions which do just that:
LPCSTR strTtoA(LPCTSTR); // converts the string to ANSI
LPCWSTR strTtoW(LPCTSTR); // converts the string to Unicode
LPCXSTR strTtoX(LPCTSTR); // if T=A, convert to W, else convert to A
So the above example becomes:
CopyFileW( strTtoW(filename), L"unicode_file_name", FALSE );
-- Dimi.
"Dimitrie O. Paun" dimi@cs.toronto.edu writes:
Ah, I see. We certainly need a way to get from generic to specific. So we can have a bunch of functions which do just that:
LPCSTR strTtoA(LPCTSTR); // converts the string to ANSI
LPCWSTR strTtoW(LPCTSTR); // converts the string to Unicode
LPCXSTR strTtoX(LPCTSTR); // if T=A, convert to W, else convert to A
So the above example becomes:
CopyFileW( strTtoW(filename), L"unicode_file_name", FALSE );
Sure, I don't doubt that by spending enough time and energy we'll someday have a solution for all the problems caused by wanting to add the TCHAR stuff in the first place. It's a bit like adding cushions on the wall so you can keep banging your head on it without getting hurt too much.
On Wed, 23 Jan 2002, Alexandre Julliard wrote:
Sure, I don't doubt that by spending enough time and energy we'll someday have a solution for all the problems caused by wanting to add the TCHAR stuff in the first place. It's a bit like adding cushions on the wall so you can keep banging your head on it without getting hurt too much.
:) Somehow I was expecting this :)
But you are correct, the problem is not as simple as it seems, and it is hard to see, without code, which approach is preferable. I don't want to use the TCHAR stuff just because I can, but because I think we can solve some real problems with a little added complexity. The code will prove which trade-off is better.
Now, on a different note, you pointed out an interesting thing: we need to test mixed A/W cases. We thus have four possible cases: (A,A), (A,W), (W,A), (W,W). Should we test all of them?
What is supposed to happen when we have a W->A conversion involving a Unicode char which does not map to A?
-- Dimi.
"Dimitrie O. Paun" dimi@cs.toronto.edu writes:
Now, on a different note, you pointed out an interesting thing: we need to test mixed A/W cases. We thus have four possible cases: (A,A), (A,W), (W,A), (W,W). Should we test all of them?
We must test everything that is part of the expected behavior of the function. What this means depends on what the function does. In some cases we have to test combinations, in other cases we don't.
<rant> Writing tests is not simply a matter of taking a template and filling in the name of the function you are testing; it requires studying the API documentation, investigating the function behavior, and then devising ways to test that behavior. These ways will likely be different for just about every function. It requires thought and creativity, not simply applying a recipe. </rant>
What is supposed to happen when we have a W->A conversion involving a Unicode char which does not map to A?
They get mapped to the default char, usually '?'.
On Tue, 22 Jan 2002, Alexandre Julliard wrote:
"Dimitrie O. Paun" dimi@cs.toronto.edu writes:
With all due respect Alexandre, I can't understand your point. When does the _semantics_ of the function differ based on the string encoding???
Functions that take strings usually do something with them, so this is part of the function semantics, and it differs between ASCII and Unicode. A fundamental part of that is making sure that all characters are preserved correctly (no W->A->W round-trip) and it's precisely the thing that will never get tested with the TCHAR stuff.
And this is precisely where I cannot understand you. When we test stuff, we need to worry mainly about two things:
1. function semantics
2. W->A->W conversions, etc.
(1) divides nicely into:
1.A functions which don't care much about what the string is, it just gets passed around -- this case would benefit from TCHAR
1.B functions which deal with characters (lengths, positions, etc) -- these are defined in terms of characters, so again TCHAR is OK
That is, TCHAR is beneficial in testing the semantics of the function, save a few freak cases.
(2) is orthogonal to (1). Now, it seems that you consider that having encoding-independent tests would completely miss this case, whereas I find myself at exactly the opposite end.
Truth be told, I am not 100% in the "TCHAR" camp either. I need to see some code to be convinced either way, as you probably do too. So let me produce some code so that we can argue on something more concrete.
That being said, I am truly interested to understand your reasoning. From previous experience, you have solid reasons for arguing something, and given that I completely fail to see your side of the story, I must be missing something rather important.
If all you want is to call the function with some random string, then you don't really need to call both A and W since they usually use the same code anyway.
But then we would have failed to test the (very important) W->A->W aspect, and it seems you contradict your previous statement...
-- Dimi.
Alexandre, while I'm not pushing the use of TCHAR.h here, I can't agree with some of your arguments.
--- Alexandre Julliard julliard@winehq.com wrote:
Andriy Palamarchuk apa3a@yahoo.com writes:
Here I don't agree with you. Programming with TCHAR is *exactly* the same as programming with WCHAR, but with different names. You think once - in WCHAR terms - and get the ANSI version as a bonus!
You don't think in WCHAR terms. There isn't a single real Unicode character in your test, it's all purely ASCII.
If you look even more carefully, you'll find that my test does not have a single real ASCII character :-P Indeed, I do not test text processing there at all. I use text strings there for: a) identifying registry keys, b) testing the system settings broadcast behaviour (comparison against an empty string), c) test descriptions.
In *my* test application W and A versions of functions have exactly the same behaviour and I'm very glad I don't need to write the same code twice.
That's the main problem with TCHAR, you have to stick to the lower common denominator, which means no Unicode chars, no multi-byte support, etc.
a) in places where you need Unicode characters or Unicode-specific checks you can use them with #ifdef _UNICODE. A test string generator will even help to eliminate some of such conditionals.
b) multi-byte, not one-byte, characters are the lowest common denominator. Converting the test to TCHAR.h, I had to remember that I'm using multi-byte characters. E.g. in some places I needed to replace sizeof(buf) with sizeof(buf)/sizeof(_TCHAR).
In fact, the code became cleaner after I removed "if (has_unicode())" blocks, A<->W conversions, and the letters "A", "W" from API names. I feel that in this case I don't need to "think twice" - once in terms of ANSI, the second time in terms of Unicode.
But we want people to think twice, and write a test adapted to the function they are testing; you don't test ASCII and Unicode the same way, except superficially.
It would be silly for me to separate W and A checks everywhere if they are different only in a few cases.
This TCHAR crap is just a marketing tool to let Microsoft pretend that converting to Unicode is easy; but it's not usable in real life
Nope. The purpose of TCHAR.H is to support compiling the same code for both Unicode and ASCII platforms. You still have to make the same decisions as when moving to a pure Unicode API.
(how many apps do you know that ship both an A and a W exe?)
I came across a few open-source projects which use TCHAR:
* Tcl - e.g. see http://tcl.apache.org/sources/tcl/win/tclWinFCmd.c.html
* XEmacs - this article will be interesting for you: http://www.xemacs.org/Architecting-XEmacs/windows-I18N.html
You will agree with me - it is difficult to find this sort of statistics. But many applications have separate binaries for Windows NT and for Windows 95. One of the differences between these binaries can be the usage of Unicode.
We don't use it in Wine, and we must not use it in Wine tests either.
We don't use TCHAR in Wine because it does not give us any advantages. The W and A Wine functions can't exist in parallel universes; they must interact and probably factor out common behaviour.
But I can see what advantages we'd have using TCHAR in the Wine tests.
Andriy Palamarchuk
Andriy Palamarchuk apa3a@yahoo.com writes:
If you look even more carefully you'll find that my test does not have a single real ASCII character :-P Indeed, I do not test text processing there at all. I use text strings there only for: a) identifying registry keys, b) testing system settings broadcast behaviour (comparison against an empty string), c) test descriptions.
But none of them needs Unicode at all, you would test exactly the same thing by doing an ASCII only version (ok except maybe the empty string stuff). And how do you test for instance SystemParametersInfoA sending a message to a Unicode window proc? You can't do that with TCHAR.
b) multi-byte, not one-byte, characters are the lowest common denominator. Converting the test to TCHAR.H I had to remember that I'm using multi-byte characters; e.g. in some places I needed to replace sizeof(buf) with sizeof(buf)/sizeof(_TCHAR).
That's not multi-byte charset support. Multi-byte is where two bytes can represent one character in MBCS codepages. This is another example where ASCII and Unicode behave differently; so if you put a multi-byte char in one of the _T("") strings, the test will break for any API that returns a character count.
We don't use TCHAR in Wine because it does not give us any advantages. The W and A Wine functions can't exist in parallel universes; they must interact and probably factor out common behaviour.
But I can see what advantages we'd have using TCHAR in the Wine tests.
Yes of course there are a few advantages. There are also large inconveniences, which more than offset the small gain IMHO. If your tests for A and W are identical it's no big deal to write both; and if they are different, at least you can write them without having to resort to ugly #ifdefs and other hacks.
On Mon, 21 Jan 2002, Andriy Palamarchuk wrote:
Attached is a version of the C testing framework, which is implemented using TCHAR.H macros, so it is portable between ASCII and Unicode platforms. Also implemented is a test which can be used to test the ASCII and Unicode APIs.
Cool.
I have a number of observations:
-- we should rename wt_helper.h to something like wine/tests.h
-- maybe we should not use main as the main function, but rather something like 'test'. This way we can provide the main and have another level of indirection which can be put to good use. (like we should not need the explicit end_tests())
-- another thing we can do is to have the tests in functions named testXXX. This way we can use nm to generate the main function, and so we can put a bunch of tests in the same executable
-- wt_helper.h should include tchar.h, and redefine _T to call a function to transform the string to Unicode if need be. This way we get rid of the compiler requirement
-- I still think my teststr() idea is worth doing. Do you want an implementation?
-- if TCHAR is not available, we should typedef _TCHAR TCHAR in wine/tests.h
-- why do you do _T(__FILE__)? File names are ASCII, no need to _T them
More comments later :)
-- Dimi.
--- "Dimitrie O. Paun" dimi@cs.toronto.edu wrote:
I have a number of observations: -- we should rename wt_helper.h to something like wine/tests.h
I'm open to suggestions. I used this name to avoid name clashes with the Perl winetest framework. BTW, wt = Wine Test. I'd prefer a more recognizable name than "test.h".
-- maybe we should not use main as the main function, but rather something like 'test'. This way we can provide the main and have another level of indirection which can be put to good use. (like we should not need the explicit end_tests())
-- another thing we can do is to have the tests in functions named testXXX. This way we can use nm to generate the main function, and so we can put a bunch of tests in the same executable
Whether we'll use these ideas depends on architecture of the whole testing process.
-- wt_helper.h should include tchar.h, and redefine _T to call a function to transform the string to Unicode if need be. This way we get rid of the compiler requirement.
Using a function instead of a macro won't work in all cases, e.g. in this one:
_TCHAR buf[100] = _T("foo")
-- I still think my teststr() idea is worth doing. Do you want an implementation?
I like the idea, but I do not need such a Unicode string generator for my test :-) Can you implement it with a small real test which shows the advantages of teststr()?
-- if TCHAR is not available, we should typedef _TCHAR TCHAR in wine/tests.h
Yes, we'll be able to do this. I came across the opposite situation: Cygwin has TCHAR, but not _TCHAR. I've already submitted a bug report to the Cygwin project. AFAICS TCHAR and _TCHAR have almost the same semantics.
MBCS Survival Guide at http://www.microsoft.com/globaldev/fareast/mbcssg.asp says: "For ANSI conformance, the "official" type is _TCHAR. In practice, either TCHAR or _TCHAR is acceptable."
-- Why do you do: _T(__FILE__) Files are ASCII, no need to _T them.
__FILE__ is a macro which expands to the file name. I use _T with it for simplicity - to have the same ASCII/Unicode mode processing for everything. Otherwise I'd have to explicitly call ASCII functions for file name processing, and probably do A->W conversion.
More comments later :)
Looking forward to it :-)
Andriy Palamarchuk
Just commenting from the side-lines :-)
On Tue, 22 Jan 2002, Andriy Palamarchuk wrote: [...]
Using a function instead of a macro won't work in all cases, e.g. in this one:
_TCHAR buf[100] = _T("foo")
I would rather avoid such constructs precisely because of the compiler support problems. We have to support _T because of legacy windows code, but if we write the code then it should be very easy to avoid its use entirely:
    _TCHAR buf[100];
    ...
and at the start of the functions using buf:
    {
        unicode(buf, sizeof(buf), "foo");
        ...
or as a macro: STR2UNICODE(buf,"foo");
with a secondary use as:
wstr=unicode(NULL,0,"foo"); // allocate and return a unicode string
-- I still think my teststr() idea is worth doing. Do you want an implementation?
I like the idea, but I do not need such a Unicode string generator for my test :-) Can you implement it with a small real test which shows the advantages of teststr()?
I like the teststr() idea too. It's a bit complex but it sounds like it could help write better tests.
--
Francois Gouget fgouget@free.fr http://fgouget.free.fr/
"It really galls me that most of the computer power in the world is wasted on screen savers." - Chris Caldwell from the GIMPS project http://www.mersenne.org/prime.htm
On Tue, 22 Jan 2002, Andriy Palamarchuk wrote:
--- "Dimitrie O. Paun" dimi@cs.toronto.edu wrote:
I have a number of observations: -- we should rename wt_helper.h to something like wine/tests.h
I'm open for suggestions. I used this name to avoid name clashes with Perl winetest framework. BTW, wt = Wine Test.
Well, yeah, I figured that much, but it is ugly as hell.
I'd prefer more recognizable name than "test.h".
It's not test.h, it's wine/tests.h, which is clean, recognizable, and pretty. In code, you have:
#include "wine/tests.h"
-- maybe we should not use main as the main function, but rather something like 'test'. This way we can provide the main and have another level of indirection which can be put to good use. (like we should not need the explicit end_tests())
-- another thing we can do is to have the tests in functions named testXXX. This way we can use nm to generate the main function, and so we can put a bunch of tests in the same executable
Whether we'll use these ideas depends on architecture of the whole testing process.
Duh! :) But I thought that's what we're working on...:)
-- wt_helper.h should include tchar.h, and redefine _T to call a function to transform the string to Unicode if need be. This way we get rid of the compiler requirement.
Using a function instead of a macro won't work in all cases, e.g. in this one:
_TCHAR buf[100] = _T("foo")
So, don't do that. Francois showed a number of good ways of doing it. In any case, I think we should discourage the use of explicit strings in the tests, for reasons outlined by Alexandre. I think we should use a teststr() call instead of most hardcoded strings.
-- I still think my teststr() idea is worth doing. Do you want an implementation?
I like the idea, but I do not need such a Unicode string generator for my test :-) Can you implement it with a small real test which shows the advantages of teststr()?
Fine, I'll try to do that, it's just that I'm _very_ busy at the moment. I guess this means more late nights... :)
-- Why do you do: _T(__FILE__) Files are ASCII, no need to _T them.
__FILE__ is a macro which expands to the file name. I use _T with it for simplicity - to have the same ASCII/Unicode mode processing for everything. Otherwise I'd have to explicitly call ASCII functions for file name processing, and probably do A->W conversion.
But we don't need it. I know what __FILE__ does: it returns an ASCII string, so we should just work with it as such.
-- Dimi.