On Thu, 17 Jan 2002, Alexandre Julliard wrote:
...and makes sure that the W functions are never actually tested with Unicode input. It's not enough to simply pass converted ASCII strings to the W functions, we have to test with real Unicode to check for lossy W->A->W conversions, surrogate handling, non-spacing chars, etc. The functions *are* really different and have different testing requirements.
OK, this is a good point. But the truth of the matter is that the semantics of 99.99% of the functions should be encoding independent: the documentation from MS is, the way you are supposed to write your code is, etc. On the other hand, as you pointed out, we really want to test all the special handling in the Unicode cases. I think these two requirements will in fact push us toward a better solution.
As you rightly pointed out, when we do write tests, we want to test a lot of things: W->A->W conversions, surrogate handling, non-spacing chars, etc. With the current compiler support, and the current developer knowledge, I am afraid we will only see tests like this:
WCHAR teststr[] = { 'U', 'n', 'i', 'c', 'o', 'd', 'e', '\0' };
which tests none of the interesting cases you mentioned, and we will foolishly end up with twice as much code.
My proposal:
  -- abstract away the generation of test strings (more on this later)
  -- write most tests in an encoding-independent manner
  -- if explicit A or W handling is required, simply use the xxxA or xxxW variants without any #ifdefs
  -- always compile and run the tests twice, once for ANSI and once for Unicode
The code will look like this:

  TCSTR str = TSC_RAND;
  while ((str = teststr(str)) != NULL)
  {
      /* your test which uses 'str' goes here */
  }
The TSC_XXX set of constants (TSC stands for Test String Class) identifies a class of strings (such as: three characters long, etc.). teststr() iterates through the strings of a class. It has the following prototype:

  TCSTR teststr(TCSTR);

and it should be fairly simple to implement (I can provide code, if people are interested).
Advantages:
  -- we can pass in just one string for the ANSI case, but several for the Unicode case where we want to test various things.
  -- for regular unit testing, we may want to return only one representative string for ANSI and one complicated string for Unicode, but from time to time we may want to run the tests over a large number of test strings.
  -- tests written this way can be used to identify and measure performance problems, simply by returning a lot more strings and logging how long the test took to run.
  -- we have a clean way to abstract the test string, so most tests can be written in an encoding-independent way.
  -- it makes it no harder, in _any_ way, to write encoding-specific tests. In fact, one can use the teststrA/teststrW versions in encoding-specific tests to get the above-mentioned benefits.
  -- it should be trivial to instrument the Makefiles to compile the tests twice, so we would test both the ANSI and Unicode cases all the time. (That is, this should not be an Alexandre-only task. :))
Comments?
-- Dimi.