2014-10-16 6:25 GMT-06:00 Michael Stefaniuc mstefani@redhat.com:
On 10/16/2014 01:38 AM, Alex Henrie wrote:
2014-10-15 15:15 GMT-06:00 Michael Stefaniuc mstefani@redhat.com:
On 10/15/2014 03:16 AM, Alex Henrie wrote:
2014-10-14 16:12 GMT-06:00 Michael Stefaniuc mstefani@redhat.com:
-    /* MultiByteToWideChar guarantees that srclen > 0 */
+    assert(srclen > 0);
This assert isn't needed. Just rewriting the first do-while loop as a while loop avoids it. That eliminates 3 lines of code, of which one is a comment pointing to an unneeded assert.
A while loop adds an unnecessary check, but whatever. You and Sebastian Lackner both requested it, so I concede.
Uh? Trying to avoid a loop check with a do..while loop is a micro-optimization. That's the job of the compiler. Our job is to optimize the code for readability.
There's no way that the compiler is verifying that the passed srclen is always greater than 0 and internally converting the loop to a do-while loop.
It doesn't have to verify srclen, as it already knows it is > 0. Since MultiByteToWideChar() is the sole user of your function, and calls it from only one place, it will most likely just inline it.
OK, you're right. At O1 or higher, the compiler appears to inline the function and rewrite the loop as a do-while loop.
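The optimization under discussion can be seen in a minimal sketch. The helper below is hypothetical (it is not Wine's actual decoder); it just shows a plain while loop whose entry check a compiler's loop-rotation pass (enabled at -O1 and above in GCC and Clang) can elide once inlining proves the condition holds on entry:

```c
#include <stddef.h>

/* Hypothetical helper: the caller guarantees srclen > 0,
 * the way MultiByteToWideChar() would for a UTF-7 decoder.
 * Written as a while loop for readability; after inlining,
 * the optimizer rotates it into the equivalent do-while,
 * so the "extra" entry check costs nothing in practice. */
size_t count_ascii(const char *src, size_t srclen)
{
    size_t n = 0;
    while (srclen-- > 0)
    {
        if ((unsigned char)*src++ < 0x80) n++;
    }
    return n;
}
```

Comparing the -O1 assembly of the while and do-while versions (e.g. with `gcc -O1 -S`) shows the same rotated loop body for both.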
Now to the costs of the do-while loop on the reviewer side:
- Look at the diff in the email client
- What's that first do-while loop for? Scroll down, as the big comments make the end of the loop not fit on one page.
- Uh, he isn't checking that srclen is > 0 and uses it as an array index. Meh.
Let's imagine that the reviewer is looking at the while loop version. He might say "why is this code using a while loop if the condition is guaranteed to be true the first time through?" Then he'd have to try to think through whether the compiler will optimize that out or not.
This whole problem could be avoided with a do-while loop and one line of comment explaining that srclen is guaranteed to be > 0.
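The alternative being argued for here would look something like the following sketch (again a hypothetical helper, not Wine's actual code): the do-while loop plus a single comment documenting the caller's guarantee, instead of an assert.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the shape under discussion.
 * srclen is guaranteed > 0: the sole caller rejects empty
 * input before this function is reached, so the do-while
 * body can safely run at least once. */
size_t count_ascii_dw(const char *src, size_t srclen)
{
    size_t n = 0;
    do
    {
        if ((unsigned char)*src++ < 0x80) n++;
    } while (--srclen > 0);
    return n;
}
```

The trade-off is exactly the one debated above: the do-while states the precondition explicitly in a comment, while the while loop makes the code safe to read without hunting for the guarantee.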
I have now rewritten the tests to programmatically generate the input, but there is no way to programmatically generate the output without duplicating pretty much the entire Windows UTF-7 algorithms in the tests, which would defeat the purpose of having tests.
If you just want to check the exact mappings that Windows does between WCHAR and UTF-7, it isn't hard to generate. And you can even check the encoding for the whole WCHAR range with a for loop from 0 to WCHAR_MAX:
- At 64K WCHARs the src buffer will eat just 128KB. Fill it with a for loop.
- The dest buffer, assuming a worst case of 1 WCHAR to 4 chars of encoding, will be just 256KB.
- WideCharToMultiByte() that.
- Calculate the SHA1 sum of the UTF-7 stream and compare it to a previously computed SHA1.
- MultiByteToWideChar the UTF-7 stream you got. Either compare it directly to the WCHAR src, if the operation happens to be symmetric, or calculate the SHA1 sum again and compare it to the stored one.
Then you can reserve the other tests for the interesting stuff where you really have to store both src and dest. But those should be short strings anyway.
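The exhaustive round-trip recipe above can be sketched in portable C. Everything named here is a stand-in: enc()/dec() are a toy reversible encoding in place of WideCharToMultiByte()/MultiByteToWideChar() with CP_UTF7, and FNV-1a replaces SHA1, purely so the sketch is self-contained; the structure (fill with a for loop, encode, hash, decode, compare) is the part that matters.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in checksum: FNV-1a instead of SHA1, for brevity. */
static uint32_t fnv1a(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = 2166136261u;
    while (len--) h = (h ^ *p++) * 16777619u;
    return h;
}

/* Toy reversible "encoding": each 16-bit unit becomes two bytes.
 * A real test would call WideCharToMultiByte(CP_UTF7, ...) here. */
static size_t enc(const uint16_t *src, size_t n, unsigned char *dst)
{
    size_t i;
    for (i = 0; i < n; i++)
    {
        dst[2 * i]     = src[i] >> 8;
        dst[2 * i + 1] = src[i] & 0xff;
    }
    return 2 * n;
}

static size_t dec(const unsigned char *src, size_t n, uint16_t *dst)
{
    size_t i;
    for (i = 0; i < n / 2; i++)
        dst[i] = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
    return n / 2;
}

int check_roundtrip(void)
{
    enum { N = 0x10000 };                    /* the whole 16-bit range */
    uint16_t *src = malloc(N * sizeof *src); /* 128KB, as estimated */
    unsigned char *mid = malloc(4 * N);      /* worst case 4 bytes/unit */
    uint16_t *back = malloc(N * sizeof *back);
    size_t i, midlen, backlen;
    int ok;

    for (i = 0; i < N; i++) src[i] = (uint16_t)i; /* fill with a for loop */
    midlen = enc(src, N, mid);
    (void)fnv1a(mid, midlen); /* real test: compare against a stored hash */
    backlen = dec(mid, midlen, back);
    ok = backlen == N && !memcmp(src, back, N * sizeof *src);
    free(src); free(mid); free(back);
    return ok;
}
```

With the real Windows APIs the decode side may not be exactly symmetric (UTF-7 has multiple valid encodings), which is why the recipe allows comparing a second stored hash instead of the source buffer.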
Let's imagine that someone else tinkers with the UTF-7 code, and accidentally introduces a bug. If the tests only compare hashes, how would he have any idea what exactly broke?
If the purpose of the tests is purely to check conformance, then hashes would be fine. But if you want them to aid in debugging as well, they need to compare against unprocessed output.
On second thought, I've rewritten the large tests per your previous suggestion. They no longer check that the exact UTF-7 encoding is correct, but this is not a big deal because other tests already check exact UTF-7 encodings thoroughly. The new tests instead test each character, including characters 0 through 0xFFFF when testing WCHARs.
-Alex