2014-10-16 6:25 GMT-06:00 Michael Stefaniuc mstefani@redhat.com:
On 10/16/2014 01:38 AM, Alex Henrie wrote:
2014-10-15 15:15 GMT-06:00 Michael Stefaniuc mstefani@redhat.com:
On 10/15/2014 03:16 AM, Alex Henrie wrote:
2014-10-14 16:12 GMT-06:00 Michael Stefaniuc mstefani@redhat.com:
-    /* MultiByteToWideChar guarantees that srclen > 0 */
+    assert(srclen > 0);
This assert isn't needed. Just rewriting the first do-while loop as a while loop avoids it. That eliminates 3 lines of code, of which one is a comment pointing to an unneeded assert.
A while loop adds an unnecessary check, but whatever. You and Sebastian Lackner both requested it, so I concede.
Uh? Trying to avoid a loop check with a do..while loop is a micro-optimization. That's the job of the compiler. Our job is to optimize the code for readability.
There's no way that the compiler is verifying that the passed srclen is always greater than 0 and internally converting the loop to a do-while loop.
It doesn't have to verify srclen, as it already knows it is > 0. Since MultiByteToWideChar() is the sole user of your function, and calls it from only one place, it will most likely just inline it.
OK, you're right. At O1 or higher, the compiler appears to inline the function and rewrite the loop as a do-while loop.
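The optimization under discussion can be seen in a minimal sketch. The helper below is hypothetical (it is not Wine's actual decoder); it just shows a plain while loop whose entry check a compiler's loop-rotation pass (enabled at -O1 and above in GCC and Clang) can elide once inlining proves the condition holds on entry:

```c
#include <stddef.h>

/* Hypothetical helper: the caller guarantees srclen > 0,
 * the way MultiByteToWideChar() would for a UTF-7 decoder.
 * Written as a while loop for readability; after inlining,
 * the optimizer rotates it into the equivalent do-while,
 * so the "extra" entry check costs nothing in practice. */
size_t count_ascii(const char *src, size_t srclen)
{
    size_t n = 0;
    while (srclen-- > 0)
    {
        if ((unsigned char)*src++ < 0x80) n++;
    }
    return n;
}
```

Comparing the -O1 assembly of the while and do-while versions (e.g. with `gcc -O1 -S`) shows the same rotated loop body for both.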
Now to the costs of the do-while loop on the reviewer side:
- Look at the diff in the email client
- What's that first do-while loop for? Scroll down, as the big comments make the end of the loop not fit on one page.
- Uh, he isn't checking that srclen is > 0 and uses it as an array index. Meh.
Let's imagine that the reviewer is looking at the while loop version. He might say "why is this code using a while loop if the condition is guaranteed to be true the first time through?" Then he'd have to try to think through whether the compiler will optimize that out or not.
This whole problem could be avoided with a do-while loop and one line of comment explaining that srclen is guaranteed to be > 0.
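The alternative being argued for here would look something like the following sketch (again a hypothetical helper, not Wine's actual code): the do-while loop plus a single comment documenting the caller's guarantee, instead of an assert.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the shape under discussion.
 * srclen is guaranteed > 0: the sole caller rejects empty
 * input before this function is reached, so the do-while
 * body can safely run at least once. */
size_t count_ascii_dw(const char *src, size_t srclen)
{
    size_t n = 0;
    do
    {
        if ((unsigned char)*src++ < 0x80) n++;
    } while (--srclen > 0);
    return n;
}
```

The trade-off is exactly the one debated above: the do-while states the precondition explicitly in a comment, while the while loop makes the code safe to read without hunting for the guarantee.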
I have now rewritten the tests to programmatically generate the input, but there is no way to programmatically generate the output without duplicating pretty much the entire Windows UTF-7 algorithms in the tests, which would defeat the purpose of having tests.
If you just want to check the exact mappings that Windows does between WCHAR and UTF-7, it isn't hard to generate. And you can even check the encoding for the whole WCHAR range with a for loop from 0 to WCHAR_MAX:
- At 64K WCHARs the src buffer will eat just 128KB. Fill it with a for loop.
- The dest buffer, assuming a worst case of 1 WCHAR to 4 chars of encoding, will be just 256KB.
- WideCharToMultiByte() that.
- Calculate the SHA1 sum of the UTF-7 stream and compare it to a previously computed SHA1.
- MultiByteToWideChar the UTF-7 stream you got. Either compare it directly to the WCHAR src, if the operation happens to be symmetric, or calculate the SHA1 sum again and compare it to the stored one.
Then you can reserve the other tests for the interesting stuff where you really have to store both src and dest. But those should be short strings anyway.
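The exhaustive round-trip recipe above can be sketched in portable C. Everything named here is a stand-in: enc()/dec() are a toy reversible encoding in place of WideCharToMultiByte()/MultiByteToWideChar() with CP_UTF7, and FNV-1a replaces SHA1, purely so the sketch is self-contained; the structure (fill with a for loop, encode, hash, decode, compare) is the part that matters.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in checksum: FNV-1a instead of SHA1, for brevity. */
static uint32_t fnv1a(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = 2166136261u;
    while (len--) h = (h ^ *p++) * 16777619u;
    return h;
}

/* Toy reversible "encoding": each 16-bit unit becomes two bytes.
 * A real test would call WideCharToMultiByte(CP_UTF7, ...) here. */
static size_t enc(const uint16_t *src, size_t n, unsigned char *dst)
{
    size_t i;
    for (i = 0; i < n; i++)
    {
        dst[2 * i]     = src[i] >> 8;
        dst[2 * i + 1] = src[i] & 0xff;
    }
    return 2 * n;
}

static size_t dec(const unsigned char *src, size_t n, uint16_t *dst)
{
    size_t i;
    for (i = 0; i < n / 2; i++)
        dst[i] = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
    return n / 2;
}

int check_roundtrip(void)
{
    enum { N = 0x10000 };                    /* the whole 16-bit range */
    uint16_t *src = malloc(N * sizeof *src); /* 128KB, as estimated */
    unsigned char *mid = malloc(4 * N);      /* worst case 4 bytes/unit */
    uint16_t *back = malloc(N * sizeof *back);
    size_t i, midlen, backlen;
    int ok;

    for (i = 0; i < N; i++) src[i] = (uint16_t)i; /* fill with a for loop */
    midlen = enc(src, N, mid);
    (void)fnv1a(mid, midlen); /* real test: compare against a stored hash */
    backlen = dec(mid, midlen, back);
    ok = backlen == N && !memcmp(src, back, N * sizeof *src);
    free(src); free(mid); free(back);
    return ok;
}
```

With the real Windows APIs the decode side may not be exactly symmetric (UTF-7 has multiple valid encodings), which is why the recipe allows comparing a second stored hash instead of the source buffer.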
Let's imagine that someone else tinkers with the UTF-7 code, and accidentally introduces a bug. If the tests only compare hashes, how would he have any idea what exactly broke?
If the purpose of the tests is purely to check conformance, then hashes would be fine. But if you want them to aid in debugging as well, they need to compare against unprocessed output.
On second thought, I've rewritten the large tests per your previous suggestion. They no longer check that the exact UTF-7 encoding is correct, but this is not a big deal because other tests already check exact UTF-7 encodings thoroughly. The new tests instead test each character, including characters 0 through 0xFFFF when testing WCHARs.
-Alex