Re: [PATCH 1/4] kernel32: Support UTF-7 in MultiByteToWideChar.


Dmitry Timoshkov

3 Dec 2012

5:27 a.m.

Alex Henrie alexhenrie24@gmail.com wrote:

...

> I came back to the problem of UTF-7 support and made some improvements to my previous submission. The tests are now more stringent, especially in regard to null terminator checking, and they test the srclen parameter more thoroughly now as well.
>
> I also noticed that a related test for error checking was marked todo_wine. The fix was trivial and is included as a fourth patch.
>
> Please let me know what you think. I'm willing to put some more effort into this.

Why don't you put it in libs/wine where other Unicode conversion routines are implemented? Also you probably need to add support for composition/surrogates like other implementations do.

-- Dmitry.
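The surrogates mentioned here are how UTF-16 represents code points above U+FFFF: the code point minus 0x10000 is split into two 10-bit halves, stored as a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF). A minimal C sketch of that rule; the helper name utf16_encode is hypothetical and not part of Wine:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Wine code): encode one Unicode code point as
 * UTF-16.  Returns the number of 16-bit units written (1 or 2), or 0 if
 * cp is not a valid scalar value (a surrogate or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                               /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                  /* BMP: a single unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                               /* outside the Unicode range */
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+00E9 stays a single unit.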


Alex Henrie

4 Dec

4:30 a.m.

New subject: [PATCH 1/4] kernel32: Support UTF-7 in MultiByteToWideChar.

2012/12/2 Dmitry Timoshkov dmitry@baikal.ru:

...

> Why don't you put it in libs/wine where other Unicode conversion routines are implemented?

Before I started this project I asked where to put the functions: http://www.winehq.org/pipermail/wine-devel/2012-January/093705.html

I received no reply, so I put them in libwine. Then I was told to move them to kernel32: http://www.winehq.org/pipermail/wine-devel/2012-July/096531.html

...

> Also you probably need to add support for composition/surrogates like other implementations do.

MSDN states:

"Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way as on earlier Windows operating systems." http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072%28v=vs.85%2...

My implementation is modeled after Windows XP (Wine's default target Windows version), which encodes and decodes arbitrary character sequences without normalization. I saw that my submission has already been marked "rejected"--was this why?

-Alex
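The "lone surrogate halves or mismatched surrogate pairs" in the MSDN text are 16-bit units in the range 0xD800 to 0xDFFF that do not form a high-then-low pair. A minimal C sketch of detecting them; count_lone_surrogates is a hypothetical helper, not a Wine or Windows API. An XP-style converter passes such units through unchanged, whereas a strictly conforming one would treat them as errors.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count UTF-16 units that are surrogate halves
 * without a valid partner. */
size_t count_lone_surrogates(const uint16_t *s, size_t len)
{
    size_t i = 0, lone = 0;
    while (i < len) {
        uint16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {          /* high surrogate */
            if (i + 1 < len && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                i += 2;                            /* well-formed pair */
                continue;
            }
            lone++;                                /* high without a low */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            lone++;                                /* low without a preceding high */
        }
        i++;
    }
    return lone;
}
```

A reversed pair such as 0xDE00 0xD83D counts as two lone halves.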

Frédéric Delanoy

8:56 a.m.


On Tue, Dec 4, 2012 at 5:30 AM, Alex Henrie alexhenrie24@gmail.com wrote:

...

> 2012/12/2 Dmitry Timoshkov dmitry@baikal.ru:
>
> ...
>> Also you probably need to add support for composition/surrogates like other implementations do.
>
> MSDN states:
>
> "Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way as on earlier Windows operating systems." http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072%28v=vs.85%2...
>
> My implementation is modeled after Windows XP (Wine's default target Windows version), which encodes and decodes arbitrary character sequences without normalization. I saw that my submission has already been marked "rejected"--was this why?
>
> -Alex

The above MSDN comment indicates pre-Vista versions are buggy, so it's probably not a good idea to match that behaviour.

But to know the reason your patch was rejected, you should ask the maintainer (Alexandre Julliard).

Frédéric

Alex Henrie

5 Dec

3:30 a.m.


2012/12/4 Frédéric Delanoy frederic.delanoy@gmail.com:

...

> The above MSDN comment indicates pre-Vista versions are buggy, so it's probably not a good idea to match that behaviour.

I think the ability to encode and decode arbitrary binary data in UTF-7 was considered a "feature" in Windows XP. As MSDN said, "Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems." So I'm sure there's at least one application that depends on the data not being Unicode-normalized. Whoever adds normalization will have to make sure it's turned off in Windows XP (or older) mode.

-Alex
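Why UTF-7 can carry arbitrary 16-bit data: inside a shifted sequence ('+' ... '-'), RFC 2152 writes the UTF-16 units as modified base64 (no '=' padding), so a lone surrogate such as 0xD800 encodes as readily as a real character. A minimal sketch of just the shifted form; the function name utf7_encode_shifted is hypothetical, and it does not decide which characters could instead be written directly.

```c
#include <stddef.h>
#include <stdint.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical sketch: write n UTF-16 units as "+<modified base64>-".
 * Any 16-bit value is accepted, including lone surrogates, so nothing is
 * normalized.  out needs room for (16 * n + 5) / 6 + 3 bytes.
 * Returns the string length written (excluding the NUL). */
size_t utf7_encode_shifted(const uint16_t *units, size_t n, char *out)
{
    size_t o = 0;
    uint32_t bits = 0;   /* bit buffer; only the low nbits matter */
    int nbits = 0;

    if (n == 0) {        /* "+-" would decode as '+', so emit nothing */
        out[0] = '\0';
        return 0;
    }
    out[o++] = '+';
    for (size_t i = 0; i < n; i++) {
        bits = (bits << 16) | units[i];
        nbits += 16;
        while (nbits >= 6) {             /* emit every complete sextet */
            nbits -= 6;
            out[o++] = b64[(bits >> nbits) & 0x3F];
        }
    }
    if (nbits > 0)                       /* pad the last sextet with zero bits */
        out[o++] = b64[(bits << (6 - nbits)) & 0x3F];
    out[o++] = '-';
    out[o] = '\0';
    return o;
}
```

U+263A yields "+Jjo-" (the example from RFC 2152), and the lone high surrogate 0xD800 yields "+2AA-". Whether Windows XP emits exactly these bytes is not established here.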

David Laight

8:52 a.m.


On Tue, Dec 04, 2012 at 08:30:55PM -0700, Alex Henrie wrote:

...

> 2012/12/4 Frédéric Delanoy frederic.delanoy@gmail.com:
>
> ...
>> The above MSDN comment indicates pre-Vista versions are buggy, so it's probably not a good idea to match that behaviour.
>
> I think the ability to encode and decode arbitrary binary data in UTF-7 was considered a "feature" in Windows XP. As MSDN said, "Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems." So I'm sure there's at least one application that depends on the data not being Unicode-normalized. Whoever adds normalization will have to make sure it's turned off in Windows XP (or older) mode.

Actually UTF-8 is a PITA - a program has to know whether every individual C string (or file) is UTF-8 or 8-bit ASCII (well, 8859-x). Assuming UTF-8 doesn't work unless it can process all arbitrary byte sequences (and write them back) - which the standard doesn't allow for.

In the US it probably isn't often an issue, but in Europe there are many files that have occasional characters with the top bit set. In the UK we only see 0xA3 (pound sterling) - but it can crop up anywhere - and it causes my mail program (which, for some reason I don't understand, assumes UTF-8) to drop core when responding to mails!

David

-- David Laight: david@l8s.co.uk
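The 0xA3 case can be made concrete: a lone 0xA3 byte (the pound sign in ISO 8859-1) is never valid UTF-8, whereas the UTF-8 encoding of the pound sign is 0xC2 0xA3. A minimal strict validity check in C, with the hypothetical helper name is_valid_utf8, shows how a program could detect non-UTF-8 input before falling back to an 8-bit interpretation:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8,
 * rejecting overlong forms, surrogates and values above U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;          /* number of continuation bytes expected */
        unsigned long cp;     /* decoded code point */

        if (c < 0x80) { i++; continue; }                  /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { need = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
        else return false;    /* stray continuation byte, 0xC0/0xC1, 0xF5+ */

        if (len - i - 1 < need)
            return false;     /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false; /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;     /* overlong, out of range, or a surrogate */
        i += need + 1;
    }
    return true;
}
```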

Dmitry Timoshkov

10:14 a.m.


David Laight david@l8s.co.uk wrote:

...

...
>> I think the ability to encode and decode arbitrary binary data in UTF-7 was considered a "feature" in Windows XP. As MSDN said, "Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems." So I'm sure there's at least one application that depends on the data not being Unicode-normalized. Whoever adds normalization will have to make sure it's turned off in Windows XP (or older) mode.
>
> Actually UTF-8 is a PITA - a program has to know whether every individual C string (or file) is UTF-8 or 8-bit ASCII (well, 8859-x). Assuming UTF-8 doesn't work unless it can process all arbitrary byte sequences (and write them back) - which the standard doesn't allow for.

Alex is adding UTF-7 support (although the problem may be in the same area as for UTF-8).

-- Dmitry.

Alexandre Julliard

4 Dec

10 a.m.


Alex Henrie alexhenrie24@gmail.com writes:

...

> My implementation is modeled after Windows XP (Wine's default target Windows version), which encodes and decodes arbitrary character sequences without normalization. I saw that my submission has already been marked "rejected"--was this why?

It was rejected because you still haven't addressed any of the issues I mentioned in previous reviews. It doesn't look like this is going anywhere.

-- Alexandre Julliard julliard@winehq.org

Alex Henrie

5 Dec

3:32 a.m.


2012/12/4 Alexandre Julliard julliard@winehq.org:

...

> Alex Henrie alexhenrie24@gmail.com writes:
>
> ...
>> My implementation is modeled after Windows XP (Wine's default target Windows version), which encodes and decodes arbitrary character sequences without normalization. I saw that my submission has already been marked "rejected"--was this why?
>
> It was rejected because you still haven't addressed any of the issues I mentioned in previous reviews. It doesn't look like this is going anywhere.

Well, I went back through the archives and compiled the feedback you've given:

1. Don't use Katayama Hirofumi's code. http://www.winehq.org/pipermail/wine-devel/2012-May/095451.html

2. Don't use iconv. http://www.winehq.org/pipermail/wine-devel/2012-May/095468.html

3. Put the code in libwine.
4. Don't use malloc.
5. Add tests for buffer overflow.
6. Add tests for partial sequences.
http://www.winehq.org/pipermail/wine-devel/2012-July/096531.html

7. Add tests for srclen < -1. http://www.winehq.org/pipermail/wine-devel/2012-August/096867.html

8. Add more tests.
9. Split up the patch.
http://www.winehq.org/pipermail/wine-devel/2012-August/096872.html

10. Add tests for srclen > 0. http://www.winehq.org/pipermail/wine-devel/2012-September/096901.html

Far from not addressing any of these issues, I feel that I have addressed all of them. More importantly, my implementation is correct; it matches Windows XP exactly. (I'd be happy to be proven wrong if anyone can show me a case where this is not so.) If there is another issue which I have not addressed, it's because it has not been articulated to me. I get the impression that you have something in particular that you're looking for, but I can't tell what it is.

I do want to see this resolved. It would be a pity if Mac and Linux users miss out because we couldn't come to an agreement...

-Alex
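Items 5, 6, 7 and 10 in the list above concern buffer overflow, partial sequences and the srclen parameter. A minimal sketch of a UTF-7 to UTF-16 decoder shows where each of those cases arises. It assumes a contract loosely modeled on MultiByteToWideChar (srclen of -1 means a NUL-terminated source whose terminator is converted too; a dstlen of 0 asks for the required length). The function utf7_to_utf16 is hypothetical, not Wine's implementation, and its handling of invalid input is not claimed to match Windows XP.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of a modified-base64 character (RFC 2152), or -1 if c is not one. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical UTF-7 -> UTF-16 decoder (not Wine's code).  Contract:
 *   srclen == -1   src is NUL-terminated; the terminator is converted too
 *   srclen > 0     exactly srclen bytes are converted
 *   other srclen   invalid parameter: returns 0
 *   dstlen == 0    nothing is written; the required length is returned
 *   dst too small  returns 0 (the real API would also set an error code)
 * Bytes outside the UTF-7 direct set are passed through unchanged here;
 * what Windows does with them is what tests would have to establish. */
int utf7_to_utf16(const char *src, int srclen, uint16_t *dst, int dstlen)
{
    size_t len = 0, i;
    int out = 0, in_base64 = 0, nbits = 0;
    uint32_t bits = 0;                       /* pending base64 bits */

    if (srclen == -1) {
        while (src[len]) len++;
        len++;                               /* include the terminator */
    } else if (srclen > 0) {
        len = (size_t)srclen;
    } else {
        return 0;
    }
    if (dstlen < 0) return 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        int emit = -1;                       /* UTF-16 unit to output, if any */

        if (in_base64) {
            int v = b64_value(c);
            if (v >= 0) {
                bits = (bits << 6) | (uint32_t)v;
                nbits += 6;
                if (nbits >= 16) {           /* a full unit is available */
                    nbits -= 16;
                    emit = (int)((bits >> nbits) & 0xFFFF);
                }
            } else {
                in_base64 = 0;               /* any non-base64 byte ends the run */
                nbits = 0;                   /* leftover padding bits are dropped */
                if (c != '-')
                    emit = c;                /* '-' is absorbed, others decoded */
            }
        } else if (c == '+') {
            if (i + 1 < len && src[i + 1] == '-') {
                emit = '+';                  /* "+-" encodes a literal '+' */
                i++;
            } else {
                in_base64 = 1;
                bits = 0;
            }
        } else {
            emit = c;                        /* direct character */
        }

        if (emit >= 0) {
            if (dstlen > 0) {
                if (out >= dstlen) return 0; /* buffer too small */
                dst[out] = (uint16_t)emit;
            }
            out++;
        }
    }
    return out;
}
```

With this sketch, "Hi Mom -+Jjo--!" and srclen = -1 yields 12 units (including the terminator), and a dstlen smaller than the output fails instead of writing past the buffer.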

Alex Henrie

5:49 a.m.


Correction: Item 3 should have been "Put the code in kernel32."

-Alex

Alexandre Julliard

9:27 a.m.


Alex Henrie alexhenrie24@gmail.com writes:

...

> Far from not addressing any of these issues, I feel that I have addressed all of them. More importantly, my implementation is correct; it matches Windows XP exactly.

You have nowhere near enough tests to make such a claim. When I said to write more tests, I didn't mean one or two more. You'd probably need at least 100 tests to have decent coverage of all the interesting cases.

-- Alexandre Julliard julliard@winehq.org

Alex Henrie

6 Dec

5:57 a.m.


2012/12/5 Alexandre Julliard julliard@winehq.org:

...

> You have nowhere near enough tests to make such a claim. When I said to write more tests, I didn't mean one or two more. You'd probably need at least 100 tests to have decent coverage of all the interesting cases.

It's not as much of a technical problem as it is a communication problem--I don't know what "interesting cases" you have in mind that are not already covered by the tests. Perhaps these cases would be obvious to a more experienced developer, but they are not to me.

-Alex

Alexandre Julliard

11:35 a.m.


Alex Henrie alexhenrie24@gmail.com writes:

...

> 2012/12/5 Alexandre Julliard julliard@winehq.org:
>
> ...
>> You have nowhere near enough tests to make such a claim. When I said to write more tests, I didn't mean one or two more. You'd probably need at least 100 tests to have decent coverage of all the interesting cases.
>
> It's not as much of a technical problem as it is a communication problem--I don't know what "interesting cases" you have in mind that are not already covered by the tests. Perhaps these cases would be obvious to a more experienced developer, but they are not to me.

If you can't think of anything to test beyond the handful of cases you already have, then you shouldn't be implementing that function. There's no hope that your code will be able to cope with invalid input if you can't even imagine what invalid input could look like.

-- Alexandre Julliard julliard@winehq.org
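Several classes of malformed UTF-7 input follow directly from RFC 2152: bytes with the high bit set (UTF-7 is a 7-bit encoding), a base64 run that stops with six or more leftover bits (a dangling partial unit), and padding bits that are not zero. The sketch below classifies them; utf7_first_issue and its enum are hypothetical names, and nothing here states how Windows actually handles such inputs, which is what tests would have to establish. Each class suggests several test cases; for example, "+Jjp-" differs from the valid "+Jjo-" only in a nonzero padding bit.

```c
#include <stddef.h>

/* Hypothetical categories of malformed UTF-7 input (after RFC 2152). */
enum utf7_issue {
    UTF7_OK,
    UTF7_HIGH_BIT,        /* byte >= 0x80 */
    UTF7_EXTRA_BITS,      /* 6+ leftover bits: run ends mid-character */
    UTF7_NONZERO_PADDING  /* leftover padding bits are not all zero */
};

static int b64_val(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Report the first problem found in s[0..len), or UTF7_OK. */
enum utf7_issue utf7_first_issue(const char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = (unsigned char)s[i++];
        if (c >= 0x80)
            return UTF7_HIGH_BIT;
        if (c != '+')
            continue;                      /* direct character */
        unsigned bits = 0;                 /* bits not yet forming a unit */
        int nbits = 0, v;
        while (i < len && (v = b64_val((unsigned char)s[i])) >= 0) {
            bits = (bits << 6) | (unsigned)v;
            nbits += 6;
            if (nbits >= 16)
                nbits -= 16;               /* a complete UTF-16 unit */
            bits &= (1u << nbits) - 1;     /* keep only the leftover bits */
            i++;
        }
        if (nbits >= 6)
            return UTF7_EXTRA_BITS;
        if (bits != 0)
            return UTF7_NONZERO_PADDING;
        if (i < len && s[i] == '-')
            i++;                           /* explicit terminator */
    }
    return UTF7_OK;
}
```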

Alex Henrie

10 Dec

4:52 a.m.


2012/12/6 Alexandre Julliard julliard@winehq.org:

...

> If you can't think of anything to test beyond the handful of cases you already have, then you shouldn't be implementing that function. There's no hope that your code will be able to cope with invalid input if you can't even imagine what invalid input could look like.

My submission already tests invalid input. I see now that I could test the ASCII range more thoroughly, and I didn't include tests for base64 sequences terminated without a '-' sign (which are perfectly valid), but how can I know whether you're looking for this or something else? Who could I ask for help?

-Alex
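The termination rule referred to here comes from RFC 2152: a base64 run started by '+' ends at the first byte that is not a base64 character. If that byte is '-', it is absorbed; otherwise it is decoded as ordinary text. A minimal sketch of locating the end of a run; utf7_base64_run_end is a hypothetical helper and assumes a NUL-terminated string.

```c
#include <stddef.h>

/* Hypothetical helper: s[start] is the first byte after '+'.  Returns the
 * index where ordinary decoding resumes and sets *absorbed_dash to 1 if
 * the run was closed by '-' (which is consumed, not decoded).  Any other
 * non-base64 byte, including the terminating NUL, also closes the run
 * but is left in place to be decoded normally. */
size_t utf7_base64_run_end(const char *s, size_t start, int *absorbed_dash)
{
    size_t i = start;
    for (;;) {
        char c = s[i];
        int is_b64 = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                     (c >= '0' && c <= '9') || c == '+' || c == '/';
        if (!is_b64)
            break;
        i++;
    }
    *absorbed_dash = (s[i] == '-');
    return *absorbed_dash ? i + 1 : i;
}
```

For "+AGE." the run ends at '.', which stays part of the decoded text; for "+AGE-x" the '-' is swallowed and decoding resumes at 'x'.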


wine-devel@winehq.org

12 comments

5 participants


participants (5)

Alex Henrie
Alexandre Julliard
David Laight
Dmitry Timoshkov
Frédéric Delanoy