On Tue, Dec 04, 2012 at 08:30:55PM -0700, Alex Henrie wrote:
2012/12/4 Frédéric Delanoy frederic.delanoy@gmail.com:
The above MSDN comment indicates pre-Vista versions are buggy, so it's probably not a good idea to match that behaviour.
I think encoding and decoding arbitrary binary data in UTF-7 was considered a "feature" in Windows XP. As MSDN said, "Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems." So I'm sure there's at least one application that depends on the data not being Unicode-normalized. Whoever adds normalization will have to make sure it's turned off in Windows XP (or older) mode.
Actually UTF-8 is a PITA - a program has to know whether every individual C string (or file) is UTF-8 or 8-bit ASCII (well, 8859-x). Assuming UTF-8 doesn't work unless it can process all arbitrary byte sequences (and write them back) - which the standard doesn't allow for.
In the US it probably isn't often an issue, but in Europe there are many files that have occasional characters with the top bit set. In the UK we only see 0xA3 (pound sterling) - but it can crop up anywhere - and causes my mail program (which, for some reason I don't understand, assumes UTF-8) to drop core when responding to mails!
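A rough sketch of the 0xA3 case, in Python purely for illustration (the sample bytes and the helper name is_valid_utf8 are made up for this example, not taken from any real program). It shows a lone 0xA3 being rejected by a strict UTF-8 decoder, read as a pound sign when treated as 8859-1, mangled by a "replace invalid bytes" policy, and preserved only when the decoder is told to carry invalid bytes through with Python's surrogateescape error handler:

```python
# Hypothetical sample: 8859-1 text containing a pound sign (0xA3).
raw = b"Price: \xa3 5"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xA3 on its own is a UTF-8 continuation byte, so strict decoding fails.
strict_ok = is_valid_utf8(raw)

# Read as ISO 8859-1, the same byte is simply U+00A3 (pound sterling).
latin1_text = raw.decode("iso-8859-1")

# Replacing invalid bytes with U+FFFD loses the original byte on write-back.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# surrogateescape smuggles the invalid byte through so it can be written
# back unchanged; this is a Python extension, not part of the UTF-8 standard.
roundtrip = raw.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)

print(strict_ok, latin1_text, lossy != raw, roundtrip == raw)
```

The point is only that a program which assumes UTF-8 needs some policy for bytes like this, and the policy has to be one that doesn't crash or corrupt the data.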
David