David Laight <david@l8s.co.uk> wrote:
I think encoding and decoding arbitrary binary data in UTF-7 was considered a "feature" in Windows XP. As MSDN said, "Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems." So I'm sure there's at least one application that depends on the data not being Unicode-normalized. Whoever adds normalization will have to make sure it's turned off in Windows XP (or older) mode.
Actually UTF-8 is a PITA - a program has to know whether every individual C string (or file) is UTF-8 or 8-bit ASCII (well, 8859-x). Assuming UTF-8 doesn't work unless it can process all arbitrary byte sequences (and write them back) - which the standard doesn't allow for.
Alex is adding UTF-7 support (although the problem may be in the same area as for UTF-8).
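To make the UTF-8 round-trip point concrete, here is a minimal sketch in Python (not the code under discussion; the function names are made up for illustration). It takes a byte string that is valid ISO 8859-1 but not valid UTF-8 and shows three outcomes: a strict decode rejects it, a lossy decode changes the bytes on write-back, and Python's own `surrogateescape` error handler (a Python-specific extension, not something the UTF-8 standard provides) preserves them.

```python
# "café" in ISO 8859-1: the lone 0xE9 byte is not a valid UTF-8 sequence.
LATIN1_BYTES = b"caf\xe9"


def roundtrips_strict(data: bytes) -> bool:
    """Return True if data survives a strict UTF-8 decode/encode cycle."""
    try:
        return data.decode("utf-8").encode("utf-8") == data
    except UnicodeDecodeError:
        # Strict UTF-8 refuses arbitrary byte sequences outright.
        return False


def roundtrip_replace(data: bytes) -> bytes:
    """Lossy round trip: each invalid byte becomes U+FFFD on decode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def roundtrip_escape(data: bytes) -> bytes:
    """Round trip using Python's surrogateescape handler (PEP 383).

    Invalid bytes are smuggled through as lone surrogates and restored
    on encode. The intermediate str is not valid Unicode text.
    """
    text = data.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    print(roundtrips_strict(LATIN1_BYTES))
    print(roundtrip_replace(LATIN1_BYTES))
    print(roundtrip_escape(LATIN1_BYTES) == LATIN1_BYTES)
```

The escape-hatch variant only works because the decoder steps outside strict UTF-8, which is the objection above: a program that assumes UTF-8 everywhere either rejects such input, corrupts it, or needs a non-standard side channel to write it back unchanged.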