On Nov 9, 2010, at 4:29 PM, Reece Dunn wrote:
You could use autoconf to detect:
  1/ broken handling of UTF-8 characters by sed;
  2/ name of LC_ALL flag that handles UTF-8
In theory, you only need to set LC_CTYPE, not any other aspect of the locale. And for that, you don't need the language or country. On Mac OS X, the encoding can be bare, such as LC_CTYPE=UTF-8.
The Makefile used to set LANG, then commit 492ac292b918a3369900532e4edfadaeeba32064 changed it to LC_ALL. The change wasn't explained. I assume it was made because LANG can be superseded by LC_* variables in the user's environment, which is undesirable.
Perhaps another approach would be to explicitly unset LC_ALL and export LC_CTYPE=UTF-8.
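A minimal sketch of that approach, assuming the platform accepts a bare encoding name for LC_CTYPE (as described above for Mac OS X; this is not guaranteed elsewhere):

```shell
# Sketch only: restrict the override to the character-type category.
# Assumption: the bare value "UTF-8" is a valid LC_CTYPE on this platform.
unset LC_ALL        # LC_ALL, if set, takes precedence over LC_CTYPE
LC_CTYPE=UTF-8      # only character classification and encoding are affected
export LC_CTYPE
```

Any other LC_* variables and LANG in the user's environment are left as they were.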
On Nov 9, 2010, at 4:13 PM, Charles Davis wrote:
Unfortunately, I just remembered that the name of the UTF-8 encoding is different on Mac OS ('UTF-8') and Linux ('utf8').
Are you sure about that? Checking on a couple of Linux systems here, the "locale" command reports:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
...
Hmm. However, using a bare encoding for LC_CTYPE doesn't seem to fly on Linux. Darn, so close to a simple fix. :(
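One way to probe for a usable value, in the spirit of the autoconf suggestion, might look like the sketch below. The helper name find_utf8_ctype and the candidate list in the usage note are hypothetical; the only tool relied on is the POSIX "locale charmap" command, which reports the character encoding of the current locale.

```shell
# Hypothetical helper: print the first candidate LC_CTYPE value under which
# `locale charmap` reports UTF-8; return non-zero if none of them does.
find_utf8_ctype() {
    for candidate in "$@"; do
        # An empty LC_ALL is treated as unset, so LC_CTYPE takes effect.
        charmap=$(LC_ALL= LC_CTYPE="$candidate" locale charmap 2>/dev/null) || charmap=
        if [ "$charmap" = "UTF-8" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

It could be called as, say, find_utf8_ctype UTF-8 en_US.UTF-8 C.UTF-8, with the printed name then exported as LC_CTYPE. Which candidates exist depends on the locales installed on the build machine, so the list is a guess and the caller has to handle the case where nothing matches.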
-Ken