On 11/9/10 8:02 PM, Charles Davis wrote:
On 11/9/10 7:58 PM, James McKenzie wrote:
On 11/9/10 3:29 PM, Reece Dunn wrote:
On 9 November 2010 22:13, Charles Davis <cdavis@mymail.mines.edu> wrote:
On 11/9/10 1:58 PM, James Mckenzie wrote:
Charles Davis <cdavis@mymail.mines.edu> wrote:
On 11/9/10 12:13 PM, James Mckenzie wrote:
No, it is not a bug in GNU sed. The authors.c file needs to have the erroneous characters for the language used by MacOSX changed to be acceptable?
That ain't gonna fly. I think we should explicitly use a UTF-8 locale (like en_US.UTF-8 or some such) instead of the C locale when sed goes over the AUTHORS file.
Don't shoot the messenger.
Sorry.
The problem with your first idea--removing the bad characters directly from the authors.c file--is that we'd need to use a utility like sed or awk to implement it automatically--which puts us right back where we started. (We could use diff/patch, but is it worth the effort to maintain a patch for this? And would AJ let us put the patch file in Wine? And if not, where would we put it?)
Maybe we can force the use of /usr/bin/sed, if it exists, to get around the 'brokenness' of GNU sed on the Mac?
Maybe. But that seems like a hack. A better way might be to detect if we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed. That's less of a hack, but still a hack.
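A rough sketch of that detection in plain shell. It assumes GNU sed identifies itself with "GNU sed" in its `--version` output, while Mac OS's /usr/bin/sed does not accept `--version`. `choose_sed` is a hypothetical helper name, not something in Wine's build system:

```shell
#!/bin/sh
# Hypothetical helper: print the sed to use for processing AUTHORS.
# On Mac OS (uname -s reports "Darwin"), if the given sed is GNU sed and
# /usr/bin/sed exists, prefer /usr/bin/sed; otherwise keep the given sed.
# Assumption: GNU sed prints "GNU sed" for --version; BSD sed rejects it.
choose_sed() {
    sed_cmd=${1:-sed}
    if [ "$(uname -s)" = Darwin ] \
        && "$sed_cmd" --version 2>/dev/null | grep -q 'GNU sed' \
        && [ -x /usr/bin/sed ]; then
        echo /usr/bin/sed
    else
        echo "$sed_cmd"
    fi
}
```

A build script could then run `SED=$(choose_sed)` once and invoke `"$SED"` on the AUTHORS file, so the special case lives in one place.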
If not, it is a real bear to set the language on a Mac per previous discussions on the Users list.
That was about setting LANG. Wine always obeys LC_*, and so does sed.
It's not the language that's the problem. It's the encoding. The AUTHORS file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told it not to (i.e. we told it to use MacRoman because that's the default encoding for the C locale). If we tell it to use UTF-8 (by setting LC_ALL to, for example, 'en_US.UTF-8'), it will process the file correctly.
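A minimal sketch of running sed under an explicit UTF-8 locale. The helper name `authors_to_c` and its sed expression (escape backslashes and quotes, then wrap each line in a C string literal) are illustrative assumptions, not Wine's actual build rule; the point is only the `LC_ALL=...` prefix on the sed invocation:

```shell
#!/bin/sh
# Hypothetical helper: turn each line of a UTF-8 names file into a C string
# literal, running sed under the given locale (default en_US.UTF-8, which is
# assumed to be installed) instead of the C locale.
authors_to_c() {
    input=$1
    utf8_locale=${2:-en_US.UTF-8}
    # Escape backslashes, then double quotes, then wrap each line as "line",
    LC_ALL=$utf8_locale sed -e 's/\\/\\\\/g' \
        -e 's/"/\\"/g' \
        -e 's/^.*$/    "&",/' "$input"
}
```

For example, `authors_to_c AUTHORS en_US.UTF-8` would print one indented `"Name",` line per line of AUTHORS.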
Unfortunately, I just remembered that the name of the UTF-8 encoding is different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us from setting LC_ALL portably. We might end up having to hack around this the way either you or I described.
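One way around the spelling difference is to ask the system which UTF-8 locales it has and match the codeset name case-insensitively, with or without the hyphen. A sketch, assuming `locale -a` is available; `find_utf8_locale` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print one installed UTF-8 locale, whether the platform
# spells the codeset "UTF-8" (Mac OS) or "utf8" (Linux). Prefers an en_US
# variant, falls back to any UTF-8 locale, prints nothing if none exists.
find_utf8_locale() {
    utf8_locales=$(locale -a 2>/dev/null | grep -i -E '\.utf-?8$' || true)
    preferred=$(printf '%s\n' "$utf8_locales" | grep -i '^en_US\.' | head -n 1)
    if [ -n "$preferred" ]; then
        echo "$preferred"
    else
        printf '%s\n' "$utf8_locales" | head -n 1
    fi
}
```

The result can then be passed straight to `LC_ALL=...` without hard-coding either spelling.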
You could use autoconf to detect:
1/ broken handling of UTF-8 characters by sed;
2/ the name of the LC_ALL value (locale) that handles UTF-8.
NOTE: You will need to enumerate available locales as the user may not have en_US present with UTF-8 encoding (e.g. a Spanish-only or Chinese-only system).
Something like:
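# Quote EOF so $locale is expanded when the script runs, not when it is written.
cat > get_locale.sh << 'EOF'
#!/bin/sh
# Print the first installed locale under which sed processes authors.c
# without error. 's/.*/&/' is only a placeholder; use the expression the
# build rule actually applies.
locale -a | while read -r locale ; do
    # Use sed's exit status as the test.
    if LC_ALL=$locale sed -e 's/.*/&/' < authors.c > /dev/null 2>&1 ; then
        echo "$locale"
        exit
    fi
done
EOF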
This should print a locale that can process the UTF-8 file. It needs cleaning up a bit, but that is the basis of it.
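The loop above trusts sed's exit status, which may not catch a sed that silently mangles characters. A sketch of a stricter check, in the spirit of item 1/ above: push a known UTF-8 string ("André", with é written as the octal bytes \303\251) through sed and compare the output byte for byte, trying only UTF-8 locales. Both function names are hypothetical, and the sed expression is again only illustrative:

```shell
#!/bin/sh
# Hypothetical check: succeed if sed, run under locale $1, passes a UTF-8
# sample through an s/// that wraps the whole line in quotes, unchanged.
sed_handles_utf8() {
    sample=$(printf 'Andr\303\251')
    out=$(printf '%s\n' "$sample" | LC_ALL=$1 sed -e 's/^.*$/"&"/' 2>/dev/null) || return 1
    [ "$out" = "\"$sample\"" ]
}

# Print the first installed UTF-8 locale that passes the check, if any.
first_working_utf8_locale() {
    locale -a 2>/dev/null | grep -i -E '\.utf-?8$' | while read -r loc; do
        if sed_handles_utf8 "$loc"; then
            echo "$loc"
            break
        fi
    done
}
```

A configure check could run `first_working_utf8_locale` once and substitute the result into the rule that generates authors.c.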
Thanks, Reece.
Charles: You want to do this?
I'm on it.
If you have a patch ready, though, go for it.
No, I'm stuck with a problem in richedit. Besides, you have more Mac-specific knowledge than I do, and I'm happy to say that. That said, if you need a test 'victim', I'm here for you.
James McKenzie