Hi,
I can't compile authors.c due to some encoding issue with sed. I can generate the authors.c file fine on the command line, but the same command in the Makefile won't generate compilable code due to it terminating the string constant before a UTF-8 character.
I am using Fedora Core 1 with the latest updates.
$ sed --version
GNU sed version 4.0.8
Rob
Example of errors:
authors.c:19: error: stray '\366' in program
authors.c:20: error: `m' undeclared here (not in a function)
authors.c:20: error: initializer element is not constant
authors.c:20: error: (near initialization for `SHELL_Authors[18]')
authors.c:20: error: syntax error before string constant
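The '\366' in the first error is a byte value written in octal. As a small sketch (not part of the original report), the conversion below shows it is 0xF6, which in ISO-8859-1 is the letter 'ö'. In other words, a non-ASCII byte from an author's name ended up outside the string literal.

```shell
# The compiler prints the offending byte in octal: '\366'.
byte_octal=366

# printf treats a numeric argument with a leading 0 as octal.
byte_dec=$(printf '%d' "0$byte_octal")
byte_hex=$(printf '%x' "0$byte_octal")

echo "octal $byte_octal = decimal $byte_dec = hex 0x$byte_hex"
```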
On Friday 27 February 2004 03:42, Robert Shearman wrote:
I can't compile authors.c due to some encoding issue with sed. I can generate the authors.c file fine on the command line, but the same command in the Makefile won't generate compilable code due to it terminating the string constant before a UTF-8 character.
That's strange, this issue should be fixed in CVS. What does it say when you do this:
[hans@mirzam shell32]$ file authors.c
authors.c: ISO-8859 C program text
How are your $LANG and $LC_ALL variables set?
-Hans
Hans Leidekker wrote:
On Friday 27 February 2004 03:42, Robert Shearman wrote:
I can't compile authors.c due to some encoding issue with sed. I can generate the authors.c file fine on the command line, but the same command in the Makefile won't generate compilable code due to it terminating the string constant before a UTF-8 character.
That's strange, this issue should be fixed in CVS. What does it say when you do this:
This is with latest CVS, although I hadn't updated for a week so I don't know when the issue was introduced.
[hans@mirzam shell32]$ file authors.c
authors.c: ISO-8859 C program text
How are your $LANG and $LC_ALL variables set?
$ file authors.c
authors.c: ISO-8859 C program text
$ echo $LANG
en_GB.UTF-8
$ echo $LC_ALL
$ LC_ALL=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
$ sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K",\x{FFFF}ven
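The two runs above differ only in the locale that sed sees. The following is a minimal sketch, not taken from the thread. It assumes the name is stored in ISO-8859-1, so the non-ASCII letter is the single byte 0xE5 (octal 345), and it writes the substitution with escaped group parentheses. Under LC_ALL=C every byte is a valid character, so `.*` spans the whole line. In a UTF-8 locale a lone 0xE5 byte is not a valid character, which would be consistent with the match stopping after "Ove K". That second case is not run here because it depends on which locales are installed.

```shell
# Assumption: the sample line is illustrative and not read from AUTHORS;
# \345 is the ISO-8859-1 byte 0xE5.
line_hex=$(printf 'Ove K\345ven\n' |
    LC_ALL=C sed -e 's/\(.*\)/ "\1",/' |
    od -An -tx1 | tr -d ' \n')

# Dump the result as hex so the position of the e5 byte is visible:
# in the C locale the closing quote and comma (22 2c) follow the whole name.
echo "$line_hex"
```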
-Hans
On Friday 27 February 2004 13:07, Robert Shearman wrote:
$ echo $LANG
en_GB.UTF-8
$ echo $LC_ALL
$ LC_ALL=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
$ sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K",\x{FFFF}ven
Right, my original patch to fix this did LANG=C before the sed. Alexandre changed this to LC_ALL=C recently, probably to get C sorting order as well.
When I use your settings I can't reproduce what you get (I have sed 4.0.8 on Fedora Core 2 test 1). What does it say when you do this:
$ LANG=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
-Hans
Hans Leidekker wrote:
On Friday 27 February 2004 13:07, Robert Shearman wrote:
$ echo $LANG
en_GB.UTF-8
$ echo $LC_ALL
$ LC_ALL=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
$ sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K",\x{FFFF}ven
Right, my original patch to fix this did LANG=C before the sed. Alexandre changed this to LC_ALL=C recently, probably to get C sorting order as well.
When I use your settings I can't reproduce what you get (I have sed 4.0.8 on Fedora Core 2 test 1). What does it say when you do this:
$ LANG=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
Rob
-Hans
On Friday 27 February 2004 14:21, Robert Shearman wrote:
$ LC_ALL=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
I'm lost as to where the \x{FFFF} comes from (is it grep?), but as such this looks like a valid C string. So it should compile if this is the literal source we're seeing. Or could it be that your terminal inserts the \x{FFFF}? Anyone else have a clue?
To exclude grep, are there any \x{FFFF}s when you do:
$ cat authors.c
-Hans
Hans Leidekker wrote:
On Friday 27 February 2004 14:21, Robert Shearman wrote:
$ LC_ALL=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
I'm lost as to where the \x{FFFF} comes from (is it grep?), but as such this looks like a valid C string. So it should compile if this is the literal source we're seeing. Or could it be that your terminal inserts the \x{FFFF}? Anyone else have a clue?
It comes from pasting the lines into my mail program.
To exclude grep, are there any \x{FFFF}s when you do:
$ cat authors.c
No, there are just blank spaces where the characters should be.
Rob
On Friday 27 February 2004 13:41, Robert Shearman wrote:
On Friday 27 February 2004 14:21, Robert Shearman wrote:
$ LC_ALL=C sed -e '1,2d' -e 's/\(.*\)/ "\1",/' ../../AUTHORS | grep Ove
"Ove K\x{FFFF}ven",
Ok, after a make distclean I was able to reproduce your problem. Somehow when called like that sed does not pick up $LC_ALL anymore. I guess there's a subshell in between somewhere? A wrapper? When you export $LC_ALL things start to work again. Attached patch does that.
Changelog: export LC_ALL before calling sed.
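The fix relies on the difference between a plain shell assignment and an exported variable. Here is a minimal sketch, using a hypothetical variable DEMO_VAR and a child `sh` standing in for sed. A variable that is only assigned inside a subshell stays local to that shell unless it was already in the environment, while an exported one is passed on to child processes.

```shell
# DEMO_VAR is a hypothetical name used for illustration; make sure it is not
# already in the environment, as LC_ALL was not in the report above.
unset DEMO_VAR

# Plain assignment inside a subshell: the child process does not see it.
without_export=$( (DEMO_VAR=C; sh -c 'echo "${DEMO_VAR:-unset}"') )

# Assignment followed by export: the child process inherits it.
with_export=$( (DEMO_VAR=C; export DEMO_VAR; sh -c 'echo "${DEMO_VAR:-unset}"') )

echo "without export: $without_export"
echo "with export:    $with_export"
```

A variable that the parent environment already exports keeps its export attribute on reassignment, so the plain form only fails when the variable starts out unset, as LC_ALL did here.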