Hi,
On Red Hat 9 I get errors like this one when running 'make htmlpages':
Malformed UTF-8 character (unexpected non-continuation byte 0x6e, immediately after start byte 0xfc) in substitution (s///) at ../../tools/c2man.pl line 313, <SOURCE_FILE> line 2.
This is because I have LANG="en_US.UTF-8" in my environment, and perl now switches to character semantics (as opposed to byte semantics) when it detects a Unicode locale. Wine source files contain characters with ordinals > 127 (it looks like the Wine sources are ISO-8859-1) and, of course, these usually do not form valid UTF-8 sequences. In the error above, 0xfc is 'ü' in ISO-8859-1, and the 'n' (0x6e) that follows it is not a UTF-8 continuation byte.
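To make the failure concrete, here is a minimal Python sketch (not part of the Wine tree, and not Perl) that takes the two bytes named in the error message and decodes them both ways. The helper name is_valid_utf8 is made up for this illustration. Python's decoder may reject 0xfc for a slightly different reason than perl does, but the outcome is the same: the bytes are fine as ISO-8859-1 and not acceptable as UTF-8.

```python
# The two bytes from the error message: 0xfc followed by 0x6e.
raw = b"\xfc\x6e"

# In ISO-8859-1 every byte is one character: U+00FC (u with diaeresis), then "n".
latin1_text = raw.decode("iso-8859-1")


def is_valid_utf8(data):
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(repr(latin1_text))
print(is_valid_utf8(raw))                          # the raw Latin-1 bytes
print(is_valid_utf8(latin1_text.encode("utf-8")))  # the same text re-encoded
```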
Offhand I see three solutions (in order of increasing acceptability):
1. Convert Wine source files to ASCII-7 or UTF-8
2. Set character set to "C" or "ISO8859-1" prior to running perl on the sources
3. Force perl back into using byte semantics
1. Most non-ASCII-7 characters are in C comments (in the names of authors, e.g. Ove Kåven). But there are files like dlls/x11drv/keyboard.c that contain them as part of a C string. Going this way would mean these characters would have to be escaped.
Of course this is a step backwards. It would degrade readability of the sources and probably offend some awkwardly named authors ;^) It would also require a change of practice, which is hard to accomplish.
Converting to UTF-8 seems more promising. C strings would still need to be escaped, but then our Hungarian authors could finally have their names spelled properly in the sources! Still, many other programs have to interpret C source files, and I estimate that most of them do not yet handle UTF-8 properly (though vi and emacs are among the capable).
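For anyone who wants to gauge how much work solution 1 would be, the following is a rough Python sketch, assuming the sources really are ISO-8859-1 throughout. The function names non_ascii_lines and latin1_to_utf8 and the sample comment line are invented for this example; nothing like this exists in the Wine tools.

```python
def non_ascii_lines(data):
    """Return the 1-based numbers of lines containing any byte above 0x7f."""
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte > 0x7F for byte in line)
    ]


def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input is ISO-8859-1)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Made-up sample: an author comment with a Latin-1 byte (0xe5), then plain C.
sample = b"/* Author: Ove K\xe5ven */\nint main(void) { return 0; }\n"

print(non_ascii_lines(sample))
converted = latin1_to_utf8(sample)
print(converted.decode("utf-8").splitlines()[0])
```

The same line scan would also find the C strings that need escaping, since it only looks at raw bytes.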
2. Changing the character set beforehand will get rid of the errors and is much less controversial than the above solution ;) But it's still cumbersome to have to do so.
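As an illustration of solution 2, here is a small Python sketch of a wrapper that overrides the locale for a child process only. The helper with_c_locale is a made-up name, and the child command is a stand-in that just reports what it sees; it is an assumption that one would swap in the real command (for example ["make", "htmlpages"]).

```python
import os
import subprocess
import sys


def with_c_locale(base_env):
    """Return a copy of base_env with the locale forced to "C"."""
    env = dict(base_env)
    env["LC_ALL"] = "C"  # LC_ALL takes precedence over LANG and the other LC_* variables
    env["LANG"] = "C"
    return env


# Start from the current environment, pretending LANG is set as in the report.
env = with_c_locale({**os.environ, "LANG": "en_US.UTF-8"})

# Stand-in child process: prints the locale variables it was started with.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'], os.environ['LANG'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

The caller's own environment is left alone, which is the point: only the process that runs perl sees the "C" locale.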
3. This will work regardless of the character set specified in the user's environment. The attached patch does this for c2man.pl
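To show what "byte semantics" buys in solution 3, here is an analogy in Python. It is not the attached patch (which is Perl) and does not reproduce anything in c2man.pl; the pattern and the name strip_trailing_ws are made up. The idea is only that a substitution applied to raw bytes never tries to decode them, so Latin-1 bytes pass through untouched whatever the locale is.

```python
import re

# Byte pattern (note the rb prefix): spaces or tabs at the end of a line.
TRAILING_WS = re.compile(rb"[ \t]+$", re.MULTILINE)


def strip_trailing_ws(data):
    """Remove trailing spaces and tabs from every line, working on bytes only."""
    return TRAILING_WS.sub(b"", data)


# Made-up sample containing ISO-8859-1 bytes that are not valid UTF-8.
sample = b" * K\xe5ven   \n * \xfcn\t\n"
cleaned = strip_trailing_ws(sample)
print(cleaned)
```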
Bye,
-Hans
Changelog: Force perl to use byte semantics