On Tue, 1 Nov 2016 11:59:30 +0000, Hugh McMaster wrote:
-    config->cell_width = tm.tmMaxCharWidth;
+    if (tm.tmPitchAndFamily & (TMPF_VECTOR | TMPF_TRUETYPE))
+        config->cell_width = tm.tmAveCharWidth;
+    else
+        config->cell_width = tm.tmMaxCharWidth;
Hi Hugh,
Why don't you use the tmAveCharWidth value for raster (bitmap) fonts? Japanese fixed-pitch bitmap fonts, such as FixedSys (jvgafix.fon) or Terminal (not available in Wine), report a tmAveCharWidth that is half of tmMaxCharWidth. For example, FixedSys reports tmAveCharWidth = 8 and tmMaxCharWidth = 16. This works well in a DBCS console window because a full-width character (e.g. Kanji) occupies two cells.
Thanks, Akihiro Sagawa
Akihiro Sagawa sagawa.aki@gmail.com writes:
On Tue, 1 Nov 2016 11:59:30 +0000, Hugh McMaster wrote:
-    config->cell_width = tm.tmMaxCharWidth;
+    if (tm.tmPitchAndFamily & (TMPF_VECTOR | TMPF_TRUETYPE))
+        config->cell_width = tm.tmAveCharWidth;
+    else
+        config->cell_width = tm.tmMaxCharWidth;
Hi Hugh,
Why don't you use the tmAveCharWidth value for raster (bitmap) fonts? Japanese fixed-pitch bitmap fonts, such as FixedSys (jvgafix.fon) or Terminal (not available in Wine), report a tmAveCharWidth that is half of tmMaxCharWidth. For example, FixedSys reports tmAveCharWidth = 8 and tmMaxCharWidth = 16. This works well in a DBCS console window because a full-width character (e.g. Kanji) occupies two cells.
That's how it's supposed to work, but that's not how it works at the moment. We display one char per cell, so if you use tmAveCharWidth, Kanji overlap each other.
On Thursday, 3 November 2016 12:20 AM, Akihiro Sagawa wrote:
On Tue, 1 Nov 2016 11:59:30 +0000, Hugh McMaster wrote:
-    config->cell_width = tm.tmMaxCharWidth;
+    if (tm.tmPitchAndFamily & (TMPF_VECTOR | TMPF_TRUETYPE))
+        config->cell_width = tm.tmAveCharWidth;
+    else
+        config->cell_width = tm.tmMaxCharWidth;
Why don't you use the tmAveCharWidth value for raster (bitmap) fonts? Japanese fixed-pitch bitmap fonts, such as FixedSys (jvgafix.fon) or Terminal (not available in Wine), report a tmAveCharWidth that is half of tmMaxCharWidth. For example, FixedSys reports tmAveCharWidth = 8 and tmMaxCharWidth = 16. This works well in a DBCS console window because a full-width character (e.g. Kanji) occupies two cells.
Hi Akihiro,
On English-language and (most likely) Western European versions of Windows, the fixed-pitch raster font (Terminal) is rendered with tmMaxCharWidth. Fixed-pitch TrueType fonts are rendered with tmAveCharWidth.
I have verified this by writing tests using Terminal, Consolas, Lucida Console and Ubuntu Mono to output font width and height for multiple font sizes. I then compared the output to the information displayed in cmd.exe's font dialog. In each case, they are the same.
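The rule described above could be sketched roughly as follows. This is only an illustration, not the patch itself: `struct font_metrics` and `pick_cell_width` are hypothetical names standing in for the relevant TEXTMETRIC fields so the snippet builds without windows.h, and the two flag constants mirror the TMPF_VECTOR and TMPF_TRUETYPE values from wingdi.h.

```c
/* Hypothetical stand-in for the TEXTMETRIC fields used here, so the
 * sketch builds without <windows.h>. */
struct font_metrics
{
    long ave_char_width;            /* tmAveCharWidth */
    long max_char_width;            /* tmMaxCharWidth */
    unsigned char pitch_and_family; /* tmPitchAndFamily */
};

/* Same values as TMPF_VECTOR and TMPF_TRUETYPE in wingdi.h. */
#define SKETCH_TMPF_VECTOR   0x02
#define SKETCH_TMPF_TRUETYPE 0x04

/* Vector/TrueType fonts use the average width; raster fonts use the
 * maximum width, which is what the comparison with cmd.exe's font
 * dialog suggested for Western locales. */
static long pick_cell_width(const struct font_metrics *tm)
{
    if (tm->pitch_and_family & (SKETCH_TMPF_VECTOR | SKETCH_TMPF_TRUETYPE))
        return tm->ave_char_width;
    return tm->max_char_width;
}
```

With a Japanese raster font reporting 8 and 16, this picks 16, which is exactly the case that needs different handling once DBCS rendering works.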
The patch I sent is correct, but as you pointed out, it causes problems with Kanji and similar DBCS fonts. Unfortunately, Alexandre's changes before code freeze have caused major problems.
Do you know if there is a way to detect Kanji and similar fonts, so we can set the cell width appropriately? Obviously, it would be best to fix wineconsole's DBCS support, but until then, we might be able to find a workaround that handles character width for all languages.
Hugh
On Thu, 3 Nov 2016 08:37:45 +0000, Hugh McMaster wrote:
Do you know if there is a way to detect Kanji and similar fonts, so we can set the cell width appropriately? Obviously, it would be best to fix wineconsole's DBCS support, but until then, we might be able to find a workaround that handles character width for all languages.
I have some questions. How are DBCS characters rendered by our implementation? Do they use two columns or a single column? AFAIK, on a native Western console, all characters are rendered in a single column and Kanji is shown as a white square box. However, in the Japanese locale, Kanji uses two columns (as described in the previous mail).
IMHO, wineconsole should have two modes: an SBCS mode and a DBCS mode, selected by the console codepage. As a workaround for the Western font issue, I suggest that wineconsole use tmAveCharWidth for the cell width and ignore (or replace) full-width characters, detected with GetStringType and C3_FULLWIDTH, *only* for SBCS locales. This would be the first step towards SBCS mode.
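A minimal sketch of that idea, under stated assumptions: the enum and function names are hypothetical, the list covers only the four common Windows DBCS code pages (932, 936, 949, 950), and the full-width test is passed in by the caller (on Windows it could come from GetStringTypeW with CT_CTYPE3 and the C3_FULLWIDTH bit).

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum console_mode { CONSOLE_SBCS, CONSOLE_DBCS };

/* Common Windows DBCS code pages: 932 (Japanese), 936 (Simplified
 * Chinese), 949 (Korean), 950 (Traditional Chinese). */
static bool is_dbcs_codepage(unsigned int cp)
{
    return cp == 932 || cp == 936 || cp == 949 || cp == 950;
}

/* The console mode follows the console code page. */
static enum console_mode mode_for_codepage(unsigned int cp)
{
    return is_dbcs_codepage(cp) ? CONSOLE_DBCS : CONSOLE_SBCS;
}

/* In SBCS mode, replace a full-width character with a placeholder so
 * that it cannot spill over its single cell. The caller decides whether
 * the character is full-width. */
static unsigned int sbcs_filter(unsigned int ch, bool is_fullwidth)
{
    return is_fullwidth ? (unsigned int)'?' : ch;
}
```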
What do you think about my idea?
Regards, Akihiro Sagawa
On Friday, 4 November 2016 3:00 AM, Akihiro Sagawa wrote:
I have some questions. How are DBCS characters rendered by our implementation? Do they use two columns or a single column? AFAIK, on a native Western console, all characters are rendered in a single column and Kanji is shown as a white square box. However, in the Japanese locale, Kanji uses two columns (as described in the previous mail).
IMHO, wineconsole should have two modes: an SBCS mode and a DBCS mode, selected by the console codepage. As a workaround for the Western font issue, I suggest that wineconsole use tmAveCharWidth for the cell width and ignore (or replace) full-width characters, detected with GetStringType and C3_FULLWIDTH, *only* for SBCS locales. This would be the first step towards SBCS mode.
Sorry for the delayed reply.
Wineconsole currently renders a standard grid. The default configuration is 80 single columns with 25 rows. One character is rendered per column. Kanji is shown as either a white box or a question mark, depending on the codepage and the font used.
I agree that wineconsole should have two modes, but I'm not sure we need special handling for full width characters in Western locales. Actually, I'm interested to find out how many full-width characters are in SBCS locales.
How can we know whether a DBCS character needs one or two cells for rendering?
We may also want to consider dropping support for the (n)curses backend as part of a broader wineconsole update.
On Sun, 6 Nov 2016 22:56:17 +0000, Hugh McMaster wrote:
I agree that wineconsole should have two modes, but I'm not sure we need special handling for full width characters in Western locales. Actually, I'm interested to find out how many full-width characters are in SBCS locales.
How can we know whether a DBCS character needs one or two cells for rendering?
SBCS mode: My test shows that even Kanji characters use one cell in an SBCS console buffer. Thus, I guess no character occupies two cells. However, a Kanji glyph is twice as wide as an ASCII character, so we need to take that into account.
DBCS mode: ASCII characters and half-width Katakana characters need one buffer cell. Others, such as Kanji, need two buffer cells. Please note that some symbols in ISO-8859-1 (e.g. the multiplication sign) use two buffer cells in DBCS mode. There is a font glyph issue, too. For instance, a fixed-pitch Western font draws the multiplication sign at the same width as an ASCII character (as expected for ISO-8859-1), but in DBCS mode it should be as wide as Kanji. One option would be to reject Western-style fonts for the DBCS console.
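To make the one-cell versus two-cell layout concrete, here is a rough sketch. Everything in it is an assumption for illustration: `struct cell`, `dbcs_cells_for_char` and `layout_row` are hypothetical, and the width rule is the simplified one from this message (ASCII and half-width Katakana take one cell, everything else takes two), not what the native console necessarily does.

```c
#include <stddef.h>

/* Hypothetical cell type; a real console buffer stores more per cell. */
struct cell
{
    unsigned short ch;
    unsigned char  part; /* 0 = single cell, 1 = leading half, 2 = trailing half */
};

/* Simplified rule: ASCII and half-width Katakana (U+FF61..U+FF9F) take
 * one cell, every other character takes two. */
static int dbcs_cells_for_char(unsigned short ch)
{
    if (ch < 0x80) return 1;
    if (ch >= 0xFF61 && ch <= 0xFF9F) return 1;
    return 2;
}

/* Lay out len UTF-16 code units into row (capacity cols). Returns the
 * number of cells used and stops before a character that would not fit.
 * Surrogate pairs are not handled in this sketch. */
static size_t layout_row(const unsigned short *text, size_t len,
                         struct cell *row, size_t cols)
{
    size_t used = 0;

    for (size_t i = 0; i < len; i++)
    {
        int w = dbcs_cells_for_char(text[i]);

        if (used + (size_t)w > cols) break;
        row[used].ch = text[i];
        row[used].part = (w == 2) ? 1 : 0;
        if (w == 2)
        {
            /* The second cell repeats the character as its trailing half. */
            row[used + 1].ch = text[i];
            row[used + 1].part = 2;
        }
        used += (size_t)w;
    }
    return used;
}
```

Under this rule the multiplication sign (U+00D7) lands in two cells, which is where the glyph-width problem with Western fonts shows up.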
We may also want to consider dropping support for the (n)curses backend as part of a broader wineconsole update.
Using ncursesw is another option though it requires wchar_t...
On Tuesday, 8 November 2016 2:25 AM, Akihiro Sagawa wrote:
On Sun, 6 Nov 2016 22:56:17 +0000, Hugh McMaster wrote:
I agree that wineconsole should have two modes, but I'm not sure we need special handling for full width characters in Western locales. Actually, I'm interested to find out how many full-width characters are in SBCS locales.
How can we know whether a DBCS character needs one or two cells for rendering?
SBCS mode: My test shows that even Kanji characters use one cell in an SBCS console buffer. Thus, I guess no character occupies two cells. However, a Kanji glyph is twice as wide as an ASCII character, so we need to take that into account.
I did some testing on my Windows system with the locale set to Japanese. The keyboard IME gave me Hiragana, full-width Katakana and alphanumeric, and half-width Katakana and alphanumeric.
I couldn't type much in Japanese. Most of the keys gave me alphanumeric characters. When using the full-width character set, each character (including alphanumeric) had a lot of space surrounding it. Half-width characters appeared normally.
DBCS mode: ASCII characters and half-width Katakana characters need one buffer cell. Others, such as Kanji, need two buffer cells. Please note that some symbols in ISO-8859-1 (e.g. the multiplication sign) use two buffer cells in DBCS mode.
Language question: can half-width and full-width characters appear together in the same sentence? I'm wondering how the console determines the width required? Does it use some kind of WINAPI call for the character set?
There is a font glyph issue, too. For instance, a fixed-pitch Western font draws the multiplication sign at the same width as an ASCII character (as expected for ISO-8859-1), but in DBCS mode it should be as wide as Kanji. One option would be to reject Western-style fonts for the DBCS console.
That seems fair. IIRC, @-prefixed Asian fonts are not allowed in the SBCS Windows console.
On Wed, 9 Nov 2016 11:26:05 +0000, Hugh McMaster wrote:
I couldn't type much in Japanese. Most of the keys gave me alphanumeric characters. When using the full-width character set, each character (including alphanumeric) had a lot of space surrounding it. Half-width characters appeared normally.
Yes, half-width and full-width characters coexist in the Japanese character set.
Language question: can half-width and full-width characters appear together in the same sentence? I'm wondering how the console determines the width required? Does it use some kind of WINAPI call for the character set?
In computing, my answer is yes. Though this isn't a strict rule, the most common usage is half-width alphanumerics with full-width Katakana, as seen in po/ja.po. Do you know Unicode Standard Annex #11, East Asian Width [1]? According to that document, each Unicode character is classified into one of six categories: Ambiguous, Fullwidth, Halfwidth, Narrow, Wide, or Neutral (= not East Asian). I'm not sure how the native console determines the required width, but it seems that Ambiguous, Fullwidth and Wide characters require two buffer cells in the Japanese locale on Windows 7. Unfortunately, there is no API to get this East Asian Width classification directly. However, GetStringType (C3_HALFWIDTH or C3_FULLWIDTH) partially helps us (again).
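A sketch of how that classification could drive the cell count, assuming the behaviour observed above (Ambiguous, Fullwidth and Wide take two cells in the Japanese locale). The names are hypothetical and the table is a tiny, deliberately incomplete excerpt; a real one would be generated from the EastAsianWidth.txt data file that accompanies the annex.

```c
#include <stddef.h>

/* The six East Asian Width classes from UAX #11. */
enum ea_width { EA_NEUTRAL, EA_AMBIGUOUS, EA_FULLWIDTH,
                EA_HALFWIDTH, EA_NARROW, EA_WIDE };

struct ea_range { unsigned int first, last; enum ea_width width; };

/* Illustrative excerpt only, sorted by first code point. */
static const struct ea_range ea_table[] =
{
    { 0x0020, 0x007E, EA_NARROW    }, /* printable ASCII */
    { 0x00D7, 0x00D7, EA_AMBIGUOUS }, /* multiplication sign */
    { 0x3041, 0x3096, EA_WIDE      }, /* Hiragana letters */
    { 0x30A1, 0x30FA, EA_WIDE      }, /* Katakana letters */
    { 0x4E00, 0x9FFF, EA_WIDE      }, /* CJK Unified Ideographs */
    { 0xFF01, 0xFF60, EA_FULLWIDTH }, /* fullwidth forms */
    { 0xFF61, 0xFF9F, EA_HALFWIDTH }, /* halfwidth Katakana */
};

/* Binary search over the sorted ranges; anything not listed is treated
 * as Neutral in this sketch. */
static enum ea_width ea_lookup(unsigned int cp)
{
    size_t lo = 0, hi = sizeof(ea_table) / sizeof(ea_table[0]);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < ea_table[mid].first) hi = mid;
        else if (cp > ea_table[mid].last) lo = mid + 1;
        else return ea_table[mid].width;
    }
    return EA_NEUTRAL;
}

/* Cell count assumed for the Japanese locale: Ambiguous, Fullwidth and
 * Wide take two cells, everything else one. */
static int cells_in_japanese_locale(unsigned int cp)
{
    switch (ea_lookup(cp))
    {
    case EA_AMBIGUOUS:
    case EA_FULLWIDTH:
    case EA_WIDE:
        return 2;
    default:
        return 1;
    }
}
```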
[1] http://www.unicode.org/reports/tr11/
There is a font glyph issue, too. For instance, a fixed-pitch Western font draws the multiplication sign at the same width as an ASCII character (as expected for ISO-8859-1), but in DBCS mode it should be as wide as Kanji. One option would be to reject Western-style fonts for the DBCS console.
That seems fair. IIRC, @-prefixed Asian fonts are not allowed in the SBCS Windows console.
@-prefixed fonts are for vertical writing. That makes sense.
On Friday, 11 November 2016 3:07 AM, Akihiro Sagawa wrote:
On Wed, 9 Nov 2016 11:26:05 +0000, Hugh McMaster wrote:
I couldn't type much in Japanese. Most of the keys gave me alphanumeric characters. When using the full-width character set, each character (including alphanumeric) had a lot of space surrounding it. Half-width characters appeared normally.
Yes, half-width and full-width characters coexist in the Japanese character set.
Language question: can half-width and full-width characters appear together in the same sentence? I'm wondering how the console determines the width required? Does it use some kind of WINAPI call for the character set?
In computing, my answer is yes. Though this isn't a strict rule, the most common usage is half-width alphanumerics with full-width Katakana, as seen in po/ja.po. Do you know Unicode Standard Annex #11, East Asian Width [1]? According to that document, each Unicode character is classified into one of six categories: Ambiguous, Fullwidth, Halfwidth, Narrow, Wide, or Neutral (= not East Asian). I'm not sure how the native console determines the required width, but it seems that Ambiguous, Fullwidth and Wide characters require two buffer cells in the Japanese locale on Windows 7. Unfortunately, there is no API to get this East Asian Width classification directly. However, GetStringType (C3_HALFWIDTH or C3_FULLWIDTH) partially helps us (again).
Thanks for adding that link to the Unicode annex. It was really useful.
I've begun looking into functions to determine character width. wcwidth() and wcswidth() (not in ISO C) are worth looking at.
StackOverflow [1] has some useful information, including an old C-based implementation of wcwidth and friends. It's worth testing on mixed width strings to see how useful it is. We may be able to update it to use the latest Unicode standard.
[1] http://stackoverflow.com/questions/3634627/how-to-know-the-preferred-display...