RE: EA width, Latin punctuation and fonts

From: Marco.Cimarosti@icl.com
Date: Mon Dec 13 1999 - 07:59:28 EST


... Update about the Chinese "dot-like hyphen" sign...

The character we were talking about is 0x2124 in GB. I could finally peek in
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/GB/GB12345.TXT (my web
connection was kaputt yesterday), and it actually maps to Unicode U+30FB
(KATAKANA MIDDLE DOT).

        # Name: GB12345-80 to Unicode table (complete, hex format)
        # Unicode version: 1.1
        ...
        # Date: 6 December 1993
        # Author: Glenn Adams <glenn@metis.com>
        # John H. Jenkins <John_Jenkins@taligent.com>
        # Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved.
        ...
        # Any comments or problems, contact <John_Jenkins@taligent.com>
        0x2121 0x3000 # IDEOGRAPHIC SPACE
        0x2122 0x3001 # IDEOGRAPHIC COMMA
        0x2123 0x3002 # IDEOGRAPHIC FULL STOP
        0x2124 0x30FB # KATAKANA MIDDLE DOT
        ...
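
For anyone who wants to check other code points in that table, here is a
minimal Python sketch of a parser for the line format shown in the excerpt
(two hex numbers, then "#" and the character name). The names SAMPLE, LINE_RE
and parse_mapping are mine, not part of the table, and the sample holds only
the four rows quoted above; a real run would read the whole file.

```python
import re

# Rows copied from the quoted excerpt of GB12345.TXT; the real file is longer.
SAMPLE = """\
# Name: GB12345-80 to Unicode table (complete, hex format)
0x2121 0x3000 # IDEOGRAPHIC SPACE
0x2122 0x3001 # IDEOGRAPHIC COMMA
0x2123 0x3002 # IDEOGRAPHIC FULL STOP
0x2124 0x30FB # KATAKANA MIDDLE DOT
"""

# Assumed line shape: "0xGGGG 0xUUUU # NAME". Header lines start with "#"
# and therefore do not match.
LINE_RE = re.compile(
    r"^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s*(?:#\s*(.*))?$"
)


def parse_mapping(text):
    """Return {gb_code: (unicode_code_point, name)} for every data row."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # comment, blank line, or anything not in the assumed shape
        gb_code = int(match.group(1), 16)
        code_point = int(match.group(2), 16)
        name = (match.group(3) or "").strip()
        table[gb_code] = (code_point, name)
    return table


table = parse_mapping(SAMPLE)
code_point, name = table[0x2124]
print(f"GB 0x2124 -> U+{code_point:04X} ({name})")
```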

You can see some examples of GB 0x2124 used to separate transliterated
Western names and surnames in these articles from the Renmin Ribao
(Peking):

- http://www.peopledaily.com.cn/zdxw/18/19991213/199912131810.html
(contains several names of American and British movie directors and actors).

- http://www.peopledaily.com.cn/leader/dl/b1051.html (a brief
presentation of American history for Chinese readers, featuring
George·Washington himself!)

However, Microsoft made a different decision about the mapping of this
character: when I read these GB articles with Internet Explorer 5 (under Win
NT 4.0), it gets mapped to Unicode U+00B7 (MIDDLE DOT).

I also had a look at some text in Big-5. This article from the China Times
(Taiwan) shows the corresponding Big-5 character:

- http://www.chinatimes.com.tw/news/papers/online/biz/N88C1301.htm
(talks about Linux)

I was surprised to discover that it is also used as a decimal separator for
numbers (percentages), when expressed with ideographic digits.

The behavior of Internet Explorer is interesting here: the character is
mapped to Unicode U+2027 (HYPHENATION POINT), which is a third reasonable
candidate for our guy.

If I were in Peter's (or other CJK font designers') situation, I would take
the safe approach and map U+00B7, U+2027 and U+30FB all to the same glyph,
so that most Chinese text would display nicely.
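
In Python terms, the "safe approach" amounts to something like the sketch
below: a character-to-glyph table in which all three candidates point at one
glyph. The glyph name "cjk_middle_dot" and the dictionary layout are purely
hypothetical; real font formats store this mapping in their own way.

```python
import unicodedata

# Hypothetical glyph name, used only for illustration.
SHARED_GLYPH = "cjk_middle_dot"

# The three Unicode candidates discussed in this message.
CANDIDATES = (0x00B7, 0x2027, 0x30FB)

# Toy character-to-glyph map: every candidate resolves to the same glyph.
cmap = {code_point: SHARED_GLYPH for code_point in CANDIDATES}


def describe(mapping):
    """List each code point with its Unicode name and the glyph it uses."""
    return [
        f"U+{cp:04X} {unicodedata.name(chr(cp))} -> {glyph}"
        for cp, glyph in sorted(mapping.items())
    ]


for line in describe(cmap):
    print(line)
```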

However, mapping U+00B7 to a CJK glyph has the drawback that it necessarily
becomes a wide character. This is OK in many situations, but not when the
Chinese text contains occasional words in Western languages, especially in
Catalan. In fact, I would not like to be the Chinese publisher who wants to
typeset a tourist guide of Barcelona using such a font...
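
The Catalan problem can be made visible with the East Asian Width property
that Python's unicodedata module exposes. In this small sketch the helper
name ambiguous_chars and the sample word are my own choices: U+00B7 is
classed as "A" (ambiguous), so whether it comes out narrow or wide depends
on the font and context, while U+30FB is always "W" (wide).

```python
import unicodedata


def ambiguous_chars(text):
    """Return the characters of text whose East Asian Width is 'A'."""
    return [ch for ch in text if unicodedata.east_asian_width(ch) == "A"]


# Catalan writes the geminated l with U+00B7 between the two letters.
catalan_word = "col\u00b7lega"

for ch in catalan_word:
    print(f"U+{ord(ch):04X} {unicodedata.east_asian_width(ch)}")

# Only the middle dot is at risk of being rendered with a wide CJK glyph.
print(ambiguous_chars(catalan_word))
```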

Ciao. Marco


