... Update about the Chinese "dot-like hyphen" sign...
The character we were talking about is 0x2124 in GB. I was finally able to
look at ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/GB/GB12345.TXT (my web
connection was down yesterday), and the character actually maps to Unicode
U+30FB (KATAKANA MIDDLE DOT).
# Name: GB12345-80 to Unicode table (complete, hex format)
# Unicode version: 1.1
...
# Date: 6 December 1993
# Author: Glenn Adams <glenn@metis.com>
# John H. Jenkins <John_Jenkins@taligent.com>
# Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved.
...
# Any comments or problems, contact <John_Jenkins@taligent.com>
0x2121 0x3000 # IDEOGRAPHIC SPACE
0x2122 0x3001 # IDEOGRAPHIC COMMA
0x2123 0x3002 # IDEOGRAPHIC FULL STOP
0x2124 0x30FB # KATAKANA MIDDLE DOT
...
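For anyone who wants to check entries like these mechanically, here is a
minimal sketch of a parser for the lines quoted above. It assumes the layout
shown in the excerpt (GB code, Unicode code point, then a "#" comment); the
names SAMPLE and parse_mapping are my own and not part of the mapping file.

```python
# A few lines copied from the excerpt above; not the complete table.
SAMPLE = """\
# Name: GB12345-80 to Unicode table (complete, hex format)
0x2121 0x3000 # IDEOGRAPHIC SPACE
0x2122 0x3001 # IDEOGRAPHIC COMMA
0x2123 0x3002 # IDEOGRAPHIC FULL STOP
0x2124 0x30FB # KATAKANA MIDDLE DOT
"""


def parse_mapping(text):
    """Return a dict {GB code: Unicode code point} from mapping-table text."""
    table = {}
    for line in text.splitlines():
        # Everything after '#' is a comment; header lines become empty.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = data.split()
        if len(fields) < 2:
            continue  # skip anything that is not a code pair
        gb_code, unicode_code = fields[:2]
        table[int(gb_code, 16)] = int(unicode_code, 16)
    return table


table = parse_mapping(SAMPLE)
print(f"GB 0x2124 -> U+{table[0x2124]:04X}")  # GB 0x2124 -> U+30FB
```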
You can see some examples of GB 0x2124 used to separate the given names and
surnames in transliterated Western names in these articles from the Renmin
Ribao (Peking):
- http://www.peopledaily.com.cn/zdxw/18/19991213/199912131810.html
(contains several names of American and British movie directors and actors).
- http://www.peopledaily.com.cn/leader/dl/b1051.html (a brief
presentation of American history for Chinese readers, which features
George·Washington himself!)
However, Microsoft made a different decision about the mapping of this
character: when I read these GB articles with Internet Explorer 5 (under Win
NT 4.0), it gets mapped to Unicode U+00B7 (MIDDLE DOT).
I also had a look at some text in Big-5. This article from the China Times
(Taiwan) shows the corresponding Big-5 character:
- http://www.chinatimes.com.tw/news/papers/online/biz/N88C1301.htm
(talks about Linux)
I was surprised to discover that it is also used as a decimal separator for
numbers (percentages), when expressed with ideographic digits.
The behavior of Internet Explorer is interesting here: the character is
mapped to Unicode U+2027 (HYPHENATION POINT), which is a third reasonable
candidate for our guy.
If I were in Peter's (or other CJK font designers') situation, I would take
the safe approach and map U+00B7, U+2027 and U+30FB all to the same glyph,
so that most Chinese text would display nicely.
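A minimal sketch of that "safe approach", assuming the font's character map
can be modelled as a plain dictionary from code points to glyph names. The
glyph names and the function alias_dot_glyph are hypothetical, invented here
for illustration; a real font tool will have its own API.

```python
# The three code points that different converters produce for the same dot.
DOT_CODE_POINTS = (0x00B7, 0x2027, 0x30FB)


def alias_dot_glyph(cmap, glyph_name):
    """Return a copy of cmap where every dot candidate maps to glyph_name."""
    aliased = dict(cmap)
    for code_point in DOT_CODE_POINTS:
        aliased[code_point] = glyph_name
    return aliased


# Hypothetical starting table: only U+30FB has a glyph so far.
cmap = {0x3001: "ideographic_comma", 0x30FB: "cjk_middle_dot"}
cmap = alias_dot_glyph(cmap, "cjk_middle_dot")

for code_point in DOT_CODE_POINTS:
    print(f"U+{code_point:04X} -> {cmap[code_point]}")
```

Whichever of the three code points a converter produces, the reader would
then see the same dot.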
However, mapping U+00B7 to a CJK glyph has the drawback that it necessarily
becomes a wide character. This is OK in many situations, but not when the
Chinese text contains occasional words in Western languages, especially in
Catalan, which uses U+00B7 inside words (as in "col·lecció"). In fact, I
would not like to be the Chinese publisher who wants to typeset a tourist
guide of Barcelona using such a font...
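The width difference is also visible in the character properties. Here is a
small sketch using Python's standard unicodedata module; the helper
dot_properties and the sample word are my own choices for illustration.

```python
import unicodedata

# The three candidate code points for the Chinese name-separator dot.
CANDIDATES = (0x00B7, 0x2027, 0x30FB)


def dot_properties(code_points):
    """Return {code point: (character name, East Asian Width class)}."""
    result = {}
    for cp in code_points:
        ch = chr(cp)
        result[cp] = (unicodedata.name(ch), unicodedata.east_asian_width(ch))
    return result


props = dot_properties(CANDIDATES)
for cp, (name, width) in props.items():
    print(f"U+{cp:04X} {name}: East Asian Width = {width}")

# A Catalan word that needs U+00B7 between the two l's.
word = "col\u00b7lecci\u00f3"
print(word, [f"U+{ord(c):04X}" for c in word if ord(c) in CANDIDATES])
```

Only U+30FB is classified as wide; U+00B7 is not, so a font that draws it
with a full-width glyph would spread out every Catalan "l·l".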
Ciao. Marco