RE: Word, Asian characters, and Arial Unicode

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon May 07 2001 - 05:15:39 EDT


David J. Perry wrote:
> Word 2000 (under Win98) insists on using Arial Unicode MS whenever you
> insert a character in the CJK Punctuation range. There are some
> characters here that might be useful in non-CJK situations, such as
> the double brackets. I have made a font with these characters but
> Word will not let me use it, automatically formatting the characters
> in Arial Unicode MS even when the surrounding text is in my own font.
> I've tried several methods of inputting the characters but the result
> is always the same.
> Does anybody know how to handle this?

<OT Word 2000 behavior>

The main design goal behind Word 2000 apparently was that it had to be more
intelligent than the user sitting in front of it.

I often suspect that the developers even wanted their creature to be more
brilliant than themselves, as testified by the animated icon they chose to
"personify" the application: a caricature of Albert Einstein.

Probably, the Office 2000 team should have met every week in the projection
room to watch the scene from "2001: A Space Odyssey" in which the human has
to turn off HAL 9000, the on-board computer, because it has started killing
people and singing silly nursery songs. :-)

</OT>

Apart from this, I see one problem with your idea of using characters from the
"CJK Symbols and Punctuation" block in classical studies: most of these
characters have an inappropriate "East Asian Width" property.

East Asian Width is a property that tells whether or not each Unicode
character should have the same typographical width as a CJK ideograph. The
property may be "yes", "no", or a few different kinds of "maybe".

This property is generally "yes" for CJK ideographs and other letters in CJK
writing systems (e.g. Japanese kana or Korean Hangul), and for most
*punctuation* and symbols borrowed from CJK character sets.
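
For what it is worth, the property can also be looked up programmatically.
Below is a minimal sketch in Python, assuming the standard `unicodedata`
module. It reports the values of whatever Unicode version ships with the
interpreter, which may differ from the data file quoted in this message. The
helper name `width_class` is only illustrative.

```python
import unicodedata


def width_class(ch):
    """Return the East Asian Width class of a single character.

    The result is one of 'W', 'F', 'H', 'Na', 'A' or 'N', taken from the
    Unicode data bundled with the running Python interpreter.
    """
    return unicodedata.east_asian_width(ch)


# A few characters whose classification is unlikely to be controversial.
samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "CJK UNIFIED IDEOGRAPH-4E00": "\u4e00",
    "HIRAGANA LETTER A": "\u3042",
    "GREEK SMALL LETTER ALPHA": "\u03b1",
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {width_class(ch):2} {name}")
```

Here 'W' corresponds to "yes", 'Na' to "no", and 'A' to one of the "maybe"
cases.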

The property is published in
[http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt]. See the fragment
for the characters in your list:

<quote>
        3008;W;LEFT ANGLE BRACKET
        3009;W;RIGHT ANGLE BRACKET
        300A;A;LEFT DOUBLE ANGLE BRACKET
        300B;A;RIGHT DOUBLE ANGLE BRACKET
        300C;W;LEFT CORNER BRACKET
        300D;W;RIGHT CORNER BRACKET
        300E;W;LEFT WHITE CORNER BRACKET
        300F;W;RIGHT WHITE CORNER BRACKET
        [...]
        3016;W;LEFT WHITE LENTICULAR BRACKET
        3017;W;RIGHT WHITE LENTICULAR BRACKET
        [...]
        301A;A;LEFT WHITE SQUARE BRACKET
        301B;A;RIGHT WHITE SQUARE BRACKET
</quote>
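
To see at a glance which of these brackets fall into which class, the
fragment above can be parsed mechanically. The following is only a sketch: it
assumes the simple `code;class;name` layout shown in the fragment (later
versions of the data file may use a different layout), and the function names
are made up for illustration.

```python
from collections import defaultdict

# The lines quoted above, with the "[...]" elisions left out.
FRAGMENT = """\
3008;W;LEFT ANGLE BRACKET
3009;W;RIGHT ANGLE BRACKET
300A;A;LEFT DOUBLE ANGLE BRACKET
300B;A;RIGHT DOUBLE ANGLE BRACKET
300C;W;LEFT CORNER BRACKET
300D;W;RIGHT CORNER BRACKET
300E;W;LEFT WHITE CORNER BRACKET
300F;W;RIGHT WHITE CORNER BRACKET
3016;W;LEFT WHITE LENTICULAR BRACKET
3017;W;RIGHT WHITE LENTICULAR BRACKET
301A;A;LEFT WHITE SQUARE BRACKET
301B;A;RIGHT WHITE SQUARE BRACKET
"""


def parse_width_lines(text):
    """Parse 'code;class;name' lines into a {code point: class} dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, width, _name = line.split(";", 2)
        table[int(code, 16)] = width
    return table


def group_by_class(table):
    """Group code points by width class, in code point order."""
    groups = defaultdict(list)
    for cp, width in sorted(table.items()):
        groups[width].append(f"U+{cp:04X}")
    return dict(groups)


table = parse_width_lines(FRAGMENT)
for width, cps in group_by_class(table).items():
    print(width, " ".join(cps))
```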

The meaning of "A" and "W" types is explained in UTR#11,
[http://www.unicode.org/unicode/reports/tr11/]:

<quote>

ED4. East Asian Wide (W) - all other characters that are always wide. These
characters occur only in the context of East Asian typography where they are
wide characters (such as the Unified Han Ideographs or Squared Katakana
Symbols). This category includes characters that have explicit half-width
counterparts.

[...]

ED6. East Asian Ambiguous (A) - all characters that can be sometimes wide
and sometimes narrow. Ambiguous characters require additional information
not contained in the character code to further resolve their width.

Ambiguous characters occur in East Asian legacy character sets as wide
characters, but as narrow (i.e. normal-width) characters in non-East Asian
usage (Examples are the Greek and Cyrillic alphabet found in East Asian
character sets, but also some of the mathematical symbols). Private Use
characters are considered ambiguous, since additional information is
required to know whether they should be treated as wide or narrow.

</quote>
 
As you see, only pairs 300A-300B and 301A-301B are classified as
A(mbiguous); all others are hopelessly W(ide).
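
The practical difference between the two classes can be sketched as a small
width-resolution function. This is an illustration under my own assumptions,
not anything Word is known to do: it counts widths in "columns" (2 for wide,
1 for narrow), takes the classes from the fragment quoted in this message,
and uses a hypothetical `east_asian_context` flag to stand for the
"additional information" that ED6 says ambiguous characters require.

```python
def resolved_columns(width_class, east_asian_context):
    """Resolve a width class to a number of columns (1 = narrow, 2 = wide).

    Wide and fullwidth characters are always wide; ambiguous characters
    follow the context; everything else is treated as narrow.
    """
    if width_class in ("W", "F"):
        return 2
    if width_class == "A":
        return 2 if east_asian_context else 1
    return 1  # 'Na', 'H' and 'N'


# Classes as listed in the fragment quoted in this message.
CLASSES = {0x3008: "W", 0x300A: "A", 0x300C: "W", 0x301A: "A"}

for cp, cls in CLASSES.items():
    western = resolved_columns(cls, east_asian_context=False)
    eastern = resolved_columns(cls, east_asian_context=True)
    print(f"U+{cp:04X} class {cls}: {western} column(s) in Western text, "
          f"{eastern} in East Asian text")
```

Only the A pairs can come out narrow; the W ones stay wide whatever the
context.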

From the typographical point of view, the W classification means that these
characters would probably have a width more appropriate to Chinese typography
than to Western text (i.e. they will probably be as wide as a capital "M", or
even wider, which is unusual for Western parentheses or quotation marks).

Also, their baseline is probably designed to fit well with ideographs, and
could look odd next to alphabetic text (e.g. the characters could sit too
low).

From the software point of view, the fact that they are assumed to "occur
only in the context of East Asian typography" could explain (although not
necessarily justify!) why Word 2000 always seeks a CJK font to typeset them.

_ Marco


