EA width, Latin punctuation and fonts

From: peter_constable@sil.org
Date: Thu Dec 09 1999 - 10:56:37 EST

       We're working on finishing up a Yi font, and there's something
       I'm uncertain about since I have no background whatsoever in
       dealing with East Asian computing issues. I figure the answer
       should be the same as if Han characters are involved (so I
       figure that many of you will have ideas, even if you're not
       that familiar with Yi).

       I've got a couple of books in my lap that contain Yi text, and
       there is a small assortment of Latin punctuation characters
       that are used: , ( ) ! : ; as well as em dash and open/close
       quotation marks. There is also an issue with the space
       character. The problem is this: when used with Yi characters,
       these need to be displayed using wide glyphs, i.e. same advance
       width as the Yi glyphs. However, the users want to be able to
       also create (proportional width) Latin text with the same font,
       e.g. if writing in English about Yi words or for viewing
       marked-up text in a one-font-for-all text editor. So, our font
       needs to contain both wide and narrow glyphs for these
       characters.

       The question is: What is the ideal solution? What character
       codes get used in text, and how do they get mapped to the
       appropriate glyphs?

       The quotation marks aren't a problem, I think, since the CJK
       Symbols and Punctuation block contains U+301D and U+301E, which
       can be used for the wide quotation marks - these aren't
       compatibility characters; this will be in line with the use of
       U+3001 and U+3002, which also get used with Yi. Also, the em
       dash isn't a problem since, by definition, it is full width.

       For space and the other punctuation characters, Unicode
       contains U+3000, U+FF01, U+FF08, etc., so I could simply access
       the wide glyphs from these values in the cmap. But these are
       all compatibility characters, so it seems that the preferred
       encoding of text will use only U+0020, U+0021, U+0028 etc.
       regardless of the desired width for the glyphs.
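To make the distinction concrete, here is a minimal Python sketch (my own illustration, not part of any font tooling) that asks the Unicode Character Database, via the standard unicodedata module, which of the code points mentioned above carry a compatibility decomposition. The helper name compat_info is hypothetical.

```python
import unicodedata

def compat_info(ch):
    """Return (East Asian width class, decomposition mapping) for one character.

    An empty decomposition string means the character has no decomposition,
    i.e. it is not a compatibility character.
    """
    return unicodedata.east_asian_width(ch), unicodedata.decomposition(ch)

# U+3000, U+FF01, U+FF08: full-width forms discussed above.
# U+3001, U+3002, U+301D, U+301E: CJK Symbols and Punctuation.
for ch in "\u3000\uff01\uff08\u3001\u3002\u301d\u301e":
    width, decomp = compat_info(ch)
    print(f"U+{ord(ch):04X} width={width} decomposition={decomp or '(none)'}")
```

The full-width forms report a "<wide>" decomposition to their ASCII counterparts, while the CJK Symbols and Punctuation characters report none.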

       It seems the solution must be one of the following:

       1. Include both wide and narrow glyphs in a single font and
       encode text using U+3000 etc. (i.e. encode using compatibility
       characters).

       This seems to have a problem in that, if text is normalised,
       that can affect the layout/presentation of a document, and it
       seems to me that shouldn't happen. (Am I wrong to think that's
       a problem?)
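The normalisation concern with option 1 can be sketched in a few lines of Python. This assumes "normalised" means one of the compatibility normalisation forms (NFKC here); the canonical forms leave these characters alone. The function name nfkc is just a local convenience.

```python
import unicodedata

def nfkc(text):
    """Apply compatibility composition (NFKC) to a string."""
    return unicodedata.normalize("NFKC", text)

wide = "\u3000\uff01\uff08\uff09"   # ideographic space, full-width ! ( )
cjk = "\u3001\u3002\u301d\u301e"    # CJK punctuation with no decomposition

# The full-width forms fold to U+0020, U+0021, U+0028, U+0029, so the
# width distinction carried by the character codes is lost.
print([f"U+{ord(c):04X}" for c in nfkc(wide)])

# The CJK Symbols and Punctuation characters survive unchanged.
print(nfkc(cjk) == cjk)

# Canonical normalisation (NFC) does not touch the full-width forms.
print(unicodedata.normalize("NFC", wide) == wide)
```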

       2. Do not mix wide and narrow glyphs in a font, and encode text
       without any compatibility characters. Get the desired width for
       glyphs by formatting text using appropriate fonts.

       Seems like a valid solution, but unnecessarily limiting if it's
       the only option.

       3. Mix wide and narrow glyphs in a font, and encode text
       without any compatibility characters. Design applications such
       that, when a character such as U+0020 is encountered within a
       run that is tagged for an East Asian language (or within a run
       of unambiguous wide characters), they transduce it to U+3000
       (or whatever the full-width compatibility character is for the
       character in question) before calling TextOut, or do something
       similar (perhaps handled in the OS) so that the wide glyph is
       displayed.

       This seems like it would be open to problems; e.g. what happens
       if the selected font has only wide glyphs accessed from the
       cmap via non-compatibility Unicode values such as U+0020? This
       just doesn't seem like something we want software developers to
       mess with.
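For what it's worth, the transduction step in option 3 might look roughly like the Python sketch below. The function names, the is_east_asian_run flag and the set of characters to widen are all assumptions made for illustration; a real application would derive the flag from its own run tagging and would make this substitution only on the copy of the text handed to the rendering call, not on the stored text.

```python
IDEOGRAPHIC_SPACE = "\u3000"
# U+FF01..U+FF5E mirror U+0021..U+007E at a fixed offset.
FULLWIDTH_OFFSET = 0xFEE0
# Assumed set: the punctuation listed earlier in this message, plus space.
WIDEN = set(" ,()!:;")

def to_fullwidth(ch):
    """Map one ASCII character to its full-width compatibility form."""
    if ch == " ":
        return IDEOGRAPHIC_SPACE
    if "!" <= ch <= "~":
        return chr(ord(ch) + FULLWIDTH_OFFSET)
    return ch

def widen_run(text, is_east_asian_run):
    """Return the string to pass to the renderer for one tagged run."""
    if not is_east_asian_run:
        return text
    return "".join(to_fullwidth(c) if c in WIDEN else c for c in text)

# Two Yi syllables separated by comma + space, ending with '!'.
sample = "\ua000, \ua001!"
print([f"U+{ord(c):04X}" for c in widen_run(sample, True)])
```

Note that this does nothing about the failure case raised above: if the selected font maps U+0020 itself to a wide glyph, the application has no way of knowing that from the character codes alone.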

       4. Encode text without any compatibility characters, permit
       mixing wide and narrow glyphs in a font and use smart font
       technology. The font developer can choose to include a
       feature-selected substitution - if this feature is on,
       substitute this default narrow glyph with this other wide
       glyph. The feature might be the language of the given run (need
       a LangID for Yi, not currently supported in Win32 or in ISO
       639!), which is supported by OpenType (but, I believe, not
       Uniscribe; also not currently supported in AAT/ATSUI), or some
       other arbitrary label (supported by AAT/ATSUI). Of course, the
       run of text must be tagged for this feature.
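As a toy model of what option 4 asks of the font and the rendering engine, consider the Python sketch below. It is not OpenType, AAT or Uniscribe code; the glyph names, the CMAP and WIDE_SUBST tables and the shape function are all hypothetical. The point is only that the text keeps U+0020, U+0021 etc., and the wide glyph is selected after cmap lookup when the run's feature is on.

```python
# Hypothetical font data: cmap goes from code point to the default
# (narrow) glyph; the substitution table holds the wide alternates.
CMAP = {
    0x0020: "space",
    0x0021: "exclam",
    0x0028: "parenleft",
    0x0029: "parenright",
    0x002C: "comma",
}
WIDE_SUBST = {
    "space": "space.wide",
    "exclam": "exclam.wide",
    "parenleft": "parenleft.wide",
    "parenright": "parenright.wide",
    "comma": "comma.wide",
}

def shape(text, wide_feature=False):
    """Map characters to glyph names, then apply the wide substitution
    if the run is tagged with the feature."""
    glyphs = [CMAP.get(ord(ch), ".notdef") for ch in text]
    if wide_feature:
        glyphs = [WIDE_SUBST.get(g, g) for g in glyphs]
    return glyphs

print(shape("(!)"))                     # narrow glyphs for Latin text
print(shape("(!)", wide_feature=True))  # wide glyphs for a tagged Yi run
```

The encoded text is identical in both calls, so normalisation cannot disturb the layout.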

       I think this is the solution we want to be working toward.

       Am I right in thinking that option 4 is ideally the preferred
       choice? How is this issue currently being handled for other
       East Asian writing systems? What solutions are developers
       working toward?

       If option 4 is the goal but can't yet be implemented, is option
       1 (as well as option 2) a reasonable solution for the interim?

