RE: A basic question on encoding Latin characters

From: John Hudson (tiro@tiro.com)
Date: Fri Sep 24 1999 - 14:24:16 EDT


At 06:01 AM 24-09-99 -0700, Marco.Cimarosti@icl.com wrote:

> * Accented character glyphs are already in fonts, so they
>must have a code?
>Hey, but glyphs are glyphs and characters are characters! Font-makers should
>come up with their own glyph identification schemes. They should give a
>unique ID of their choice to each glyph, then some sort of software
>algorithm (possibly embedded in the font itself) will decide how to map one
>or more characters to one or more glyphs.
>This, too, is not a valid reason.

>But, possibly, it deserves more thinking...

>Imagine that you are a font architect (many of you don't need much
>imagination for this ;-) and that you have to come up with a glyph
>identification scheme. It is hard work, especially if you want it to be a
>nice job that need not be redone each time your foundry starts a new
>font project.

>The glyph ID can be absolutely arbitrary, but it could be a wise choice to
>use Unicode values as glyph identifiers, whenever possible.

>Most glyphs, in fact, correspond to a single Unicode character, so it is
>handy to assign them the same numeric value: the glyph ID for "a" can be
>0x61, for "b" can be 0x62, for the Hanzi ideograph "one" 0x4E00, etc.
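A minimal sketch of the scheme just described, in Python: a glyph that corresponds to a single character takes that character's code point as its ID, while unencoded glyphs such as ligatures get IDs from a range above U+10FFFF. That range, the LIGATURES table and the glyph_ids() function are hypothetical illustrations, not part of Unicode or of any font format; the greedy longest-match lookup stands in for the "software algorithm" that maps one or more characters to one or more glyphs.

```python
# Sketch: glyph IDs equal Unicode code points where a glyph corresponds to
# one character; unencoded glyphs (e.g. ligatures) get IDs from a
# hypothetical range just past the Unicode code space.

UNENCODED_BASE = 0x110000  # assumption: first ID beyond U+10FFFF

# Hypothetical table of character sequences that have their own glyph.
LIGATURES = {
    "ffi": UNENCODED_BASE + 0,
    "fi": UNENCODED_BASE + 1,
    "fl": UNENCODED_BASE + 2,
}
MAX_LIG_LEN = max(len(seq) for seq in LIGATURES)


def glyph_ids(text: str) -> list[int]:
    """Map a character string to glyph IDs, preferring the longest ligature."""
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, then shorter ones.
        for length in range(min(MAX_LIG_LEN, len(text) - i), 1, -1):
            seq = text[i:i + length]
            if seq in LIGATURES:
                result.append(LIGATURES[seq])
                i += length
                break
        else:
            # No ligature matched: the glyph ID is simply the code point.
            result.append(ord(text[i]))
            i += 1
    return result


print([hex(g) for g in glyph_ids("office")])
```

For "office" this yields the code points of "o", "c" and "e" plus the single hypothetical ID 0x110000 for the "ffi" ligature. Actual TrueType fonts use small sequential glyph indices and a separate cmap table to map code points to them, so the sketch only illustrates the idea in the quoted message.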

[snip]

>The credits pages of all versions of the Unicode book show an impressive
>number of font designers and vendors. My impression is that all the
>font people who took part in the design of Unicode intended, from the
>beginning, to use Unicode as a glyph encoding inside their fonts as well.

>In my mind, all those letter-accent pairs, all those ligatures, all those
>"presentation forms" for Arabic and vertical CJK, all those ideograph
>variants, etc. are there to allow font designers to use Unicode as a glyph
>indexing system.

>Probably I am wrong in thinking this. But, assuming that I had the right
>impression, what would be wrong with it?

Please do not blame font developers for philosophical and implementational
inconsistencies in Unicode. Almost no font developers are members of the
Unicode Consortium, and only recently have related standardisation projects
-- e.g. MES -- actively sought the involvement of font developers.
Character standards are something which font developers generally accept as
a combined blessing and burden, and our job is to design and map glyphs to
such standards, however easy or onerous the task may be.

In my experience, inconsistent application of the abstract character
philosophy in Unicode largely stems from incomplete and inaccurate
understanding of this philosophy on the part of national standards bodies
making recommendations to ISO 10646 or directly to Unicode. This accounts
for the inclusion of the bizarre set of Arabic Presentation Forms, which
are at once unnecessary for intelligent Arabic text processing and
insufficient for quality Arabic typography. Other problems stem from
overenthusiastic application of the abstract character philosophy by the UTC,
resulting in awkward codepoint unifications which make the task of mapping
glyphs to character encodings in fonts unnecessarily complicated. This
accounts for, among many other instances, the lowercase hooked f (U+0192) used in
the orthographies of many African languages being unified with the
florin/guilder currency symbol, which is semantically distinct, most often
fitted to the figure width, and almost always stylistically represented in
an oblique or script form.

The job of the font developer might be described as 'solving glyph problems
caused by character sets'.

As to glyph indexing and naming, we have a standard for this, developed by
Adobe, which may be applied to both encoded and unencoded glyphs, and which
is extensible and reflects decomposing behaviour. See

        http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
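A simplified Python sketch of how a glyph name following this kind of convention can be decomposed back into characters, based on our reading of the Adobe document: everything after the first period (a variant suffix such as ".sc") is ignored, underscores separate ligature components, and each component is either a glyph-list name or "uni" followed by groups of four uppercase hex digits. AGL_EXCERPT is a hand-picked excerpt rather than Adobe's actual list, the function names are hypothetical, and some revisions of the convention accept further name forms that are not handled here; the Adobe document remains the authority.

```python
import re

# A tiny hand-picked excerpt of glyph-list style name-to-character entries
# (assumption: the real list published by Adobe is far larger).
AGL_EXCERPT = {
    "a": "\u0061",
    "f": "\u0066",
    "i": "\u0069",
    "l": "\u006C",
    "eacute": "\u00E9",
    "florin": "\u0192",
}

# "uni" followed by one or more groups of four uppercase hex digits.
UNI_NAME = re.compile(r"uni((?:[0-9A-F]{4})+)")


def component_to_text(component: str) -> str:
    """Map one underscore-separated component of a glyph name to characters."""
    if component in AGL_EXCERPT:
        return AGL_EXCERPT[component]
    match = UNI_NAME.fullmatch(component)
    if match:
        digits = match.group(1)
        code_points = [int(digits[k:k + 4], 16) for k in range(0, len(digits), 4)]
        # Surrogate values do not name characters, so reject the component.
        if all(not 0xD800 <= cp <= 0xDFFF for cp in code_points):
            return "".join(chr(cp) for cp in code_points)
    return ""  # Unknown component: contributes no characters.


def glyph_name_to_text(name: str) -> str:
    """Decompose a glyph name into the character sequence it represents."""
    base = name.split(".", 1)[0]  # drop variant suffixes such as ".sc"
    return "".join(component_to_text(part) for part in base.split("_"))


for name in ["f_f_i", "eacute.sc", "uni0066006C", "uni00E9_f", "florin"]:
    print(name, "->", [f"U+{ord(c):04X}" for c in glyph_name_to_text(name)])
```

Because the name itself carries the decomposition, an unencoded glyph such as "f_f_i" or a small-cap variant "eacute.sc" can still be related to the characters it renders, which is what makes such names useful for searching and text extraction.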

John Hudson

Tiro Typeworks
Vancouver, BC
www.tiro.com
tiro@tiro.com



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT