RE: Some Char. to Glyph Statistics, Pan/Single Font

From: James E. Agenbroad (
Date: Thu May 31 2001 - 15:56:42 EDT

                                               Thursday, May 31, 2001
My goal was never to give a specific number of glyphs needed to display a
particular Indian or other script. As others have pointed out, this
depends, among other things, on the particular display device and its font
processing software, possibly including the operating system. My goals
were to point out that Arabic and South and Southeast Asian scripts require:
1. Many more glyphs than character codes, and 2. Just as important, software to
render character codes legibly from the available glyphs. Discussions of a
single Unicode font that do not mention such software seem pointless or,
worse, managers might believe them. I wonder if we could usefully define
levels of legibility for displaying a language or writing system, or is it
too subjective? Is invoking a lam alef ligature when alef follows a lam the
minimal level for any language using Arabic script? For languages using
Devanagari script, is transposing the short i matra (U+093F) to precede the
consonant(s) it follows the minimum?
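
To make those two candidate minimums concrete, here is a rough sketch in
Python of what such a rendering step might look like. It is only an
illustration under stated assumptions, not how any actual rendering engine
works: the function names are hypothetical, the lam alef case always uses the
isolated presentation form (U+FEFB) and ignores contextual joining and the
alef variants, and the Devanagari case treats only U+0915..U+0939 as
consonants and ignores nukta and other signs.

```python
# Hypothetical minimal shaping pass; all names here are illustrative only.
LAM = "\u0644"                # ARABIC LETTER LAM
ALEF = "\u0627"               # ARABIC LETTER ALEF
LAM_ALEF_ISOLATED = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
SHORT_I = "\u093F"            # DEVANAGARI VOWEL SIGN I
VIRAMA = "\u094D"             # DEVANAGARI SIGN VIRAMA (halant)


def is_devanagari_consonant(ch):
    # Assumption: only KA..HA count as consonants in this sketch.
    return "\u0915" <= ch <= "\u0939"


def to_visual_order(text):
    """Return glyph-order text: lam+alef ligated, short i moved left."""
    out = []
    for ch in text:
        if ch == ALEF and out and out[-1] == LAM:
            # Replace the lam already emitted with the ligature glyph.
            out[-1] = LAM_ALEF_ISOLATED
        elif ch == SHORT_I:
            # Walk back over consonant (virama consonant)* to find the
            # start of the cluster the matra follows in memory order.
            i = len(out)
            while i > 0 and is_devanagari_consonant(out[i - 1]):
                i -= 1
                if (i > 1 and out[i - 1] == VIRAMA
                        and is_devanagari_consonant(out[i - 2])):
                    i -= 1  # step over the virama and keep walking back
                else:
                    break
            out.insert(i, ch)
        else:
            out.append(ch)
    return "".join(out)
```

For example, to_visual_order("\u0915\u094D\u0937\u093F") moves the matra in
front of the whole KA + virama + SSA cluster, while a real font would
additionally substitute the conjunct glyph for that cluster.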
          Jim Agenbroad (disclaimer and address at bottom)
 On Thu, 31 May 2001, Marco Cimarosti wrote:

> Mike Meir wrote:
> > The problem with your glyph statistics is that they are based
> > on mould counts employed by the Monotype hot metal typesetters.
> I agree: no one will ever come up with *the* correct count.
> Such general evaluations simply depend on too many things to be useful.
> E.g.: which language(s) are targeted, what degree of typographic excellence
> is required, and (as Mike explained very well) the kind of technology
> involved and its limitations.
> The simple fact that software fonts can overlay glyphs can cause a great
> factor of reduction, compared to lead type. Similarly, the fact that a
> software font technology has the capability of kerning glyphs vertically can
> reduce dramatically the inventory of glyphs needed for certain scripts.
> Moreover, different technologies may have totally different meanings for the
> word "glyph". E.g., I have heard of Arabic fonts that analyze the Arabic
> script well under the level of a "grapheme": segments of lines and
> individual dots were stored separately and assembled at display time.
> Comparing the number of glyphs in such a font with the inventory of a
> more traditional font is what we call "sum up apples and pears".
> > Turning to Devanagari, our researches indicate that the total
> > number of script units (In Unicode terms, combinations of
> > consonants, halants, vowel signs and other signs), excluding
> > the Unicode characters in the range 0951 to 0954, in use is
> > around the 5550 mark. It is actually greater than this, since
> > there are a number of characters relating to Sanskrit sandhi
> > for which we do not have any conjunct-vowel statistics.
> As an opposite example for Devanagari, I did a little research on my own on
> a "minimal rendering scheme" for Unicode Indic scripts. The scenario behind
> this evaluation was low-resolution displays or printers and simple bitmapped
> fonts.
> For Devanagari's 77 characters (non-decomposable L& and M& characters) my
> set of glyphs was just 82 pieces. Of course, such a ratio (about 1:1.06)
> requires dropping any typographical gracefulness: of all the complexity of
> Devanagari, just a handful of half-consonants and ligatures was preserved.
> Neither your "5550" nor my "82" is of much use to anyone who has even
> slightly different requirements. However, the contrast between these two
> figures perhaps says something about the difficulty of such a count.
> _ Marco

          Jim Agenbroad ( )
     The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library
of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
