Character to Glyph Ratio in Indic Scripts

From: James E. Agenbroad (jage@loc.gov)
Date: Wed Jul 07 1999 - 14:42:58 EDT


                                                             July 7, 1999
Back in 1993 I sent Glenn Adams a chart comparing the number of
Unicode(tm) characters for various scripts of Indic origin with the number
of glyphs for the same scripts as found in "Specimen Book of 'Monotype'
Non-Latin Faces". (Different pages have different dates, often from the
1970s.) John Cowan recently suggested I send it to this list. Since
much has happened in the interval, I have instead updated it, counting the
characters as found in the May 19, 1999 draft version of Unicode 3.0 and
the glyphs as found in the glyph repertoires in Monotype's "WorldType
Solutions Catalogue", obtained at their exhibit at IUC 13 in 1998.

SCRIPT               UNICODE     MONOTYPE      APPX. RATIO
                     CHARACTER   GLYPH COUNT   (CHARACTERS : GLYPHS)
                     COUNT

Devanagari           104         374           1 : 3.6
Bengali               89         316           1 : 3.5
Gurmukhi              75         112           1 : 1.5
Gujarati              78         371           1 : 4.9
Oriya                 79         225           1 : 2.9
Tamil                 62         166           1 : 2.6
Telugu                80         499           1 : 6.2
Kannada               80         241           1 : 3
Malayalam             78         404           1 : 5
Sinhala               80         387           1 : 4.8
Myanmar (Burmese)     78         226           1 : 3
Khmer                103         177           1 : 1.7
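
     For anyone who wants to recheck or extend the arithmetic, here is a
minimal Python sketch that recomputes the glyphs-per-character figure from
the two count columns above. The names COUNTS and glyphs_per_character are
my own, chosen for illustration. Because the table's ratios are approximate,
the recomputed figure can differ in the last digit (Gujarati and Tamil, for
example, come out nearer 4.8 and 2.7).

```python
# Counts copied from the table above:
# script -> (Unicode character count, Monotype glyph count).
COUNTS = {
    "Devanagari": (104, 374),
    "Bengali": (89, 316),
    "Gurmukhi": (75, 112),
    "Gujarati": (78, 371),
    "Oriya": (79, 225),
    "Tamil": (62, 166),
    "Telugu": (80, 499),
    "Kannada": (80, 241),
    "Malayalam": (78, 404),
    "Sinhala": (80, 387),
    "Myanmar (Burmese)": (78, 226),
    "Khmer": (103, 177),
}


def glyphs_per_character(characters, glyphs):
    """Return how many glyphs the font has per encoded character."""
    return glyphs / characters


if __name__ == "__main__":
    for script, (characters, glyphs) in COUNTS.items():
        ratio = glyphs_per_character(characters, glyphs)
        print(f"{script:<20} {characters:>4} {glyphs:>4}   1 : {ratio:.1f}")
```

Adding the counts from another font's glyph repertoire to COUNTS would give
the comparison suggested below.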

     In the case of Devanagari the glyph count includes some presentation
variants, e.g., two regional forms of 'a' and superscript digits. Similar
comparisons using the glyph repertoires of other fonts for these scripts could
confirm or alter the ratios. But they would not, I think, alter the overall
conclusion: software for rendering scripts of Indic origin legibly must be
quite context sensitive. I sometimes wonder if standardization on an
approach to such rendering would be desirable, or if there is too much
variation in fonts, output devices, etc. to make it possible. Tibetan
probably presents even more challenges, but I have no glyph repertoire for
it.

     I've been thinking about compiling a list of the Unicode character
sequences that would result in these glyphs but have nothing to present at
this time. If others have done so I'd be glad to hear about it.
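
     To make the idea concrete, here is a rough Python sketch of how such a
list might be recorded: each entry maps a sequence of Unicode code points to
a label for the single glyph it is normally rendered as. The two Devanagari
conjuncts are ordinary examples, but the glyph labels and the names
SEQUENCE_TO_GLYPH and describe are hypothetical, not taken from the Monotype
catalogue or from any existing list.

```python
import unicodedata

# Hypothetical entries: the glyph labels are invented for illustration and
# do not come from the Monotype catalogue.
SEQUENCE_TO_GLYPH = {
    (0x0915, 0x094D, 0x0937): "conjunct KA + SSA",
    (0x0924, 0x094D, 0x0930): "conjunct TA + RA",
}


def describe(sequence):
    """Spell out a code point sequence with its Unicode character names."""
    return " + ".join(
        f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in sequence
    )


if __name__ == "__main__":
    for sequence, glyph in SEQUENCE_TO_GLYPH.items():
        print(f"{glyph}: {describe(sequence)}")
```

In both entries three encoded characters (consonant, virama, consonant)
correspond to one glyph, which is the kind of context sensitivity described
above.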

     Regards,
          Jim Agenbroad ( jage@LOC.gov )
     The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library
of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.


