Discrepancy between Names List & Code Charts?

From: Kevin Brown (graphity@adelaide.on.net)
Date: Wed Aug 14 2002 - 05:10:04 EDT


This is my first posting to this list so please be gentle with me!

I have come across a confusing discrepancy between the "official" Unicode
description of some characters (i.e. the description in the Names List) and
the way they are graphically displayed in the Unicode Code Charts.

This appears to have led to a lack of consistency between at least three
ubiquitous Unicode fonts - Lucida Unicode, Times New Roman (OT) and Arial
MS Unicode.

As an example, the following glyphs from Latin Extended-A are displayed in
the Code Charts (online and in The Unicode Standard 3.0 book) with a
comma below, but are described as follows in the current Names List:

U+0136 LATIN CAPITAL LETTER K WITH CEDILLA
U+0137 LATIN SMALL LETTER K WITH CEDILLA

The current Adobe Glyph List names (in the last version issued, v1.2,
October 98) for these characters are "Kcommaaccent" and "kcommaaccent".
These AGL names must have been updated sometime between 1996 and
1998 because in the last version of Fontographer (v4.1.5, Oct 96) the
character names were "Kcedilla" and "kcedilla". (Likewise for L, l, N, n,
G, g, R and r.)

For these characters, the Lucida Unicode font uses a cedilla, Times New
Roman (OpenType version) uses a special modified cedilla/comma character,
and Arial MS Unicode uses a comma.

Compare this with the following characters:

U+015E LATIN CAPITAL LETTER S WITH CEDILLA (AGL name: "Scedilla")
U+015F LATIN SMALL LETTER S WITH CEDILLA (AGL name "scedilla")

...even though they have the same Unicode Names List description (i.e.
"WITH CEDILLA") as the K and k characters above, these characters are
actually displayed with a cedilla in the Code Charts (i.e. not a comma as
with the K and k etc).

Furthermore, in the "recently" added Romanian Additions in Latin
Extended-B, we find

U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW (AGL name:
"Scommaaccent")
U+0219 LATIN SMALL LETTER S WITH COMMA BELOW (AGL name "scommaaccent")

...these "WITH COMMA BELOW" characters are displayed in the Code Charts
with a comma - identically to the K, k, L, l, N, n, G, g, R and r
characters described as "WITH CEDILLA".
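
For anyone who wants to check the Names List entries mechanically, here is
a minimal sketch. It assumes Python 3 and its standard unicodedata module,
neither of which is part of the discussion above; the helper name
describe() is purely illustrative. It prints the formal name and the
canonical decomposition recorded for each code point quoted so far.

```python
import unicodedata

# Code points quoted in this message.
CODE_POINTS = [0x0136, 0x0137, 0x015E, 0x015F, 0x0218, 0x0219]

def describe(cp):
    """Return (formal name, canonical decomposition) for one code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.decomposition(ch)

for cp in CODE_POINTS:
    name, decomp = describe(cp)
    print(f"U+{cp:04X}  {name}  [{decomp}]")
```

In the character data, the K and S "WITH CEDILLA" letters decompose to a
base letter plus U+0327 COMBINING CEDILLA, while the "WITH COMMA BELOW"
letters decompose to a base letter plus U+0326 COMBINING COMMA BELOW. So
the data follows the names, not the glyphs shown in the charts.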

You can see from the above examples that the Adobe Glyph List name (where
it exists) is a more reliable indicator of how the character is
displayed in the Code Charts than the "official" description in the
Unicode Names List. The trouble is there are only 1,050 characters in the
AGL compared with over 50,000 currently described in Unicode!
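
The comparison can be sketched in a few lines. This is only an
illustration, assuming Python 3 and its standard unicodedata module; the
table holds just the six AGL names discussed in this message (with the
lowercase "kcommaaccent" for U+0137), and names_disagree() is a
hypothetical helper, not part of any font tool.

```python
import unicodedata

# AGL glyph names for the code points discussed in this message.
# This is a hand-typed excerpt, not the full Adobe Glyph List.
AGL_NAMES = {
    0x0136: "Kcommaaccent",
    0x0137: "kcommaaccent",
    0x015E: "Scedilla",
    0x015F: "scedilla",
    0x0218: "Scommaaccent",
    0x0219: "scommaaccent",
}

def names_disagree(cp, agl_name):
    """True when the AGL name says 'comma' but the Unicode name says 'CEDILLA'."""
    unicode_name = unicodedata.name(chr(cp))
    return agl_name.endswith("commaaccent") and unicode_name.endswith("WITH CEDILLA")

mismatches = [f"U+{cp:04X}" for cp, agl in AGL_NAMES.items() if names_disagree(cp, agl)]
print(mismatches)
```

Of the six entries, only the K and k pair is flagged; the S entries agree
with their Unicode names in both the cedilla and the comma-below forms.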

Can someone help me with this confusion? I am unsure how I should
structure these "WITH CEDILLA" characters in the fonts I'm working on.

Am I just displaying my ignorance of European writing systems or do the
Unicode Names List and/or the Code Charts need updating???!!!

Kevin Brown

 


