Re: Discrepancy between Names List & Code Charts?

From: John Hudson
Date: Thu Aug 15 2002 - 02:22:28 EDT

About the design and encoding of diacritics involving cedillas and
'comma' accents:

[Note that remarks about language use are limited to a European context.]

These glyphs are sometimes called /*cedilla/, but this is due to an
historical misinterpretation in both the Unicode standard and the original
version of the Adobe Glyph List:


These glyphs are used in a European context only for Latvian, and the
correct form of diacritic is *not* a cedilla but the same unattached
'commaaccent' form used for Romanian S and T. [Note, however, that you
should not use the /comma/ glyph as a component below any of these letters:
it is much too large. You want a shorter, typically curved form, occupying
about the same height as the cedilla. The mark should be centred optically
below the letter.]

So these glyphs should actually be


but mapped to the ...WITH CEDILLA Unicode characters.

NB: the lowercase /gcommaaccent/ is almost always written with a variant
mark that actually sits above the letter (to avoid collision with the
descending loop); this is achieved by rotating the commaaccent mark 180
degrees and positioning it above the g. I usually include the variant
ingredient glyph /uni0312/ to use in the /gcommaaccent/ composite.
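
As a rough illustration of the point about /gcommaaccent/, the Python sketch
below sets a hypothetical composite recipe for the glyph beside what the
Unicode database says about the character it is mapped to. The recipe table
and the describe() helper are invented for this example and are not part of
any font tool; the assumption is that /gcommaaccent/ is mapped to U+0123.

```python
import unicodedata

# Hypothetical composite recipe (an assumption for illustration only):
# glyph name -> (base glyph, mark glyph, placement of the mark).
G_COMPOSITE = {"gcommaaccent": ("g", "uni0312", "above")}

# Assumed mapping: the glyph goes to the character Unicode names "...WITH CEDILLA".
G_CODEPOINT = 0x0123


def describe(codepoint):
    """Return the Unicode name and canonical decomposition of a code point."""
    ch = chr(codepoint)
    return unicodedata.name(ch), unicodedata.decomposition(ch)


name, decomposition = describe(G_CODEPOINT)
mark_name = unicodedata.name("\u0312")  # the character behind /uni0312/

print(name)           # character name as published in the standard
print(decomposition)  # base letter plus the combining mark Unicode assigns
print(mark_name)      # the mark actually used in the composite
```

The character name and decomposition still refer to a cedilla, while the
composite is built from the turned comma above; the glyph design and the
character name simply disagree, which is the situation described above.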

Regarding the /Scedilla/ and /Tcedilla/ vs. /Scommaaccent/ and /Tcommaaccent/:

/Scedilla/scedilla/ are used only for Turkish; this must be a true cedilla.

/Scommaaccent/scommaaccent/ and /Tcommaaccent/tcommaaccent/ are used only
for Romanian; this must be the same 'comma' diacritic form discussed above
for Latvian, and should *not* be attached to the letter.

/Tcedilla/tcedilla/ is not used for any European language (it is arguably
more appropriate for Gagauz Turkish than the 'comma' accent form, because
they also use the /Scedilla/, but GT texts I have seen all use the 'comma'
below the T and the cedilla below the S). Generally I do not include the
cedilla variant in fonts, and simply double map the /Tcommaaccent/ to the
Unicode values discussed below.

Version 3.0 of the Unicode standard, which postdates the published WGL4
set, disunified the /Scedilla/ and /Tcedilla/ from the /Scommaaccent/ and
/Tcommaaccent/ by providing new codepoints for the latter. My
recommendation is to use the new codepoints for /Scommaaccent/ but to
double map the /Tcommaaccent/ glyph to the new codepoints and also to the
old /Tcedilla/ codepoint.
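
A minimal sketch of that recommendation, written as a plain Python
dictionary standing in for a font's character-to-glyph map. The CMAP table
and the codepoints_for() helper are hypothetical names for this example;
the lowercase entries are my assumption that the same treatment applies to
both cases.

```python
import unicodedata

# Hypothetical character map: Unicode code point -> glyph name.
CMAP = {
    0x015E: "Scedilla",      # old codepoint, kept for Turkish
    0x015F: "scedilla",
    0x0218: "Scommaaccent",  # new codepoints (Unicode 3.0), Romanian
    0x0219: "scommaaccent",
    0x021A: "Tcommaaccent",  # new codepoints (Unicode 3.0)
    0x021B: "tcommaaccent",
    0x0162: "Tcommaaccent",  # old /Tcedilla/ codepoints, double mapped
    0x0163: "tcommaaccent",
}


def codepoints_for(glyph):
    """List every code point that the given glyph is mapped from."""
    return sorted(cp for cp, name in CMAP.items() if name == glyph)


for cp, glyph in sorted(CMAP.items()):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> /{glyph}/")
```

Here /Tcommaaccent/ is reachable from two code points, whereas /Scedilla/
and /Scommaaccent/ each keep their own, since the true cedilla is still
needed for Turkish.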

Note that there are text encoding issues regarding Romanian, because the
Romanian 8-bit codepages all use the old /Scedilla/ and /Tcedilla/ Unicode
codepoints, not the new codepoints for the 'comma' accent characters. In
OpenType fonts, we've addressed this (for future support) by including a
Language System tag for Romanian, and a Localised Forms <locl> feature
lookup to substitute the /Scommaaccent/ glyph for the /Scedilla/. This
feature is not yet supported in any systems or applications, but I'm
reasonably certain that it will be.
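
To make the intended behaviour concrete, here is a small Python simulation
of what such a lookup would do to a run of glyph names. It is only a sketch
of the idea, not how a layout engine is implemented: apply_locl() and both
tables are hypothetical, the lowercase substitution is assumed by analogy,
and the language system tags are written without their trailing space.

```python
# Hypothetical <locl> substitution table for the Romanian language system.
ROMANIAN_LOCL = {
    "Scedilla": "Scommaaccent",
    "scedilla": "scommaaccent",
}

# Language system tag -> substitutions applied under that language system.
LOCL_BY_LANGSYS = {"ROM": ROMANIAN_LOCL}


def apply_locl(glyphs, langsys):
    """Return the glyph run with localised forms substituted, if any apply."""
    lookup = LOCL_BY_LANGSYS.get(langsys, {})
    return [lookup.get(glyph, glyph) for glyph in glyphs]


# Text encoded with the old /Scedilla/ codepoint, tagged as Romanian or Turkish.
run = ["scedilla", "i"]
print(apply_locl(run, "ROM"))  # Romanian: comma accent form substituted
print(apply_locl(run, "TRK"))  # Turkish: true cedilla retained
```

The same encoded text yields the comma accent form only when it is tagged
as Romanian, so Turkish text using the same codepoint keeps its cedilla.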

John Hudson

Tiro Typeworks
Vancouver, BC

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it. - Terry Eagleton
