Re: Latin g with cedilla above

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Oct 01 1998 - 19:47:40 EDT


Michael Everson responded:

>
> Ar 10:09 -0700 1998-10-01, scríobh John Cowan:
> >Donald Page wrote:
> >
> >> Hi, ISO8859-4 defines character 0xBB to be Latin small letter g with
> >> cedilla above.

First of all, to correct Donald's original observation, ISO 8859-4
defines character 0xBB (11/11) to be:

LATIN SMALL LETTER G WITH CEDILLA

It shows a badly drawn glyph in the chart for what is intended to be
the preferred Latvian form, a g with a turned comma above it.

> >> This character does not appear, to me, to be in Unicode
> >> anywhere. The nearest I can find is Latin small letter g with cedilla
> >> (below) at 0x123.
> >
> >That is the normative mapping.

This statement by John Cowan is entirely correct.

8859-4 0xBB = 10646/Unicode U+0123. That equivalence is printed right in
the 8859-4 standard itself.
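
For anyone who wants to check that equivalence mechanically, here is a
minimal sketch in Python. It assumes the interpreter's bundled
"iso8859_4" codec and its unicodedata tables follow the published
mapping; the helper name decode_8859_4 is made up for illustration.

```python
import unicodedata

def decode_8859_4(byte_value: int) -> str:
    """Decode one ISO 8859-4 byte to a Unicode character (illustrative helper)."""
    return bytes([byte_value]).decode("iso8859_4")

ch = decode_8859_4(0xBB)
# Prints the code point and its formal character name.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# -> U+0123 LATIN SMALL LETTER G WITH CEDILLA
```

Note that the lookup yields the character name only; which of the glyph
variants you see on screen is up to the font.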
 
> > The Unicode Standard says that
> >U+0123 LATIN SMALL LETTER G WITH CEDILLA is Latvian and Lappish
> >(Saami, presumably), and that there are three glyph variants, but
> >doesn't say what they are.

If you want to see them, go to the Unicode Standard, Version 1.0, page
180, where they are printed larger than life:

g with turned comma above (the preferred form)
g with cedilla below
g with acute above

For technical reasons, the Unicode Standard, Version 2.0 stopped printing
multiple glyphs in a single code cell.

>
> The only one the Latvians want to see is the one with the turned comma
> above. The other glyph variants (caron and acute if memory serves) are seen
> by Latvians as errors.

Memory did not serve, but I concur that the second two glyph forms would
be seen by Latvians as errors. It would be better for Unicode 2.0 to show
the preferred Latvian glyph, as that is the most commonly occurring.

>
> As is the note in the Unicode Standard. And the normative mapping should be
> to COMBINING COMMA BELOW. Never mind the bloody name.

Not *mapping*, but *decomposition*. And in this case, as for all of
these hook-to-the-left-below's for Latin letters, there is really
no clear, hard and fast distinction between the cedilla and the comma
below. At this point, in my opinion, it would be *more* pernicious
to change the decomposition for this character than to simply document
the fact that this is one of the collection of cedilla/comma below
Latin characters that have multiple glyph shapes. Come on, folks,
this is easy compared to Indic scripts.
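
The decomposition as it stands can be inspected with a short Python
sketch. This assumes the unicodedata module shipped with the
interpreter carries the same canonical decompositions discussed here;
decomposition_names is a hypothetical helper, not part of any library.

```python
import unicodedata

def decomposition_names(ch: str) -> list[str]:
    """Return the character names of the canonical (NFD) decomposition of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

for ch in ("\u0122", "\u0123"):
    # Both members of the case pair decompose to base letter + COMBINING CEDILLA.
    print(f"U+{ord(ch):04X}", decomposition_names(ch))
```

Both lines should list COMBINING CEDILLA (U+0327) as the mark, not
COMBINING COMMA BELOW (U+0326), whatever glyph a given font draws.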

>
> Wearily,
>
> --
> Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt

John Cowan responded:

> Michael Everson wrote:
>
> > The only one the Latvians want to see is the one with the turned comma
> > above. The other glyph variants (caron and acute if memory serves) are seen
> > by Latvians as errors.
> >
> > As is the note in the Unicode Standard. And the normative mapping should be
> > to COMBINING COMMA BELOW. Never mind the bloody name.
>
> Or rather COMBINING TURNED COMMA ABOVE, I presume.
>

Absolutely not. Decompositions are done for the *characters*, not for
the *glyphs*. And for Latin uppercase/lowercase pairs, it is an
ironclad rule that canonical decompositions must involve the same sequence
of combining marks. This is because a casing operation which operates
on the baseform letter of the combining character sequence should produce
the correct results for the entire sequence. You cannot produce a
decomposition for U+0123 based on its glyph appearance, independently of
consideration of its uppercase form U+0122.
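
A rough Python sketch of why the rule matters, assuming the
interpreter's unicodedata tables match the decompositions described
above. The function upper_via_decomposition is hypothetical, and the
"glyph-based" sequence using U+0312 COMBINING TURNED COMMA ABOVE is a
counterfactual for illustration, not anything the standard defines.

```python
import unicodedata

def upper_via_decomposition(ch: str) -> str:
    """Uppercase only the base letter of ch's canonical decomposition,
    keep the combining marks unchanged, then recompose."""
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    return unicodedata.normalize("NFC", base.upper() + marks)

# With the actual decomposition (g + COMBINING CEDILLA), casing the base
# letter alone lands on the precomposed uppercase form.
actual = upper_via_decomposition("\u0123")

# Counterfactual: suppose U+0123 decomposed by glyph shape to
# g + COMBINING TURNED COMMA ABOVE while U+0122 kept G + COMBINING CEDILLA.
# Casing the base letter then gives G + U+0312, which does not recompose
# to U+0122.
glyph_based = unicodedata.normalize("NFC", "g".upper() + "\u0312")

print(actual == "\u0122")       # True
print(glyph_based == "\u0122")  # False
```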

--Ken


