Re: Merging combining classes

From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Nov 06 2003 - 19:52:23 EST


    António Martins-Tválkin wrote:

    > Anyway -- who ever decided that cedilla and undercomma are different
    > things? Do they have different origins? Any language / orthography using
    > both distinctly?...

    I don't know whether undercomma is in origin distinct from cedilla or is
    historically an adaptation of the cedilla. I *suspect* the latter.

    Even given a common origin, it is debatable whether they should now be
    considered the same or not. That is why there is a problem: the question
    isn't cut and dried.

    The MARC 21 and ANSEL character sets distinguished the two as CEDILLA
    and LEFT HOOK (for the undercomma), though it is doubtful whether the
    originators of these sets knew what this "left hook" was. See
    http://lcweb2.loc.gov/cocoon/codetables/45.html for the current ANSEL
    specifications and
    http://www.niso.org/standards/resources/Z39-47-1993(R2002).pdf for the 1963
    table, where it was notoriously given the name "LEFT HOOF".

    Its identity with the undercomma is asserted at
    http://www.niso.org/international/SC4/Wg1_240.pdf:

    <<
    5/2 HOOK TO LEFT
    In ISO 5426, this character is annotated ' used in Latvian, Romanian.'
    Because of this use, the most appropriate mapping is to U+0326 COMBINING
    COMMA BELOW (annotated as 'variant of the following' [combining cedilla]
    in the Unicode Standard).
    >>
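
    As a rough illustration of how the two marks are kept apart in the
    Unicode character database, the sketch below uses Python's standard
    unicodedata module to print the name and canonical combining class of
    U+0326 and U+0327. The helper name describe() is made up for this
    example; the exact values reported depend on the Unicode data shipped
    with the Python build in use.

```python
import unicodedata

COMMA_BELOW = "\u0326"  # the mark the quoted mapping recommends for HOOK TO LEFT
CEDILLA = "\u0327"      # the "following" character in the Unicode annotation


def describe(ch):
    """Return (code point, character name, canonical combining class)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))


marks = [describe(COMMA_BELOW), describe(CEDILLA)]
for entry in marks:
    print(entry)
```

    The two marks carry different combining classes, so a normalizer does
    not treat them as interchangeable.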

    The original ISO 6429 character sets were constructed under the
    philosophy that differences between cedilla and undercomma were only
    stylistic. The default images in those tables and in Unicode Standard
    versions 1 and 2 showed a cedilla form throughout.

    However users of Latvian and Romanian insisted firmly that cedilla forms
    were not historically correct for printed material in those languages.
    It was *only* increasing use of fonts created outside of eastern Europe
    that had caused the incorrect cedilla shape to be seen, especially as
    computer technology took hold.

    For Latvian (and Livonian), the problem was easily solved within
    standard character sets by font designers using the undercomma character
    beneath all letters except _c_ or _s_ .

    However Romanian _s_ which traditionally had undercomma conflicted with
    Turkish _s_ with cedilla.

    The result was a Romanian proposal to add precomposed characters with
    undercomma for uppercase and lowercase _s_ and _t_.

    See ISO/IEC JTC 1/SC 2/WG 2 N1604 (1997) at
    http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1604.htm :

    <<
    *RESOLUTION M33.24 (4 Latin characters):

    _Netherland Negative._*

    WG 2 accepts the following four Latin characters (requested by Romania),
    their names and shapes to be encoded in the BMP as follows:

            0218 LATIN CAPITAL LETTER S WITH COMMA BELOW

            0219 LATIN SMALL LETTER S WITH COMMA BELOW

            021A LATIN CAPITAL LETTER T WITH COMMA BELOW

            021B LATIN SMALL LETTER T WITH COMMA BELOW

    in accordance with document N1361.

    See resolution M33.26 for further processing.
    >>
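
    For anyone wanting to see how these four characters relate to the
    combining marks, here is a minimal sketch using Python's unicodedata
    module. It decomposes each letter with NFD and lists the resulting code
    points; the function name decompose() is only illustrative.

```python
import unicodedata

# The four characters accepted in resolution M33.24.
ROMANIAN_COMMA_LETTERS = "\u0218\u0219\u021A\u021B"


def decompose(ch):
    """Return the NFD (canonical) decomposition of ch as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


decomposed = {unicodedata.name(ch): decompose(ch) for ch in ROMANIAN_COMMA_LETTERS}

for name, parts in decomposed.items():
    print(name, "->", " ".join(parts))

# For comparison: the older s with cedilla decomposes to a different mark.
print("U+015F ->", " ".join(decompose("\u015F")))
```

    The comma-below letters decompose to a base letter plus U+0326, while
    U+015F decomposes to s plus U+0327, so the two spellings remain distinct
    under canonical normalization.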

    But Romanians are still frustrated because most fonts distributed as
    part of computer operating systems or otherwise available do not support
    these characters.

    ISO 8859/16 (intended as a replacement for ISO 8859/2) specifically
    designates undercomma rather than cedilla with _s_, _S_, _t_, _T_. See
    ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-16.TXT
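
    The difference between the two parts of ISO 8859 can be checked with
    Python's built-in codecs. The sketch below decodes the same four byte
    values under both encodings. The byte positions 0xAA, 0xBA, 0xDE and 0xFE
    are my assumption about where the s and t letters sit in both tables, and
    code_points() is a hypothetical helper, so verify against the mapping
    file above before relying on it.

```python
# Byte positions assumed to hold S/s/T/t with a mark below in both tables.
POSITIONS = bytes([0xAA, 0xBA, 0xDE, 0xFE])


def code_points(raw, encoding):
    """Decode raw bytes with the given codec and list the code points."""
    return [f"U+{ord(c):04X}" for c in raw.decode(encoding)]


latin2 = code_points(POSITIONS, "iso8859_2")    # cedilla forms
latin10 = code_points(POSITIONS, "iso8859_16")  # comma-below forms

for byte, old, new in zip(POSITIONS, latin2, latin10):
    print(f"0x{byte:02X}: 8859-2 {old}  8859-16 {new}")
```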

    For the Netherlands opposition see
    http://wwwold.dkuug.dk/JTC1/SC2/WG3/docs/n441.pdf .

    Since there is no linguistic tradition in any language for _t_ with a
    cedilla shape beneath, most modern fonts display an undercomma beneath
    U+0162, U+0163 instead of a cedilla shape.

    It is really only with _s_ that there are two conflicting usages.

    There are actually three conflicting uses, since Gagauz traditionally
    uses a cedilla shape under _c_, an undercomma beneath _t_, and a symbol
    halfway between the two under _s_. See
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m09/0199.html

    Jim Allan


