Re: Berber/Tifinagh (was: Swahili & Banthu)

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 10 2003 - 18:57:20 EST

    Philippe Verdy wrote:

    > You seem to forget that Tifinagh is not a unified script, but a set of
    > separate scripts where the same glyphs are used with distinct semantic
    > functions.

    I think Philippe is running off the rails here.

    Tifinagh is a script. It comes in a number of local varieties,
    adapted to different languages and with local variations in
    glyph preferences. It will be encoded as a *single* script in
    Unicode, since encoding all the local orthographic varieties
    as distinct scripts would really not be a service to anyone
    who wants this script encoded for enabling IT processing of
    Berber textual data.

    The situation for Tifinagh can be profitably compared with the
    Unified Canadian Aboriginal Syllabics (U+1401..U+1676), I think.
    The Cree syllabics were adapted all across Canada, and in that
    case across major language family boundaries, from Algonquian
    languages into Athabaskan and Inuit languages. During that process,
    there were many extensions, reuses, adaptations, and inconsistencies
    in the symbolic usage. Rather than encode a half dozen different
    "scripts" for this, one for each local orthographic tradition, the
    entire script was carefully "unified" to enable representation of
    any of the local varieties accurately with the overall script
    encoding. I suspect that a similar approach will be required to
    finish the encoding of Tifinagh.
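
    As a rough illustration of what "unified" means in the encoding
    itself, the sketch below walks the range cited above and collects
    the leading words of each character name. It assumes only Python's
    standard `unicodedata` module; the helper name `name_prefixes` is
    made up for this example. If the range is encoded as one script,
    a single prefix should come back, whichever local orthography a
    given syllabic originated in.

```python
import unicodedata

def name_prefixes(first, last, words=2):
    """Collect the first `words` words of each character name in [first, last]."""
    prefixes = set()
    for cp in range(first, last + 1):
        # Fall back to "" if the local Unicode database has no name for cp.
        name = unicodedata.name(chr(cp), "")
        prefixes.add(" ".join(name.split()[:words]))
    return prefixes

# Range quoted in the message: U+1401..U+1676.
ucas = name_prefixes(0x1401, 0x1676)
print(ucas)
```

    The names carry no Cree, Inuktitut, or Carrier label; which
    language a text is written in is left to the text, not to the
    character encoding.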

    > By itself, ignoring all other transliteration to Latin and Arabic, "the"
    > Tifinagh scripts are already cyphers of another variant of Tifinagh
    > script.

    This is a complete misuse of the term "cypher".

    There are differences in glyph usage, and there may be differences
    in character usage (depending on the solution chosen for the
    encoding) between local, historic varieties of Tifinagh. These
    differences can be mapped against each other, but such a mapping
    does not constitute a "cypher", any more than local Runic traditions
    (Nordic, Germanic, Anglo-Frisian) are ciphers of each other.

    >
    > And I think it is the major issue which requires to choose a policy
    > for its encoding. If characters are encoded by their names (as they
    > should in Unicode)

    They are not, nor should they be.

    > then we are unable to produce an accurate chart showing "representative
    > glyphs", as no variant of the script covers the whole abstract character
    > set, and so this would require several charts, i.e. multiple glyphs for
    > the same abstract character.

    There is nothing new about this. As the discussion in the Unicode
    Standard makes clear, this is already the case for the Latin
    script and many other scripts. It is patently obvious for the
    Han script, among others. This is just the basics of the
    Character-Glyph Model.

    > In this condition, why couldn't Latin glyphs be among these,

    Because, as Doug already pointed out, Latin characters are not
    Tifinagh characters, so substituting Latin glyphs for Tifinagh
    glyphs constitutes "lying" about the identity of the characters.
    Arbitrary substitutions like this outside the context of
    glyph variation inherent to a character identified as a unit
    of a script are *not* allowed by the Unicode Standard's
    conformance clauses. Doing so runs afoul of the basic clause
    regarding character identity and interpretation, C7.

    You can always shift textual data between scripts, of course,
    but that is a knowing transformation of characters, known
    as *transliteration*. It is *not* merely adding more glyphs
    to the allowable range of glyph variation for a character.
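
    The distinction can be sketched in a few lines of Python. Two
    assumptions are mine, not part of the discussion above: the code
    uses the Tifinagh code points that were assigned in a later version
    of the standard (U+2D30 and U+2D31), and `TOY_TABLE` is a two-entry
    illustrative mapping, not a real Berber transliteration scheme.
    The helper names `transliterate` and `scripts` are likewise
    invented for the example.

```python
import unicodedata

# Toy mapping for illustration only; a real scheme would cover the
# whole repertoire and handle orthographic context.
TOY_TABLE = {"\u2D30": "a", "\u2D31": "b"}

def transliterate(text, table=TOY_TABLE):
    """Replace each character by its mapped counterpart (a change of characters)."""
    return "".join(table.get(ch, ch) for ch in text)

def scripts(text):
    """Approximate the script of each character by the first word of its name."""
    return {unicodedata.name(ch).split()[0] for ch in text}

source = "\u2D30\u2D31"          # two Tifinagh letters
target = transliterate(source)   # two Latin letters

print(scripts(source), scripts(target))
print(source == target)          # the character data itself has changed
```

    Switching fonts would leave `source` untouched, since glyph choice
    lives outside the encoded text. Transliteration produces a
    different string with different character identities, which is
    why it cannot be treated as just another glyph variant.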

    > when they already have the merit of covering the whole abstract
    > character set covered by all scripts in the Tifinagh family?

    You could say the same about any script whatsoever, as I
    suspect that *every* script in Unicode has been transliterated
    into the Latin script at one point or another. Why not just
    map them *all* to Latin and save the messy task of having to
    deal with data represented in its own script? (<== That was
    a rhetorical question, in case it wasn't obvious to all readers.)

    --Ken


