Re: Tifinagh - extension for complete common Berber alphabet isomorphic with Latin

Date: Wed Feb 17 2010 - 11:39:17 CST

    Hi Robert,

    The suggestions for orthographic reform in Latin script are indeed out
    of scope for Unicode. They are rather to do with keyboard layouts and
    tools. I included them in the background to give an idea of how the
    Berber languages are typed in different areas and what the difficulties
    and variations are.

    > - You are correct that the controversial part is to aim for a unified
    > reference alphabet, leaving differences to the fonts, and having separate
    > specific code points for particular historical/regional variant letters
    > like the Berber Academy forms where they need to be shown explicitly.
    > (I live in Algeria in Kabylie and actually prefer the Berber Academy
    > glyphs!)
    >> This is indeed controversial, as it goes against the principles of
    >> Unicode. Unicode encodes scripts, yet you propose to encode *sounds*
    >> that would then be, depending on font, mapped to *different* characters,
    >> *not glyphs*, which for instance is the case with the CJKV range.

    I don't think it does go against the principles necessarily. Some of
    the consonant symbols have come to represent different sounds in different
    regions because of the vast geographical distances involved, sound change
    over long timescales, and disagreeing committees. But allowing for that,
    there is great underlying unity in the script.

    The original encoders of Tifinagh in Unicode recognised this unity and
    saw that all the variant repertoires can be aligned on the basis of
    sound. (In fact, it is surprising just how many of the letters represent
    the same sound or very close sounds everywhere.)
    - The IRCAM letters have been named "Tifinagh letter X" not "IRCAM letter
    X". This set forms the basis of all the other, variant repertoires.
    - All the existing Berber Academy and Tuareg variant forms explicitly
    encoded are named in Unicode according to their sound, with the note
    that a couple of them represent a different sound in other Tuareg variants.
    So my approach has been (incompletely) applied already.
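
    The naming pattern described above can be checked against the character
    names themselves. The following is a minimal sketch, assuming Python's
    standard unicodedata module; the helper names tifinagh_letters and
    group_by_sound are hypothetical, and the letters listed depend on the
    Unicode version bundled with the Python build. It treats the last word
    of each name as the sound label, which is an assumption about the naming
    convention rather than anything the standard defines.

```python
import unicodedata

PREFIX = "TIFINAGH LETTER "


def tifinagh_letters():
    """Yield (code point, name) for each named letter in the Tifinagh block."""
    for cp in range(0x2D30, 0x2D80):
        # Unassigned code points have no name; the default "" skips them.
        name = unicodedata.name(chr(cp), "")
        if name.startswith(PREFIX):
            yield cp, name


def group_by_sound():
    """Group letters by the last word of the name (assumed sound label)."""
    groups = {}
    for cp, name in tifinagh_letters():
        sound = name.rsplit(" ", 1)[-1]
        groups.setdefault(sound, []).append((cp, name))
    return groups


if __name__ == "__main__":
    # Print only the sounds that have more than one encoded form.
    for sound, members in group_by_sound().items():
        if len(members) > 1:
            for cp, name in members:
                print(f"{sound:6} U+{cp:04X} {name}")
```

    Run as a script, this prints the sounds that have both a plain
    "Tifinagh letter X" form and one or more explicitly named regional
    forms.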

    >> Again, those are only different *glyphs* for the *same character* as
    >> far as encoding and fonts are concerned.
    >> For instance, consider a world where all currency symbols were actually
    >> the same code point, namely ¤ U+00A4 CURRENCY SIGN. Now any monetary
    >> figure would be displayed with a different currency sign when different
    >> fonts are used and it would lead to disaster (compare JPY 100 to USD
    >> 100). This was basically what happened with ANSI encoding. Latin
    >> characters would turn into Greek or Russian simply because text was
    >> viewed on another PC that had another locale. Because of this and other
    >> pragmatic reasons (e.g., the internet and need to have different scripts
    >> side-by-side), Unicode was born. So your current proposal would be
    >> a real step back. Just look at the mess with the Yen sign even today
    >> in Unicode because of legacy support! ¥ for Japanese OS, reverse solidus
    >> for everybody else... because fonts map that way!
    >> IMO the correct way of handling what you're proposing would be to have
    >> different keyboard layouts that map same sounds to same keys. Therefore,
    >> user A accustomed to keyboard for one script can simply change layouts
    >> and type the same sounds. This would obviously only work if your
    >> proposed system seemed natural to the casual user and there is no
    >> constant need to type in mixed scripts.
    >> Creating a whole new alphabet that would then map (via keyboard layout
    >> or IME for mixed texts) is overshooting the mark in my opinion, as
    >> it would require the casual user to learn another "unified reference
    >> alphabet" just to type into a PC.

    - I have not seen any case where people typed in mixed Tifinagh scripts.
    Even if they did, for academic work they could use an academic keyboard
    that generated the specific regional forms explicitly, not the common
    alphabet. For everyday writing they could just change font. You'd never
    need to change script flavours within a word.

    - An extended form of the existing IRCAM keyboard with the added vowels
    would be an obvious and intuitive one for Tuareg and other variants.
    An Arabic keyboard mapping also suggests itself.

    - No 'whole new alphabet' would be needed because it is already there:
    the IRCAM letters form the base except for two missing vowels, and
    are even called plain 'Tifinagh letter X' in the standard.
    Even using a reference font with IRCAM glyphs for a different language
    variant would show a text readably to a user.
    A local keyboard would show the local forms on the keytops.
    A common encoding would be much more practical than having different
    keyboards for every variant that do exactly the same thing but are
    forced to do it in different ways everywhere (that is, encoding every
    variant letter and creating different tools for each area's repertoire).

    - It's not the same as with a currency symbol: a 'sh' typed in Niger-style
    script has the same intention as a 'sh' typed in Morocco or Algeria
    and it wouldn't matter for the meaning if the letter were rendered in
    different ways.
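
    To make the point concrete, here is a rough sketch, assuming Python's
    standard unicodedata module, of folding the explicitly encoded regional
    forms onto the plain letter with the same sound name, so that two
    spellings differing only in regional form compare equal. The functions
    fold_to_base and same_by_sound are hypothetical illustrations, not
    anything Unicode defines: the standard specifies no such equivalence,
    and the fold ignores the cases noted earlier where a form stands for a
    different sound in another Tuareg variant.

```python
import unicodedata

PREFIX = "TIFINAGH LETTER "


def fold_to_base(ch):
    """Map a regional Tifinagh form to the plain letter named for its sound.

    Characters outside Tifinagh, and forms with no plain counterpart,
    are returned unchanged.
    """
    name = unicodedata.name(ch, "")
    if not name.startswith(PREFIX):
        return ch
    sound = name.rsplit(" ", 1)[-1]  # assumed sound label: last word of name
    try:
        return unicodedata.lookup(PREFIX + sound)
    except KeyError:
        # No unqualified letter exists for this sound name.
        return ch


def same_by_sound(a, b):
    """Compare two strings after folding every character to its base letter."""
    return [fold_to_base(c) for c in a] == [fold_to_base(c) for c in b]


if __name__ == "__main__":
    tuareg = "\u2D42\u2D30"  # TUAREG YAH followed by YA
    plain = "\u2D40\u2D30"   # YAH followed by YA
    print(same_by_sound(tuareg, plain))
```

    Under a common encoding this folding step would be unnecessary, since
    both texts would already use the same code points and differ only in
    the font chosen to display them.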

    >> I think you should propose the two missing characters that you have
    >> proof of use and should work out a way to logically input these
    >> different scripts by common and established means, so your research
    >> bears fruit and helps boost literacy and cross-border awareness for Berber

    The two letters are a good start. 'Common and established means' raises
    a smile in the Berber context though. It's usually a disfavoured language
    group in its own communities, and Tifinagh is just one of several scripts
    used to write it.
    I'd prefer to aim for a practical solution rather than continuing the
    fragmentation that has plagued efforts so far, if my solution proves
    to be a good one for Unicode.
