Taiwan Aboriginal Languages and Unicode support

From: Arne Götje (高盛華) (arne@linux.org.tw)
Date: Mon Dec 25 2006 - 21:25:48 CST


    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Hi list,

    I just returned from a trip to visit some of the local Taiwan aboriginal
    tribes to evaluate the alphabets they use, whether Unicode already
    supports all of the characters, and how to input them. At present they
    can neither type nor display the characters correctly, which leads to
    crude workarounds such as '^i' or '`d'.

    So far, we have collected the characters used in the Amis and Paiwan
    languages. We will visit the other tribes too to gather information from
    them in the near future.

    The full character lists are here:
     * Amis: http://www.enricozini.org/2006/amis-character-list.html
     * Paiwan: http://www.enricozini.org/2006/paiwan-character-list.html

    We have found three issues so far and I would like to have your advice
    on how to deal with them.

    The languages use the Latin script, thanks to Christian missionaries.

    1. Instead of the letter 'g', they use the letter 'nġ'.
    This is a separate letter and not a ligature. It is sorted as a letter
    of its own in the Amis and Paiwan languages, and text processing needs
    to handle it as such.

    My idea would be to encode this letter as a separate character, as it
    has its own semantics. We could probably put it into one of the existing
    Latin Extensions in Unicode.
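
    The sorting requirement in point 1 can be sketched without a new
    character, by treating the sequence as one collation unit. This is only
    an illustration: the alphabet order below is made up (the real Amis and
    Paiwan orders are not given here), and `letters` and `sort_key` are
    hypothetical helper names. It assumes the dotted g is U+0121 or the
    equivalent 'g' + U+0307.

```python
import unicodedata

# 'n' followed by U+0121 LATIN SMALL LETTER G WITH DOT ABOVE.
NG = "n\u0121"

# Hypothetical alphabet order, for illustration only.
ALPHABET = ["a", "e", "i", "k", "n", NG, "o", "t", "u"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def letters(word):
    """Split a word into letters, keeping the digraph as one unit."""
    # NFC turns 'g' + U+0307 into U+0121, so both spellings match NG.
    word = unicodedata.normalize("NFC", word.lower())
    result = []
    i = 0
    while i < len(word):
        if word.startswith(NG, i):
            result.append(NG)
            i += len(NG)
        else:
            result.append(word[i])
            i += 1
    return result


def sort_key(word):
    """Rank each letter by its alphabet position; unknown letters go last."""
    return [RANK.get(letter, len(ALPHABET)) for letter in letters(word)]


# Made-up strings, one of them typed with a combining dot above.
words = ["no", "ng\u0307a", "na"]
print(sorted(words, key=sort_key))
```

    A plain code point sort of the same list would put the decomposed
    spelling between "na" and "no", because it compares the bare 'g' with
    'o'. The tailored key places it after every word beginning with 'n'.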

    2. With the character 'nġ': in Amis this character, like all others, can
    take an acute, grave or circumflex accent. While we can use combining
    accent sequences to produce such characters, for the 'nġ' the dot on the
    g needs to be replaced by the accent, much as the dot on the 'i' is in
    European languages.

    I suppose we need to encode a letter 'dotless ng' for this, like we have
    with the 'i'.
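
    To make point 2 concrete, here is a small sketch of how the existing
    combining sequences behave, using Python's standard `unicodedata`
    module. It assumes the dotted g is U+0121; `describe` is a hypothetical
    helper. It shows only what the encoding does, not how any font draws it.

```python
import unicodedata


def describe(text):
    """Return the Unicode character names in a string, in order."""
    return [unicodedata.name(ch) for ch in text]


# U+0121 decomposes canonically to 'g' + U+0307 COMBINING DOT ABOVE.
print(unicodedata.decomposition("\u0121"))

# Dot first, then acute: the acute follows the dotted g, so a renderer
# is expected to stack it above the dot rather than replace the dot.
dot_then_acute = unicodedata.normalize("NFC", "g\u0307\u0301")
print(describe(dot_then_acute))

# Acute first, then dot: a different, non-equivalent sequence.
acute_then_dot = unicodedata.normalize("NFC", "g\u0301\u0307")
print(describe(acute_then_dot))

# A plain 'g' with an acute and no dot at all composes to U+01F5.
plain_g_acute = unicodedata.normalize("NFC", "ng\u0301")
print(describe(plain_g_acute))
```

    The two orders of marks are not canonically equivalent, so the encoded
    order decides which mark sits closer to the base letter.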

    3. In the Amis language, when the 'i' takes an acute, grave or
    circumflex accent, it keeps its dot and the accent is stacked on top of
    the dot.
    However, fonts that handle European scripts will probably remove the
    i-dot and replace it with the accent, rather than stacking the accent on
    top of it.
    Do we need a separately encoded 'i' for this different semantic
    purpose? Or is there a better way to solve this issue?
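
    One existing mechanism that may be relevant to point 3: Unicode treats
    'i' as soft-dotted, so the dot normally disappears under an accent, and
    an explicit U+0307 COMBINING DOT ABOVE between the 'i' and the accent
    asks for the dot to be kept (the convention used for Lithuanian). A
    minimal sketch follows, where `accented_i` is a hypothetical helper.
    Whether a given font actually renders the stacked form is a separate
    question.

```python
import unicodedata

ACCENTS = {
    "acute": "\u0301",       # COMBINING ACUTE ACCENT
    "grave": "\u0300",       # COMBINING GRAVE ACCENT
    "circumflex": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
}
DOT_ABOVE = "\u0307"         # COMBINING DOT ABOVE


def accented_i(accent, keep_dot):
    """Build an accented 'i', optionally with an explicit retained dot."""
    sequence = "i" + (DOT_ABOVE if keep_dot else "") + ACCENTS[accent]
    return unicodedata.normalize("NFC", sequence)


for name in ACCENTS:
    european = accented_i(name, keep_dot=False)
    dotted = accented_i(name, keep_dot=True)
    print(name,
          [f"U+{ord(ch):04X}" for ch in european],
          [f"U+{ord(ch):04X}" for ch in dotted])
```

    Without the explicit dot the sequence composes to the usual precomposed
    letter. With it, normalization leaves the three code points separate,
    so the two forms stay distinct in stored text.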

    I don't really want to publish separate Latin fonts just for the Taiwan
    Aboriginal Languages, but rather ask font maintainers to include support
    for the currently unsupported accent combinations. That way we can have
    more font styles supporting the script.

    Any opinions about these issues?

    Cheers
    Arne
    - --
    Arne Götje (高盛華) <arne@linux.org.tw>
    PGP/GnuPG key: 1024D/685D1E8C
    Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
    Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.6 (GNU/Linux)
    Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

    iD8DBQFFkJY8bp/QbmhdHowRAuIqAKCFIW3oU9e+hRqFrszsNn/QYBBInACaAjTj
    g8PuB1UYjmR26ykIsi/5uIE=
    =gfR6
    -----END PGP SIGNATURE-----



    This archive was generated by hypermail 2.1.5 : Mon Dec 25 2006 - 21:30:03 CST