RE: Taiwan Aboriginal Languages and Unicode support

From: Erkki I. Kolehmainen (eik@iki.fi)
Date: Tue Dec 26 2006 - 04:06:45 CST


    In Lithuanian, too, the dot above the letter i is preserved when accents are attached to it. This is encoded as a sequence: letter i + combining dot above (U+0307) + the appropriate accent. The same solution could be used for Amis; see 3.11 Canonical Ordering Behaviour, Application of Combining Marks, P9 [Guideline] on p. 113 of TUS 5.0.
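
    As a rough illustration of that sequence, here is a minimal Python sketch using only the standard unicodedata module. The variable names and the helper describe() are made up for this example. It shows that i + combining dot above + acute stays a three-code-point sequence under NFC, and so remains distinct from the ordinary precomposed í, whose dot is replaced by the accent.

```python
import unicodedata

# Lithuanian-style sequence: i + COMBINING DOT ABOVE + COMBINING ACUTE ACCENT.
I_DOT_ACUTE = "i\u0307\u0301"
# Ordinary precomposed i with acute, where the accent replaces the dot.
I_ACUTE = "\u00ed"


def describe(text):
    """Return the Unicode character names of the code points in text."""
    return [unicodedata.name(ch) for ch in text]


# NFC cannot compose the sequence: the dot above blocks the acute from
# combining with the base letter, so all three code points survive.
nfc_dotted = unicodedata.normalize("NFC", I_DOT_ACUTE)

# The precomposed letter decomposes to i + acute only (no dot above).
nfd_plain = unicodedata.normalize("NFD", I_ACUTE)

print(describe(nfc_dotted))
print(describe(nfd_plain))
print(nfc_dotted == unicodedata.normalize("NFC", I_ACUTE))
```

    Whether the dot and the accent are actually drawn stacked still depends on the font and the rendering engine; the sketch only covers the encoding side.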


    Erkki I. Kolehmainen
    Tilkankatu 12 A 3, FI-00300 Helsinki, Finland
    Puh. (09) 4368 2643, 0400 825 943; Tel. +358 9 4368 2643, +358 400 825 943

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of "Arne Götje (高盛華)"
    Sent: 26 December 2006 5:26
    To: unicode@unicode.org
    Cc: Andrew Lee; Enrico Zini
    Subject: Taiwan Aboriginal Languages and Unicode support

    Hi list,

    I just returned from a trip to visit some of the local Taiwan aboriginal tribes to evaluate the alphabets they use, whether Unicode already supports all of the characters, and how to input them. At present they can neither type nor display the characters correctly, which leads to crude workarounds such as '^i' or '`d'.

    So far, we have collected the characters used in the Amis and Paiwan languages. We will visit the other tribes too to gather information from them in the near future.

    The full character lists are here:
     * Amis: http://www.enricozini.org/2006/amis-character-list.html
     * Paiwan: http://www.enricozini.org/2006/paiwan-character-list.html

    We have found three issues so far, and I would like your advice on how to deal with them.

    The languages use the Latin script, thanks to Christian missionaries.

    1. Instead of the letter 'g', they use the letter 'nġ'.
    This is a separate letter, not a ligature. It is sorted differently in the Amis and Paiwan languages, and text processing needs to handle it as a separate letter.

    My idea would be to encode this letter as a separate character, as it has its own semantics. It could probably go into one of the existing Latin Extended blocks in Unicode.

    2. With the character 'nġ': in Amis this character, like all others, can take an acute, grave or circumflex accent. While we can use combining accent sequences to produce such characters, for the 'nġ' the dot on the g needs to be replaced by the accent, much as the dot on the 'i' is in European languages.

    I suppose we need to encode a letter 'dotless ng' for this, like we have with the 'i'.

    3. In the Amis language, when the 'i' takes an acute, grave or circumflex accent, it keeps its dot and the accent is stacked on top of the dot. However, fonts designed for European scripts will probably remove the i-dot and replace it with the accent, rather than stacking the accent on top of it. Do we need a separately encoded 'i' for this different semantic purpose? Or is there a better way to solve this issue?
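
    To make the sorting point in item 1 concrete, here is a small Python sketch that treats 'nġ' as one letter without any new code point, by splitting words into letters before comparing them. The alphabet order, the sample strings and the function names are all made up for illustration; they are not the real Amis or Paiwan order. Accented vowels are not handled.

```python
import unicodedata

# HYPOTHETICAL alphabet order, for illustration only. The real Amis and
# Paiwan orders must come from the communities' own orthographies.
DIGRAPH = "n\u0121"  # n + LATIN SMALL LETTER G WITH DOT ABOVE
ALPHABET = ["a", "e", "i", "k", "l", "m", "n", DIGRAPH,
            "o", "p", "s", "t", "u"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def letters(word):
    """Split a word into letters, keeping the digraph as one unit."""
    # NFC turns g + COMBINING DOT ABOVE into the precomposed U+0121,
    # so both spellings of the digraph are recognised.
    word = unicodedata.normalize("NFC", word.lower())
    result = []
    i = 0
    while i < len(word):
        if word.startswith(DIGRAPH, i):
            result.append(DIGRAPH)
            i += 2
        else:
            result.append(word[i])
            i += 1
    return result


def sort_key(word):
    """Rank each letter by the alphabet; unknown letters sort last."""
    return [(RANK.get(letter, len(ALPHABET)), letter)
            for letter in letters(word)]


# Made-up strings, not real words; the second uses the decomposed form.
words = ["no", "ng\u0307a", "na"]
print(sorted(words))                # plain code point order
print(sorted(words, key=sort_key))  # digraph-aware order
```

    Plain code point order puts the decomposed 'nġa' between 'na' and 'no', while the digraph-aware key places it after every word that begins with a plain 'n'.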

    I don't really want to publish separate Latin fonts just for the Taiwan Aboriginal Languages, but rather ask font maintainers to include support for the currently unsupported accent combinations. That way we can have more font styles supporting the script.

    Any opinions about these issues?

    --
    Arne Götje (高盛華) <arne@linux.org.tw>
    PGP/GnuPG key: 1024D/685D1E8C
    Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
    Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.


    This archive was generated by hypermail 2.1.5 : Tue Dec 26 2006 - 04:09:54 CST