Re: Taiwan Aboriginal Languages and Unicode support

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 25 2006 - 22:10:19 CST

Next message: Arne Götje (高盛華): "Re: Taiwan Aboriginal Languages and Unicode support"

Previous message: Arne Götje (高盛華): "Taiwan Aboriginal Languages and Unicode support"
In reply to: Arne Götje (高盛華): "Taiwan Aboriginal Languages and Unicode support"
Next in thread: Arne Götje (高盛華): "Re: Taiwan Aboriginal Languages and Unicode support"
Reply: Arne Götje (高盛華): "Re: Taiwan Aboriginal Languages and Unicode support"

Arne Götje (高盛華) <arne at linux dot org dot tw> wrote:

> 1. instead of the letter 'g', they use the letter 'nġ'. This is a
> separate letter and not a ligature. It gets sorted differently in Amis
> and Paiwan languages and when type processing, it needs to be handled
> as such.
>
> My idea would be to encode this letter as a separate character, as it
> has its own semantic. We could probably put it into one of the existing
> Latin Extensions in Unicode.

U+006E U+0121  (n + LATIN SMALL LETTER G WITH DOT ABOVE)

or, if both n and ġ are individual letters and can appear together with
a different semantic from the one you describe, and if collating tables
are tailored to take CGJ into account:

U+006E U+034F U+0121  (n + COMBINING GRAPHEME JOINER + ġ)

See the often-cited examples of "ch" in Spanish and Czech. The fact
that two existing characters combine to make a single "letter" in an
orthography does not justify encoding the combination as a separate
character. Most existing cases where Unicode did encode such a
combination were there to achieve 1-to-1 convertibility with legacy
character sets in Unicode 1.0, and do not set a precedent for future
encoding.

See also the WG2 "Principles and Procedures" document, Annex G (page
31):
http://www.dkuug.dk/JTC1/SC2/WG2/docs/n3002.pdf
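To make "collating tables tailored to take CGJ into account" concrete, here is a minimal Python sketch, assuming a sort key built from collation units rather than raw code points. The toy alphabet is hypothetical (it is not the real Amis or Paiwan order), and `tokenize` and `sort_key` are illustrative names, not part of any library. Input is normalized to NFC first, so n + g + U+0307 is treated the same as n + U+0121.

```python
import unicodedata

CGJ = "\u034F"         # COMBINING GRAPHEME JOINER
G_DOT = "\u0121"       # LATIN SMALL LETTER G WITH DOT ABOVE
NG_DOT = "n" + G_DOT   # the digraph, treated as one letter for sorting

# Hypothetical toy alphabet, for illustration only (not an attested sort order).
ALPHABET = ["a", "g", G_DOT, "i", "m", "n", NG_DOT, "o", "u"]
RANK = {letter: pos for pos, letter in enumerate(ALPHABET)}


def tokenize(word: str) -> list[str]:
    """Split a word into collation units; n + ġ is one unit unless a CGJ separates them."""
    word = unicodedata.normalize("NFC", word)
    units = []
    i = 0
    while i < len(word):
        if word.startswith(NG_DOT, i):
            units.append(NG_DOT)
            i += len(NG_DOT)
        elif word[i] == CGJ:
            i += 1  # CGJ only blocks the digraph; it gets no weight of its own here
        else:
            units.append(word[i])
            i += 1
    return units


def sort_key(word: str) -> list[tuple[int, str]]:
    # Characters outside the toy alphabet sort after it, then by code point.
    return [(RANK.get(u, len(ALPHABET)), u) for u in tokenize(word)]


words = ["oma", "n" + G_DOT + "a", "nu", "n" + CGJ + G_DOT + "a", "ana"]
for w in sorted(words, key=sort_key):
    print(w.encode("unicode_escape").decode(), tokenize(w))
```

The only difference between the two spellings of "n ġ a" is the invisible CGJ, yet they land in different places in the sorted list because the tailoring sees one or two letters.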

> 2. With the character 'nġ': in Amis this character, like all others,
> can get an acute, grave or circumflex accent. While we can use
> combining accent sequences to produce such characters, for the 'nġ'
> the dot on the g needs to be replaced, much as it is on the 'i'
> in European languages.
>
> I suppose we need to encode a letter 'dotless ng' for this, like we
> have with the 'i'.

I don't remember if there is a generic way to make a combining mark
(such as an acute accent) apply to a group of two base letters (such as
n g), but that is the way to solve this problem, not by encoding another
precomposed combination.
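On the encoding side, every accented form of "nġ" is already expressible as a combining sequence, so no new code point is needed to represent it. The Python sketch below (the helper `accented_ng` is a hypothetical name) builds n + ġ + accent and shows its canonical decomposition. It deliberately says nothing about rendering: whether the accent replaces the dot or spans both letters is a display question the sequence itself does not settle.

```python
import unicodedata

G_DOT = "\u0121"  # LATIN SMALL LETTER G WITH DOT ABOVE
ACCENTS = {"acute": "\u0301", "grave": "\u0300", "circumflex": "\u0302"}


def accented_ng(accent: str) -> str:
    """Hypothetical helper: n + ġ followed by a combining accent, in NFC."""
    return unicodedata.normalize("NFC", "n" + G_DOT + ACCENTS[accent])


for name in ACCENTS:
    s = accented_ng(name)
    # NFD splits ġ into g + COMBINING DOT ABOVE; the accent follows it.
    nfd = unicodedata.normalize("NFD", s)
    print(name, len(s), [unicodedata.name(c) for c in nfd])
```

No precomposed "g with dot above and acute/grave/circumflex" exists, so NFC leaves three code points: n, U+0121, and the combining accent.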

The analogy with dotless-i is not sound; there were numerous legacy
character sets for Turkish that distinguished dotted-i from dotless-i,
and Unicode had to maintain 1-to-1 convertibility with those character
sets. The same situation does not apply to "ng".
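One such legacy set is ISO 8859-9 (Latin-5), which gives dotted and dotless i separate byte values in both cases. A small Python sketch using the standard library's `iso8859_9` codec shows the one-to-one round trip that forced four distinct Unicode code points; `to_latin5` and `round_trips` are illustrative names only.

```python
LEGACY = "iso8859_9"  # Python's built-in codec for ISO 8859-9 (Latin-5)


def to_latin5(text: str) -> bytes:
    return text.encode(LEGACY)


def round_trips(text: str) -> bool:
    # True if Unicode -> Latin-5 -> Unicode returns the original string.
    return to_latin5(text).decode(LEGACY) == text


# i, dotless i, I, I with dot above: each maps to its own single byte.
for ch in "i\u0131I\u0130":
    print(f"U+{ord(ch):04X} -> 0x{to_latin5(ch).hex().upper()}", round_trips(ch))
```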

> 3. In the Amis language, when the 'i' gets an acute, grave or
> circumflex accent, it keeps the i-dot in place and the accent gets
> stacked on top of the i-dot.
> However, fonts handling European scripts will probably take the i-dot
> away and replace it with the accent, rather than stacking the accent
> on top of it.
> Do we need to have a separate encoded 'i' for this different semantic
> purpose? Or is there a better way to solve this issue?

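U+0069 U+0307 U+0301  (i + COMBINING DOT ABOVE + acute)
U+0069 U+0307 U+0300  (i + COMBINING DOT ABOVE + grave)
U+0069 U+0307 U+0302  (i + COMBINING DOT ABOVE + circumflex)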

This is what Lithuanian does, IIRC.
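For what it's worth, Unicode's SpecialCasing.txt uses i + U+0307 + U+0300/U+0301 in its Lithuanian (lt) lowercase mappings. The Python sketch below (the helper `dotted_i` is a hypothetical name) checks that these sequences survive NFC unchanged, so the explicit dot is not silently dropped, while plain i + acute composes to U+00ED, which has no explicit dot.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
ACCENTS = {"acute": "\u0301", "grave": "\u0300", "circumflex": "\u0302"}


def dotted_i(accent: str) -> str:
    """i + explicit COMBINING DOT ABOVE + accent, so the dot stays under the accent."""
    return "i" + DOT_ABOVE + ACCENTS[accent]


for name in ACCENTS:
    s = dotted_i(name)
    # NFC leaves the sequence alone: there is no precomposed dotted i, and the
    # accent cannot compose with "i" because U+0307 (same combining class) blocks it.
    print(name, unicodedata.normalize("NFC", s) == s)

# Plain "i" + acute, by contrast, composes to U+00ED (i with acute, no explicit dot).
print(unicodedata.normalize("NFC", "i\u0301") == "\u00ed")
```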

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

This archive was generated by hypermail 2.1.5 : Mon Dec 25 2006 - 22:13:12 CST