Re: Addition of remaining two Maltese Characters to Unicode

From: Doug Ewell (dewell@compuserve.com)
Date: Tue Aug 01 2000 - 10:13:41 EDT


Angelo Dalli <adall@bms.com.mt> wrote:

> Regarding 'ie' however, there must be some way to distinguish 'i' +
> 'e' from 'ie'. This is important for two main reasons,
>
> 1. Sorting problems - 'i' should come before 'ie'...

It is neither necessary nor sufficient to have a separate code point
for digraphs like IE and GĦ in order to solve sorting problems. An
application that expects to sort Unicode text correctly must apply some
knowledge about the sorting rules of the language.

In a discussion several months ago it was generally agreed that even
though the digraph CH serves as a single letter in the Spanish and
Slovak languages, and needs to be sorted as such, it can and should
still be represented by the two Unicode characters U+0043 and U+0048
(or case equivalents). The same is true for Maltese IE and GĦ.
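As a minimal sketch of what "knowledge about the sorting rules of the language" can look like, the Python below builds a sort key from a simplified Maltese alphabet. In this key GĦ and IE each count as a single unit, but the text itself stays in ordinary Unicode characters. The names MALTESE_ORDER, collation_units, and maltese_key are illustrative only and do not come from any library. The sketch also ignores case-level and accent-level tie-breaking. A real application would more likely use a tailored collation (for example through ICU) than a hand-rolled key.

```python
import unicodedata

# Simplified Maltese alphabet (lowercase). The digraphs "għ" and "ie" are
# single collation units, but each is still stored as two ordinary Unicode
# characters (g + U+0127, i + e); no special code point is involved.
MALTESE_ORDER = [
    "a", "b", "ċ", "d", "e", "f", "ġ", "g", "għ", "h", "ħ", "i", "ie",
    "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
    "w", "x", "ż", "z",
]
RANK = {unit: pos for pos, unit in enumerate(MALTESE_ORDER)}
DIGRAPHS = {"għ", "ie"}


def collation_units(word):
    """Split a word into Maltese collation units, matching digraphs greedily."""
    word = unicodedata.normalize("NFC", word).lower()
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def maltese_key(word):
    """Sort key: alphabet position of each unit; other characters such as '-' are skipped."""
    return [RANK[u] for u in collation_units(word) if u in RANK]


print(sorted(["iebes", "ikel"]))                   # ['iebes', 'ikel'] (code-point order)
print(sorted(["iebes", "ikel"], key=maltese_key))  # ['ikel', 'iebes'] (I before IE)
```

Applied to an English word like "friend", this key would also treat its "ie" as the Maltese digraph. The application therefore has to know the language of the text. The encoding cannot supply that information.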

In Angelo's specific example:

> friend
> frigate
> id-dar
> iehor
> liema

these words are sorted correctly with the existing mechanism of using
separate characters for 'i' and 'e' and not having an 'ie' character.
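For these particular words, even plain code-point order gives the order in Angelo's list. Python's default string comparison is used below as a stand-in for "the existing mechanism". Code-point order is not correct collation in general. Here, though, 'd' < 'e' and 'e' < 'g' happen to do the right thing:

```python
words = ["liema", "iehor", "frigate", "id-dar", "friend"]
# Default str comparison orders by code point; 'i' and 'e' are ordinary letters.
print(sorted(words))  # ['friend', 'frigate', 'id-dar', 'iehor', 'liema']
```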

> 2. Means of distinguishing Maltese words from borrowed foreign words
> - without an 'ie' representation, for example, the English word
> "friends" would not be distinguishable automatically from the Maltese
> word "ktieb".

The classic example of this is the word "chat," which could be English
for "informal talk" or French for "cat." Character encoding is, once
again, neither necessary nor sufficient to determine the language of
a word.

Other than the fact that "friends" and "ktieb" are not cognates --
any speaker of English and Maltese would recognize immediately and
correctly which word belonged to which language -- what benefits would
arise from having two different 'ie' representations for the two words?
The digraph would still be typed using the I and E keys; how is the
software supposed to know to use the two characters 'i' and 'e' for
"friends" and to use the special 'ie' digraph for "ktieb"? It would
also make searching more complicated, since a search for "ie" would
have to match both representations.
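To illustrate the searching problem, the sketch below pretends that such a digraph character existed. Unicode encodes no such character, so a Private Use Area code point (U+E000) stands in for it. HYPOTHETICAL_IE, fold_digraph, and search_folded are made-up names for this illustration. With the digraph character in place, a plain substring search for "ie" no longer finds the Maltese word. Every search routine would then need an extra folding step before it could treat the two representations as equivalent.

```python
# HYPOTHETICAL: Unicode has no 'ie' digraph character. U+E000 (Private Use
# Area) stands in for one here only to illustrate the search problem.
HYPOTHETICAL_IE = "\ue000"


def fold_digraph(text):
    """Replace the hypothetical digraph with the two ordinary letters i + e."""
    return text.replace(HYPOTHETICAL_IE, "ie")


def search_folded(words, needle):
    """Return the words containing needle once both sides are folded."""
    folded_needle = fold_digraph(needle)
    return [w for w in words if folded_needle in fold_digraph(w)]


corpus = ["friends", "kt" + HYPOTHETICAL_IE + "b"]  # "ktieb" spelled with the digraph

print([w for w in corpus if "ie" in w])   # ['friends']: the Maltese word is missed
print(len(search_folded(corpus, "ie")))   # 2: both found, but only after folding
```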

Maltese-aware software, like any properly internationalized and
localized software, not only can but should handle 99.9% of all issues
like this with the existing letters encoded in Unicode.

Peter Constable <Peter_Constable@sil.org> wrote:

> Angelo mentioned a need to distinguish "ie" in Maltese from "ie" in
> English borrowings, but didn't mention much about differences in
> behaviours - he only mentioned casing as a particular problem (though
> I don't see how the casing works any differently for a single grapheme
> <ie> than it does for a grapheme sequence <i><e>).

Grapheme sequences like IE have three possible cases: uppercase (IE),
lowercase (ie), and titlecase (Ie), which would occur at the beginning
of a sentence.
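Python's built-in case mappings can confirm this. Maltese IE is stored as two ordinary letters, so the uppercase, lowercase, and titlecase forms all come from the standard per-character mappings. The contrast is the Latin digraph DŽ, which Unicode does encode as a single character. Covering all three cases of that digraph takes three code points, U+01C4, U+01C5, and U+01C6.

```python
# Maltese IE as two ordinary letters: each case form comes from the
# standard per-character case mappings, with nothing digraph-specific.
print("iehor".upper())   # IEHOR
print("iehor".title())   # Iehor
print("IEHOR".lower())   # iehor
print("GĦAJN".title())   # Għajn

# Contrast: the single-character digraph DŽ needs one code point per case.
dz_lower = "\u01c6"                  # dž
print(dz_lower.upper() == "\u01c4")  # True (DŽ)
print(dz_lower.title() == "\u01c5")  # True (Dž)
```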

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT