Mon Jul 31 2000

Angelo Dalli wrote:

> These characters are the only two digraphs in the Maltese language, namely
> the consonant 'gh' (where h is stroked) and the sixth vowel 'ie'. Though
> these characters can be rendered onscreen using separate characters, they
> are actually defined as separate characters in the Maltese alphabet. There
> are also problems in distinguishing between the Maltese 'ie' and the
> sequence 'i' + 'e' found in words adopted from English. Evidently, the only
> correct solution to this problem is to add these two characters to Unicode.
> The characters are in heavy daily use, making Unicode quite inadequate to
> represent Maltese until they are added.

Angelo, I think you need to make sure that you are clear about the use of
the term 'character' here, and understand that the Unicode definition of a
character is not the same as you imply when you speak of these digraphs as
being 'separate characters in the Maltese alphabet'. For the sake of
clarity, I'll refer to the latter as 'graphemes'; that is, graphic
representations of particular Maltese sounds treated as distinct elements
of the alphabet. Now, it is not in any way essential for a grapheme to be
encoded as a single Unicode character; for many languages supported by
Unicode, digraphs, trigraphs and even longer sequences treated as separate
graphemes in the alphabets of those languages are encoded as sequences of
two or more Unicode characters. The key to this, of course, is that
software needs to know, e.g. for sorting purposes, that the character
sequence 'i' + 'e' must be handled in a particular way for Maltese,
different from how it is handled for other languages.
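
To illustrate what language-specific handling of such sequences might look like, here is a minimal Python sketch of a tailored Maltese sort key. It is only a sketch under assumptions: the alphabet order is the commonly cited 30-letter Maltese order, and the names graphemes and maltese_sort_key are hypothetical, not part of any library. It treats every 'i' + 'e' as the Maltese grapheme, so it does not address the loanword ambiguity raised in the original message.

```python
import unicodedata

# Assumed Maltese alphabet order, with the digraphs as single sorting units.
MALTESE_ALPHABET = [
    "a", "b", "ċ", "d", "e", "f", "ġ", "g", "għ", "h", "ħ", "i", "ie",
    "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
    "w", "x", "ż", "z",
]
RANK = {g: i for i, g in enumerate(MALTESE_ALPHABET)}
DIGRAPHS = ("għ", "ie")


def graphemes(word):
    """Split a word into alphabet units, pairing up the two digraphs."""
    text = unicodedata.normalize("NFC", word).lower()
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units


def maltese_sort_key(word):
    """Rank each unit by alphabet position; unknown units sort last."""
    return [(RANK.get(u, len(RANK)), u) for u in graphemes(word)]


words = ["ieqaf", "iva", "ikel", "għajn", "gżira"]
print(sorted(words, key=maltese_sort_key))
```

With this key, words beginning with 'ie' sort after all other words beginning with 'i', and words beginning with 'għ' sort after all other words beginning with 'g', even though each digraph is stored as two ordinary Unicode characters.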

As you'll have noted, there is a small number of digraph characters in
the Unicode standard (the Dutch IJ, ij and the Croatian digraphs), but
these are only included for easy backwards compatibility with some
existing standards and, in the case of the Croatian characters, to
facilitate transliteration to and from the Serbian Cyrillic orthography.
The Dutch IJ, ij digraph is a good parallel to the Maltese graphemes,
because it too is separately sorted and has particular capitalisation
rules associated with it; at the same time, although the IJ, ij digraph is
included in Unicode for backwards compatibility with some older Dutch
standards (telecommunications standards? I forget), almost no Dutch text
makes use of the digraph characters. My Dutch colleagues use the I
character followed by the J character, and rely on properly implemented
Dutch sorting rules. I think you will find that the same can be done with
Maltese, and without the requirement of backwards compatibility with an
existing national standard, you will find it very hard to convince the
Unicode Technical Committee of the need to encode these digraphs as
separate characters.
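
As a rough illustration of the Dutch case, the Python sketch below folds the compatibility digraph characters (U+0132, U+0133) to plain letter sequences using NFKC normalization, and applies the Dutch capitalisation rule to the two-letter sequence. The function dutch_capitalize is a hypothetical helper written for this example, not a library call, and it only handles the word-initial 'ij' case.

```python
import unicodedata


def fold_digraph_characters(text):
    """Replace compatibility digraph characters with letter sequences.

    NFKC applies compatibility decompositions, so U+0133 becomes 'ij'.
    Note that NFKC also folds other compatibility characters.
    """
    return unicodedata.normalize("NFKC", text)


def dutch_capitalize(word):
    """Capitalise a Dutch word, treating initial 'ij' as one unit."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


plain = fold_digraph_characters("\u0133sselmeer")
print(plain)
print(dutch_capitalize(plain))
```

The point of the sketch is that the capitalisation rule lives in language-aware software, not in the character encoding.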

Malta is one of my favourite places in the world and, believe me, I would
have long ago proposed addition of these digraphs myself if I thought they
needed to be included in Unicode.

John Hudson

Tiro Typeworks
Vancouver, BC

