Re: A basic question on encoding Latin characters

From: Andrew Cunningham (andjc@ozemail.com.au)
Date: Sat Sep 25 1999 - 11:27:35 EDT


Hi

An interesting discussion of pre-composed characters vs composed characters

esp. with relation to African languages ...

Currently I'm doing some work with the Dinka language, assisting in the
preparation of new textbooks for children.

As far as I can tell there are no existing character sets that support the
language. The orthography is based on the Latin alphabet with a couple of
additional characters ... all bar two of the capital letters appear in
Unicode.

To complicate matters ... and bringing my discussion back to precomposed
vs composed characters ...

All of the vowels bar one have a "breathy" form which is represented by a
dieresis

Two of the vowels are glyphs similar to U+0254 and U+025B... so currently
no chance of being able to render them with current software or fonts ...

If I remember correctly the following Unicode characters correspond to
characters in the Dinka language.

U+0254 LATIN SMALL LETTER OPEN O

There is an uppercase version of this character ... U+0186 .. but from what
I understand the Dinka character is subtly different

U+025B LATIN SMALL LETTER OPEN E

U+0190 LATIN CAPITAL LETTER OPEN E

The other two characters that don't seem to have glyphs corresponding to the
capital version of the character are :

U+014B LATIN SMALL LETTER ENG

U+0263 LATIN SMALL LETTER GAMMA
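
For reference, the case mappings that Unicode itself records for these letters can be checked with a short Python sketch. It assumes Python 3 and its bundled unicodedata module; the helper name capital_of is made up for illustration. It only shows which capital code points Unicode pairs with the small letters, and says nothing about whether the glyphs in a given font match the shapes Dinka readers expect.

```python
import unicodedata

# Small letters listed in this message (code points as given above).
DINKA_SMALL = ["\u0254", "\u025B", "\u014B", "\u0263"]

def capital_of(ch):
    """Return (code point label, character name) for the uppercase mapping of ch."""
    up = ch.upper()
    return "U+%04X" % ord(up), unicodedata.name(up)

for ch in DINKA_SMALL:
    label, name = capital_of(ch)
    print("U+%04X %s -> %s %s" % (ord(ch), unicodedata.name(ch), label, name))
```

If a capital exists as a code point but the font's glyph is the wrong shape, that is a font problem rather than an encoding problem.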

the six "breathy vowels" are represented by the vowel with dieresis

The vowels in question are : a, e, i, o, U+0254, U+025B

The vowel 'u' does not have a breathy form.
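
A rough way to see which of the breathy vowels have precomposed code points, and which have to be written as a base letter followed by U+0308 COMBINING DIAERESIS, is to normalise each sequence to NFC and check whether it collapses to a single code point. This is only a sketch, assuming Python 3 and its unicodedata module; the names VOWELS and breathy are illustrative.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS
# The six vowels that take a breathy form, per the list above.
VOWELS = ["a", "e", "i", "o", "\u0254", "\u025B"]

def breathy(vowel):
    """Return the NFC form of vowel + combining diaeresis."""
    return unicodedata.normalize("NFC", vowel + DIAERESIS)

for v in VOWELS:
    form = breathy(v)
    # One code point after NFC means a precomposed character exists.
    kind = "precomposed" if len(form) == 1 else "base + combining mark"
    codes = " ".join("U+%04X" % ord(c) for c in form)
    print("U+%04X: %s (%s)" % (ord(v), codes, kind))
```

The four plain Latin vowels come back as single precomposed characters, while the open o and open e stay as two-code-point sequences, so rendering them depends on the font and software handling combining marks.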

All of which probably means that we'll have to resort to the undesirable
solution of creating a custom character set and font ...

Cha`o

Andj

Andrew Cunningham
Information Systems Librarian
Maribyrnong Library Services

andjc@ozemail.com.au

----- Original Message -----
From: Michael Everson <everson@indigo.ie>
To: Unicode List <unicode@unicode.org>
Cc: Unicode List <unicode@unicode.org>
Sent: Saturday, 25 September 1999 21:13
Subject: RE: A basic question on encoding Latin characters

Ar 11:31 -0700 1999-09-24, scríobh John Hudson:

>Other problems stem from over
>enthusiastic application of the abstract character philosophy by the UTC,
>resulting in awkward codepoint unifications which make the task of mapping
>glyphs to character encodings in fonts unnecessarily complicated.

We successfully disunified YOGH from EZH and S and T WITH COMMA BELOW from
S and T WITH CEDILLA.

>This
>accounts for, among many other instances, the lowercase hooked f used in
>the orthographies of many African languages being unified with the
>florin/guilder currency symbol, which is semantically distinct, most often
>fitted to the figure width, and almost always stylistically represented in
>an oblique or script form.

And therefore most fonts are useless for the African languages. To my mind
the florin sign does not appear in the UCS.

>The job of the font developer might be described as 'solving glyph problems
>caused by character sets'.

No, we should fix the character sets.

The following disunifications are, in general, still under discussion:

* COPTIC to be disunified from GREEK
* CYRILLIC LETTERs KU, WE, to be disunified from LATIN LETTERs Q, W
* FLORIN SIGN to be disunified from LATIN LETTER F WITH HOOK
* LATIN LETTER TONE THREE to be disunified from CYRILLIC LETTER ZE
* LATIN LETTER TONE FOUR to be disunified from CYRILLIC LETTER CHE
* CYRILLIC LETTER EN WITH TAIL to be disunified from CYRILLIC LETTER EN
WITH DESCENDER
* IPA Greek letters to be disunified from Greek letters.

Of them, I suppose the FLORIN SIGN disunification might be considered
expensive to industry because ƒ appears in Apple and Microsoft code pages.
Pity about the Africans, in that case. Unless we help them by adding a
LATIN SMALL LETTER AFRICAN F WITH HOOK which would case map to the LATIN
CAPITAL LETTER F WITH HOOK.
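
The unification in question can be seen from a small Python sketch. It assumes Python 3 and its standard cp1252 and mac_roman codecs; the variable names are illustrative. Both vendor code pages decode their florin byte to U+0192, the same code point that serves as the African small letter, and that code point case-maps to U+0191.

```python
import unicodedata

# Florin sign as stored in the two vendor code pages.
win = b"\x83".decode("cp1252")      # Windows Latin 1
mac = b"\xc4".decode("mac_roman")   # Mac OS Roman

for label, ch in (("cp1252 0x83", win), ("mac_roman 0xC4", mac)):
    print(label, "->", "U+%04X" % ord(ch), unicodedata.name(ch))

# The same code point is the lowercase partner of F WITH HOOK.
upper = win.upper()
print("uppercase:", "U+%04X" % ord(upper), unicodedata.name(upper))
```

Any disunification would have to decide which of the two resulting characters these existing bytes should map to, which is where the cost to industry comes in.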

--
Michael Everson * Everson Gunn Teoranta * http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire


