Re: Nice to join this forum....

From: African Oracle
Date: Mon May 03 2004 - 11:41:15 CDT

Thanks, Doug. All contributions are appreciated.



----- Original Message -----
From: "Doug Ewell"
To: "Unicode Mailing List"
Cc: "African Oracle"; "Michael Everson"
Sent: Monday, May 03, 2004 5:11 PM
Subject: Re: Nice to join this forum....

> Dele a.k.a. "African Oracle" <oracle at africaservice dot com> wrote:
> > GB is different from G+B. You do not pronounce the letters separately,
> > but people who do not know anything about the language do, which is
> > wrong. It is about correction and proper representation.
>
> What Michael and others have been trying to say is this:
> Unicode encodes characters, not languages. The word "character" means
> different things to ordinary people, depending on what language they
> speak and what script they write. "Characters" in Unicode do not always
> correspond 1-to-1 with "letters" in a given language's alphabet.
>
> Here are some quick and dirty definitions for our purposes:
>
> Character: the basic unit of text encoding.
>
> Letter: the basic unit of a language's orthography. Not necessarily the
> same as "character."
>
> Glyph: the visual representation of a character. Also not necessarily
> the same as "character."
>
> In Spanish, the combination "ch" is considered a distinct letter of the
> alphabet. It has its own name, "che." Children learn it as a letter
> that comes between "c" and "d". This is all good, but when it comes to
> representing text in computers, there is no separate "ch" letter in any
> of the encodings that people have used for decades. Spanish text
> includes the two characters "c" and "h". This has been true for
> decades, and it is also true when using Unicode.
>
> Likewise, in Yoruba, if there is no visual distinction between (1) the
> letter "GB" and (2) the two letters "G" and "B" that happen to appear
> together, as in your example, then the letter "GB" is encoded with the
> two characters "G" and "B". This does not deny the existence of a
> letter "GB" in the Yoruba language; it just dictates how that letter is
> encoded in computerized text.
>
> Now if you need to perform some other type of text processing, such as
> searching or sorting or spell-checking or line-breaking, then your
> software may need to understand the difference between the letter "GB"
> and the two letters "G" + "B". But this needs to be handled by the
> software, not the character encoding mechanism.
> > Here are a few letters of the Yoruba alphabet, which might not be new
> > to you. So how can you equate G+B with GB, even if you claim it has
> > significance? How significant is significant?
> >
> > A B D E Ẹ F G GB....
> Actually there are quite a few people on this list who are familiar with
> the letters of the Yoruba alphabet, and they are also familiar with the
> encoding principles of Unicode. That is why they are saying, yes, we
> know "GB" is a letter in Yoruba, but it is encoded as U+0047 "G" +
> U+0042 "B".
> -Doug Ewell
> Fullerton, California
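
To make the distinction in Doug's message concrete, here is a minimal Python sketch. It shows that the letter "GB" is stored as the two characters U+0047 and U+0042, and that software sitting on top of the encoding can still treat it as one letter when sorting. The names YORUBA_ALPHABET, code_points, letters and sort_key are made up for this illustration. The sketch assumes that every adjacent "G" + "B" is the letter "GB" and that words carry no tone marks; real collation software would need to handle both cases properly.

```python
import unicodedata

# Yoruba alphabet in its usual order (an assumption of this sketch).
# "GB" is one letter of the alphabet but two Unicode characters.
YORUBA_ALPHABET = [
    "A", "B", "D", "E", "\u1EB8",  # U+1EB8 = E with dot below
    "F", "G", "GB", "H", "I", "J", "K", "L", "M", "N",
    "O", "\u1ECC",                 # U+1ECC = O with dot below
    "P", "R", "S", "\u1E62",       # U+1E62 = S with dot below
    "T", "U", "W", "Y",
]
RANK = {letter: i for i, letter in enumerate(YORUBA_ALPHABET)}


def code_points(text):
    """Return the encoded characters of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def letters(word):
    """Split a word into Yoruba letters, reading 'GB' as a single letter."""
    word = unicodedata.normalize("NFC", word).upper()
    result = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "GB":
            result.append("GB")
            i += 2
        else:
            result.append(word[i])
            i += 1
    return result


def sort_key(word):
    """Sort by alphabet position of each letter; unknown letters go last."""
    return [RANK.get(letter, len(RANK)) for letter in letters(word)]


words = ["gbogbo", "ga", "go"]
print(code_points("GB"))             # two characters in the encoding
print(letters("gbogbo"))             # four letters in the orthography
print(sorted(words))                 # plain code point order
print(sorted(words, key=sort_key))   # "GB" sorts after every "G" word
```

With the plain sort, "gbogbo" lands between "ga" and "go" because "b" precedes "o". With sort_key, it comes after both, because "GB" is ranked as its own letter following "G". The encoded text is identical in both cases; only the software's interpretation differs.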

This archive was generated by hypermail 2.1.5 : Fri May 07 2004 - 18:45:25 CDT