Re: Nice to join this forum....

From: Doug Ewell (dewell@adelphia.net)
Date: Mon May 03 2004 - 10:11:50 CDT


Dele a.k.a. "African Oracle" <oracle at africaservice dot com> wrote:

> GB is different from G+B. You do not pronounce the letters separately,
> but people that do not know anything about the language do, which is
> wrong. It is about correction and proper representation.

What Michael and others have been trying to say is this:

Unicode encodes characters, not languages. The word "character" means
different things to ordinary people, depending on what language they
speak and what script they write. "Characters" in Unicode do not always
correspond 1-to-1 with "letters" in a given language's alphabet.

Here are some quick and dirty definitions for our purposes:

Character: the basic unit of text encoding.
Letter: the basic unit of a language's orthography. Not necessarily the
same as "character."
Glyph: the visual representation of a character. Also not necessarily
the same as "character."

In Spanish, the combination "ch" is traditionally considered a distinct
letter of the alphabet. It has its own name, "che." Children learn it
as a letter that comes between "c" and "d". That is fine as far as the
alphabet goes, but when it comes to representing text in computers,
there is no separate "ch" character in any of the encodings that people
have used for decades. Spanish text encodes "ch" as the two characters
"c" and "h", and that remains true when using Unicode.
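
A quick way to see this for yourself is the rough Python sketch below.
The helper name code_points and the sample word are my own choices for
illustration, not anything defined by Unicode:

```python
def code_points(text):
    """Return the Unicode code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

word = "chico"  # sample Spanish word, chosen only for illustration

# The encoding sees five characters, even though a traditional count of
# Spanish letters would give four (ch, i, c, o).
print(len(word))          # 5
print(code_points("ch"))  # ['U+0063', 'U+0068']
```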

Likewise, in Yoruba, if there is no visual distinction between (1) the
letter "GB" and (2) the two letters "G" and "B" that happen to appear
together, as in your example, then the letter "GB" is encoded with the
two characters "G" and "B". This does not deny the existence of a
letter "GB" in the Yoruba language; it just determines how that letter
is encoded in computerized text.

Now if you need to perform some other type of text processing, such as
searching or sorting or spell-checking or line-breaking, then your
software may need to understand the difference between the letter "GB"
and the two letters "G" + "B". But this needs to be handled by the
software, not the character encoding mechanism.
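
As a rough sketch of what "handled by the software" could look like,
here is a small Python example that splits a word into Yoruba letters
and sorts by alphabet order. Several things here are assumptions of
mine: the names yoruba_letters and yoruba_sort_key are hypothetical,
tone marks are ignored, the sample words are written without them, and
every "g" followed by "b" is treated as the letter "GB". Real software
would more likely rely on a tailored collation from a library.

```python
import unicodedata

# Yoruba alphabet order with tone marks left out (assumption for this
# sketch). "gb" is one letter; \u1eb9, \u1ecd, \u1e63 are e, o, s with
# dot below.
YORUBA_ALPHABET = [
    "a", "b", "d", "e", "\u1eb9", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "\u1ecd", "p", "r", "s", "\u1e63", "t", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(YORUBA_ALPHABET)}

def yoruba_letters(word):
    """Split a word into letters, treating every 'gb' pair as one letter."""
    text = unicodedata.normalize("NFC", word.lower())
    letters = []
    i = 0
    while i < len(text):
        if text[i:i + 2] == "gb":
            letters.append("gb")
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters

def yoruba_sort_key(word):
    """Sort key: alphabet rank of each letter; unknown characters sort last."""
    return [RANK.get(letter, len(RANK)) for letter in yoruba_letters(word)]

words = ["gbogbo", "ile", "gele"]  # sample words, tone marks omitted

print(len("gbogbo"))                       # 6 characters in the encoding
print(yoruba_letters("gbogbo"))            # ['gb', 'o', 'gb', 'o']
print(sorted(words))                       # ['gbogbo', 'gele', 'ile']
print(sorted(words, key=yoruba_sort_key))  # ['gele', 'gbogbo', 'ile']
```

The encoded text is the same in both sorts. Only the software's idea of
what counts as a letter changes, which is the point: "GB" sorts after
"G" because the sort key says so, not because the encoding has a
separate "GB" character.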

> Here are few Yoruba alphabets which might not be new to you, so how
> can you equate G+B with GB even if you claimed it has significant. How
> significant is significant?
>
> A B D E E F G GB....

Actually there are quite a few people on this list who are familiar with
the letters of the Yoruba alphabet, and they are also familiar with the
encoding principles of Unicode. That is why they are saying, yes, we
know "GB" is a letter in Yoruba, but it is encoded as U+0047 "G" +
U+0042 "B".
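
For anyone who wants to check those two code points, a minimal Python
sketch using the standard unicodedata module follows. The helper name
describe is my own, made up for this example:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code_point, name in describe("GB"):
    print(code_point, name)
# U+0047 LATIN CAPITAL LETTER G
# U+0042 LATIN CAPITAL LETTER B

print("GB".encode("utf-8"))  # b'GB': two bytes, one per character
```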

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/


