Re: Just if and where is the then?

From: African Oracle (
Date: Tue May 04 2004 - 20:19:44 CDT

Ken, I appreciate your detailed response, and Peter has also provided an
insightful answer. It is a learning process, and I am learning every day.


Dele Olawole

----- Original Message -----
From: "Kenneth Whistler" <>
To: <>
Cc: <>; <>
Sent: Wednesday, May 05, 2004 2:38 AM
Subject: Re: Just if and where is the then?

> Dele,
> > "No new composite values will be added". - Peter Constable
> >
> > The above sounds dictatorial in nature.
> Peter has already explained that this is just the nature
> of the current policy regarding such additions. Others in this
> thread have attempted to explain the reason for the policy.
> The short answer is that it would disturb the
> stability of the definition of normalization of data involving
> Unicode characters, and stability of normalization is
> extremely important to many implementations of the standard.
> That said, you need to understand that there is a learning
> curve for people coming new to the Unicode Standard.
> The existence of a policy which constrains certain kinds of
> additions to the standard is not a matter of dictatorial
> proclamations -- it is not something that Peter Constable or
> any other individual has the power to impose.
> Such policies arise out of the consensus deliberations of
> the Unicode Technical Committee, which involve many different
> members, jointly responsible for the technical content of
> the standard. They are also endorsed in the Principles and
> Procedures document for the ISO committee, JTC1/SC2/WG2,
> responsible for the parallel, de jure international character
> encoding standard, ISO/IEC 10646. And in that committee,
> decisions are also made based on consensus after discussion
> among members of many different participating national bodies.
> As for the particular issue regarding characters like {e with
> dot below and acute accent}, for example, the policy is not
> in place as a matter of discrimination against particular
> languages or orthographies.
> The *glyph* for {e with dot below and acute accent} can and
> should be in a font for use with a language that requires
> it. Alternatively, the font and/or rendering system should be
> smart enough to be able to apply diacritics correctly.
> But the *characters* needed to represent this are already in
> the Unicode Standard, so the text in question can *already*
> be handled by the standard. Trying to introduce a single,
> precomposed character to do this, instead, would just introduce
> normalization issues into the standard without actually
> increasing its ability to represent what you need to
> represent.
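
Ken's point that the needed characters already exist can be checked with Python's standard `unicodedata` module. The sketch below (variable and function names are illustrative, not from the thread) normalizes the sequence <0065, 0323, 0301> and the same two marks typed in the opposite order. Both orders are canonically equivalent, so they produce identical NFD and NFC output. Under NFC, e plus dot below composes to the existing precomposed U+1EB9, while the acute accent stays a separate combining character because no precomposed form of the full combination is encoded.

```python
import unicodedata

# e + COMBINING DOT BELOW + COMBINING ACUTE ACCENT, as cited in the message
decomposed = "\u0065\u0323\u0301"
# The same marks entered in the other order (acute first, then dot below)
other_order = "\u0065\u0301\u0323"

def code_points(s: str) -> list[str]:
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]

for form in ("NFD", "NFC"):
    a = unicodedata.normalize(form, decomposed)
    b = unicodedata.normalize(form, other_order)
    # Canonically equivalent inputs normalize to the same string.
    print(form, code_points(a), a == b)

# NFD ['U+0065', 'U+0323', 'U+0301'] True
# NFC ['U+1EB9', 'U+0301'] True
```

Because text containing this letter already normalizes to a stable, well-defined result, a new precomposed code point for the same combination would only add another spelling of text the standard can already represent.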
> As Peter has explained, a "letter" or a "grapheme" doesn't
> necessarily have a 1-to-1 relationship to the formal,
> abstract character encoded in the Unicode Standard for use
> in representing text.
> You had one example already: "gb" is a "letter" in Edo. That
> fact is important for education, for language learning, for
> sorting, and various other things. But that "letter" is
> represented by a sequence of *characters* already encoded
> in Unicode: <0067, 0062>.
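
The distinction between a "letter" and the characters that encode it usually lives in application code rather than in the encoding. The sketch below is hypothetical: `MULTI_CHAR_LETTERS` contains only the digraph "gb" mentioned in the thread, whereas a real Edo implementation would need the language's complete inventory of multi-character letters. It splits a string into letters by longest match, while the stored text remains the plain character sequence <0067, 0062>.

```python
# Hypothetical, partial inventory: only "gb" comes from the message.
MULTI_CHAR_LETTERS = ("gb",)

def split_letters(word: str) -> list[str]:
    """Split a word into letters, preferring the longest known multi-character letter."""
    candidates = sorted(MULTI_CHAR_LETTERS, key=len, reverse=True)
    letters: list[str] = []
    i = 0
    while i < len(word):
        for multi in candidates:
            if word.startswith(multi, i):
                letters.append(multi)
                i += len(multi)
                break
        else:
            # No multi-character letter starts here: take one character.
            letters.append(word[i])
            i += 1
    return letters

sample = "agbe"  # arbitrary test string, not a claim about an Edo word
print(split_letters(sample))                   # ['a', 'gb', 'e']
print([f"U+{ord(c):04X}" for c in sample])     # storage is still four characters
```

Sorting by letter would similarly need a tailored collation order on top of these units; the encoded characters themselves do not change.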
> Likewise, if you have an acute accented e with dot below, that
> may constitute a single accented "letter" in Edo, but it is
> represented by a sequence of *characters* already encoded
> in Unicode: <0065, 0323, 0301>.
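
To see the three characters behind that single letter, the sketch below lists each code point with its canonical combining class and Unicode character name, again using only the standard `unicodedata` module. The combining classes (0 for the base letter, 220 for a mark below, 230 for a mark above) are what canonical reordering uses to bring marks typed in any order into one fixed sequence.

```python
import unicodedata

# The sequence <0065, 0323, 0301> cited in the message:
# one "letter", three encoded characters.
letter = "\u0065\u0323\u0301"

def describe(s: str) -> list[tuple[str, int, str]]:
    """Return (code point, canonical combining class, name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.name(ch))
            for ch in s]

for code, ccc, name in describe(letter):
    print(code, ccc, name)
```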
> These decisions regarding the underlying numbers representing
> these elements of text are *not* required to be surfaced up
> to the level of end users. Properly operating software supporting
> a particular language should present the alphabetic units and
> their behavior to users the way *they* expect they should
> work. The fact that Unicode systems haven't gotten there in
> many cases yet is an artifact of the enormous difficulty of
> getting computers to work for *all* the writing systems and
> languages of the world. People are working hard on the
> problem, but it is a *big* problem to solve.
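
One piece of presenting letters the way users expect is treating a base letter plus its combining marks as a single unit for cursor movement, deletion, or counting. The function below is a deliberately simplified, hypothetical approximation of the extended grapheme clusters defined in Unicode Standard Annex #29: it only attaches characters of general category M (marks) to the preceding character and ignores the annex's other rules (Hangul syllables, emoji sequences, and so on). Production software would rely on a complete implementation such as ICU's `BreakIterator`.

```python
import unicodedata

def user_letters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified approximation of UAX #29 grapheme clusters: only marks
    (general category Mn, Mc, Me) are attached to the preceding character.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # extend the current user-perceived letter
        else:
            clusters.append(ch)  # start a new user-perceived letter
    return clusters

text = "\u0065\u0323\u0301be"
print(len(text))               # 5 code points
print(len(user_letters(text))) # 3 user-perceived letters
```

The same grouping works on the NFC form, where the letter is stored as <1EB9, 0301>: the two code points still form one unit.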
> --Ken

This archive was generated by hypermail 2.1.5 : Fri May 07 2004 - 18:45:25 CDT