Re: Just if and where is the then?

From: African Oracle (
Date: Tue May 04 2004 - 16:32:08 CDT

Thanks for your wonderful response. I used the two letters I mentioned only
as an example. As many on this forum may already know, the African letter e
with dot below is a distinct letter in its own right, like o with dot below
or s with dot below. The dot below has been an issue, especially in font
design and development, and I think that to this day only my company has
addressed it so that an underline does not overlay the dot and prevent it
from showing. Please see

Instead of choosing e as the base character, e with dot below and o with dot
below can be used, as you rightly suggested; the accents can then be composed
with either of the two.
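
As a rough illustration of composing the accents on the dotted base, here is a
minimal Python sketch using the standard unicodedata module. The dictionary
labels and the code_points helper are made up for this example, and the results
depend on the Unicode data shipped with the Python build. It shows that the
different ways of typing e with dot below and acute all normalize to the same
sequence, with e-with-dot-below as the base under NFC.

```python
import unicodedata

# Three ways of typing "e with dot below and acute accent".
sequences = {
    "e + dot below + acute": "e\u0323\u0301",
    "e-with-dot-below + acute": "\u1eb9\u0301",
    "e-with-acute + dot below": "\u00e9\u0323",
}


def code_points(text):
    """Render a string as space-separated U+XXXX labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# NFC composes as far as the standard allows; NFD decomposes fully
# and puts the combining marks in canonical order.
nfc_forms = {name: unicodedata.normalize("NFC", s) for name, s in sequences.items()}
nfd_forms = {name: unicodedata.normalize("NFD", s) for name, s in sequences.items()}

for name in sequences:
    print(f"{name:26} NFC: {code_points(nfc_forms[name])}   "
          f"NFD: {code_points(nfd_forms[name])}")
```

Comparing strings after normalizing both sides to the same form is what makes
these inputs interchangeable in searching and sorting.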

Dele Olawole

----- Original Message -----
From: "Philippe Verdy" <>
To: "African Oracle" <>
Cc: "Unicode List" <>
Sent: Tuesday, May 04, 2004 11:14 PM
Subject: Re: Just if and where is the then?

> From: "African Oracle" <>
> > If a can have U+0061 and have a composite that is U+00e2...U+...
> > If e can have U+0065 and have a composite that is U+00ea...U+...
> >
> > Then why can't e with grave or acute accent and dot below be
> > a single Unicode value instead of the combining sequence 1EB9 0301
> > and so on?
> >
> > Since Unicode is gradually becoming a de facto standard, I still think
> > it would not be a bad idea to have such composite values.
> I think the answer is that decompositions come from the need to
> round-trip with preexisting legacy standards. This justifies the need for
> canonical equivalences and normalizations.
> Outside this, I don't think there's a preexisting African standard for
> which such canonical equivalence is needed. In fact the existence of
> multiple ways to encode the same characters is a pollution, but something
> needed to make Unicode work and interoperate with widely used legacy
> standards.
> Finally, there has been a contractual agreement between Unicode, ISO/IEC
> and other standards bodies to keep a "stability policy" for
> normalizations. Due to this policy, it's now impossible to define a
> canonical equivalence between a newly encoded precombined character and a
> sequence composed of preexisting letters and diacritics.
> So this means that the only way to include e-with-acute-and-dot-below
> would be to include it as a new distinct code point, WITHOUT any canonical
> equivalence. This is not really a problem as long as the African languages
> needing this adopt a consistent representation. But you will see
> immediately that it will become impossible to define a standard canonical
> equivalence between characters entered in decomposed form and newer
> characters entered as a precombined code point. For Unicode, ISO/IEC 10646,
> and for all others that depend on Unicode and have signed the policy
> agreement, these sequences will be considered distinct, forever.
> This won't be a problem if a new African standard is defined that decides
> to use a single precombined code point (this standard should then really
> indicate that the character is NOT decomposable).
> The other way to create a new decomposable character would be to define
> decompositions containing at least one NEW code point. I doubt this would
> be desired for the base letter e, or even for the acute accent. But it may
> be possible for the dot below.
> One thing will mitigate this last approach: with how many base letters
> (possibly precombined) must we define a composition with such a new African
> dot below character? Is the repertoire of letters with dot below completely
> closed (including base letters with other diacritics)? As soon as such a
> new African dot below is defined, all the possible preexisting letters
> would have to be included in a decomposition pair. It seems difficult to
> achieve this goal with a repertoire of African letters which is currently
> not bounded. (In the past this was not a problem, but Unicode stability
> policies will not allow this to be extended later, once such an African dot
> below diacritic has been introduced in some version.)
> So the simplest approach is to not define anything, and enter these
> letters in their decomposed form (with the exception of letters with
> or ligaturing diacritics, which should be encoded separately, without
> decompositions).
> Remember this: decompositions of Unicode characters are a pollution needed
> for supporting legacy standards and making them interoperable with or
> Unicode.
> This Unicode policy won't prevent the possible definition of a smaller
> subset with its own charset encoding where these letters are represented
> in their precomposed form only; it will also be possible to define such a
> future standard (if there's a legitimate need for it) with complete
> compatibility with Unicode decomposed characters.
> In summary, for African letters: there's no need (and it's in fact
> now) to encode in Unicode new letters with dots below unless the base
> letter is also absent from Unicode. But barred letters are good candidates
> for
> as isolated (not decomposable) code points.
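
The points in the quoted message about precombined code points and
non-decomposable barred letters can be checked against existing characters. The
following is a minimal Python sketch using the standard unicodedata module; the
helper name composes_to_single and the choice of sample characters are
assumptions made for this illustration, and results depend on the Unicode data
bundled with the Python build.

```python
import unicodedata


def composes_to_single(sequence):
    """Return True if NFC turns the sequence into exactly one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


# e + dot below + circumflex was encoded as a precomposed letter (U+1EC7)
# with a canonical decomposition, so NFC composes it fully.
dot_circumflex = "e\u0323\u0302"

# e + dot below + acute has no precomposed code point, so NFC stops at
# e-with-dot-below (U+1EB9) followed by the combining acute (U+0301).
dot_acute = "e\u0323\u0301"

print(composes_to_single(dot_circumflex))
print(composes_to_single(dot_acute))

# Canonical decomposition mappings as recorded in the character database.
decomp_1ec7 = unicodedata.decomposition("\u1ec7")    # e circumflex dot below
decomp_1eb9 = unicodedata.decomposition("\u1eb9")    # e dot below
decomp_barred = unicodedata.decomposition("\u0268")  # barred i (i with stroke)

print(repr(decomp_1ec7), repr(decomp_1eb9), repr(decomp_barred))
```

An empty decomposition string for the barred letter means it is an isolated
code point, which is the treatment the message suggests for barred letters.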
