Re: Just if and where is the then?

From: African Oracle (oracle@africaservice.com)
Date: Tue May 04 2004 - 16:32:08 CDT


Thanks for your wonderful response. I used the two letters I mentioned only
as an example. As many on this forum may already know, the African letter e
with dot below is a distinct letter in its own right, like o with dot below
or s with dot below. The dot below has been an issue, especially in font
design and development, and I think that to this day only my company has
addressed it, so that an underline does not overlay the dot and prevent it
from showing. Please see
http://www.dnetcom.com/fonts/ariyaimage2.html

Instead of choosing e as the base character, e with dot below and o with dot
below can be used, as you rightly suggested; the accents can then be composed
with either of the two.
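
As a small illustration of that composition, here is a minimal Python sketch
using the standard unicodedata module. The helper names compose and
code_points are made up for this example. It suggests that, however the marks
are typed, normalization to NFC ends up with the dotted letter as the base and
the accent as a separate combining mark.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
GRAVE = "\u0300"      # COMBINING GRAVE ACCENT


def compose(base: str, *marks: str) -> str:
    """Return the NFC form of a base letter followed by combining marks."""
    return unicodedata.normalize("NFC", base + "".join(marks))


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


e_dot_acute = compose("e", DOT_BELOW, ACUTE)
o_dot_grave = compose("o", DOT_BELOW, GRAVE)

print(code_points(e_dot_acute))  # ['U+1EB9', 'U+0301']
print(code_points(o_dot_grave))  # ['U+1ECD', 'U+0300']

# Typing the accent before the dot gives the same normalized result,
# because canonical ordering puts the dot below first.
print(compose("e", ACUTE, DOT_BELOW) == e_dot_acute)  # True
```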

Dele Olawole

----- Original Message -----
From: "Philippe Verdy" <verdy_p@wanadoo.fr>
To: "African Oracle" <oracle@africaservice.com>
Cc: "Unicode List" <unicode@unicode.org>
Sent: Tuesday, May 04, 2004 11:14 PM
Subject: Re: Just if and where is the then?

> From: "African Oracle" <oracle@africaservice.com>
> > If a can have U+0061 and have a composite that is U+00e2...U+...
> > If e can have U+0065 and have a composite that is U+00ea...U+...
> >
> > Then why is e with accented grave or acute and dot below cannot be assigned
> > a single unicode value instead of the combinational values 1EB9 0301 and
> > etc....
> >
> > Since UNICODE is gradually becoming a defacto, I still think it will not be
> > a bad idea to have such composite values.
>
> I think the answer is that decompositions come from the need to support
> round trips with pre-existing legacy standards. This justifies the need to
> offer canonical equivalences and normalizations.
>
> Outside this, I don't think there's a pre-existing African standard with
> which such canonical equivalence is needed. In fact, the existence of
> multiple ways to encode the same character is a pollution, but one needed
> to make Unicode work and interoperate with widely used legacy standards.
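
To make "multiple ways to encode the same character" concrete, here is a
minimal Python sketch with the standard unicodedata module. The helper name
canonically_equal is made up for this illustration. It compares the
precomposed U+00EA from the original question with its decomposed spelling.

```python
import unicodedata

precomposed = "\u00ea"  # e with circumflex as a single code point
decomposed = "e\u0302"  # e followed by COMBINING CIRCUMFLEX ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(precomposed == decomposed)                   # False
print(canonically_equal(precomposed, decomposed))  # True
```

A plain code point comparison treats the two spellings as different strings;
only normalization makes them interchangeable.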
>
> Finally, there has been a contractual agreement between Unicode, ISO/IEC
> 10646 and other standard bodies, to keep a "stability policy" for
> normalizations. Due to this policy, it's impossible now to define a
> canonical equivalence between a newly-encoded precombined character and a
> sequence composed of preexisting base letters and diacritics.
>
> So this means that the only way to include e-with-acute-and-dot-below would
> be to include it as a new distinct code point, WITHOUT any canonical
> equivalence. This is not really a problem as long as the African languages
> needing this character adopt a consistent representation. But you will see
> immediately that it becomes impossible to define a standard canonical
> equivalence between characters entered in decomposed form and newer
> characters entered as a single precombined code point. For Unicode, ISO/IEC
> 10646, and all other standards which depend on Unicode and which have signed
> the policy agreement, these sequences will be considered distinct, forever.
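
The consequence can be sketched in Python with the standard unicodedata
module. As an assumption for illustration only, the Private Use code point
U+E000 stands in for a hypothetical newly encoded precombined letter; no such
character is actually assigned, and canonically_equal is a made-up helper.
Like the character described above, a Private Use code point carries no
canonical decomposition.

```python
import unicodedata

# Existing decomposed spelling: e with dot below (U+1EB9) + combining acute.
decomposed = "\u1eb9\u0301"

# ASSUMPTION: U+E000 (Private Use) is only a stand-in for a hypothetical
# new "e with acute and dot below" encoded without canonical equivalence.
hypothetical_precomposed = "\ue000"


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The stand-in has no decomposition mapping at all.
print(unicodedata.decomposition(hypothetical_precomposed) == "")  # True

# So normalization can never make the two spellings compare equal.
print(canonically_equal(hypothetical_precomposed, decomposed))  # False
```

Text entered both ways would then need an extra mapping step, outside Unicode
normalization, before searching or sorting could treat the two as the same
letter.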
>
> This won't be a problem if a new African standard decides to use a single
> precombined code point (this standard should then clearly indicate that the
> character is NOT decomposable).
>
> The other way to create a new decomposable character would be to define
> decompositions containing at least one NEW codepoint. I doubt this would be
> desired for the base letter e, or even for the acute accent. But it may be
> possible for the dot below.
>
> One thing weighs against this last approach: with how many base letters
> (possibly precombined) must we define a composition with such a new African
> dot below character? Is the repertoire of letters with dot below completely
> closed (including base letters with other diacritics)? As soon as such a new
> African dot below is defined, all the possible pre-existing letters would
> have to be included in a decomposition pair. It seems difficult to achieve
> this goal with a repertoire of African letters which is currently not
> bounded. (In the past this was not a problem, but the Unicode stability
> policies will not allow this repertoire to be extended later, once such an
> African dot below diacritic has been introduced in some version.)
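
To get a feel for how many letters are involved, the following Python sketch
lists the precomposed characters whose canonical decomposition already
contains the existing COMBINING DOT BELOW (U+0323). The scan range U+00C0 to
U+1FFF and the function name are assumptions made for this example, and the
result depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata

DOT_BELOW = "\u0323"  # the existing COMBINING DOT BELOW


def precomposed_with_dot_below(start: int = 0x00C0, end: int = 0x1FFF) -> list:
    """Collect single code points whose NFD form contains U+0323."""
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        # Keep only characters that really decompose into a sequence.
        if len(nfd) > 1 and DOT_BELOW in nfd:
            found.append(ch)
    return found


letters = precomposed_with_dot_below()
print(len(letters))
print(" ".join(f"U+{ord(ch):04X}" for ch in letters))
```

Each character in that list, plus any base letter that may take the dot in
future, would need its own pair with a newly introduced dot below.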
>
> So the simplest approach is to not define anything, and enter these African
> letters in their decomposed form (with the exception of letters with
> overlaying or ligaturing diacritics, which should be encoded separately,
> without decompositions).
>
> Remember this: decompositions of Unicode characters are a pollution needed
> only for supporting legacy standards and making them interoperable with or
> through Unicode.
>
> This Unicode policy won't prevent the possible definition of a smaller
> African subset with its own charset encoding in which these letters are
> represented in precomposed form only; it will also be possible to define
> such a future standard (if there's a legitimate need for it) with complete
> round-trip compatibility with Unicode decomposed characters.
>
> In summary, for African letters: there's no need (and it's in fact
> impossible now) to encode in Unicode new letters with dots below unless the
> base letter is also absent from Unicode. But barred letters are good
> candidates for inclusion as isolated (not decomposable) code points.
>
>


