Re: Amerindian Characters

From: Asmus Freytag (
Date: Wed Jun 16 1999 - 18:27:45 EDT

In the light of Unicode Technical Report #15 (Normalization) the major
rationale for -additional- precomposed letters has gone away.

Here is why: UTR#15 describes a method to 'normalize' character data in all
cases where the same text can be spelled in more than one way. The main
reason this is important is its use in Web protocols, esp. in URLs.
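To make "multiple spellings" concrete, here is a minimal sketch using
Python's standard `unicodedata` module (an illustration only, not part of
UTR#15 itself). The letter é can be stored as the single precomposed code
point U+00E9 or as e followed by U+0301 COMBINING ACUTE ACCENT. The two
strings compare unequal until both are brought to the same normalization
form (NFC = composed, NFD = decomposed):

```python
import unicodedata

# Two spellings of the same user-perceived letter "é"
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# A raw code point comparison treats them as different strings
print(precomposed == decomposed)  # False

# Normalized to the same form, they compare equal
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# NFD goes the other way and splits the precomposed letter apart
print([hex(ord(c)) for c in unicodedata.normalize("NFD", precomposed)])  # ['0x65', '0x301']
```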

If all data can be required to be normalized, name lookup (URL resolution)
can be much simpler. This is especially important for small, embedded
devices. [Any device that does not support text entry will not need to
support the normalization tables, as long as all its inputs are normalized.]
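A rough sketch of the lookup argument, assuming a hypothetical resource
table whose keys are stored in NFC. The names `TABLE`, `register`,
`lookup_trusting_input` and `lookup_normalizing` are made up for
illustration. If every incoming name is guaranteed to be normalized, the
lookup is a plain exact match and needs no normalization data at all;
otherwise it has to normalize first:

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical name table for a small device; keys are kept in NFC.
TABLE: Dict[str, str] = {}

def register(name: str, resource: str) -> None:
    TABLE[unicodedata.normalize("NFC", name)] = resource

def lookup_trusting_input(name: str) -> Optional[str]:
    # Valid only if all inputs are already normalized:
    # an exact match, no normalization tables required.
    return TABLE.get(name)

def lookup_normalizing(name: str) -> Optional[str]:
    # Needed when inputs may arrive in any spelling.
    return TABLE.get(unicodedata.normalize("NFC", name))

register("caf\u00e9", "/menu")
print(lookup_trusting_input("caf\u00e9"))   # /menu
print(lookup_trusting_input("cafe\u0301"))  # None (unnormalized input misses)
print(lookup_normalizing("cafe\u0301"))     # /menu
```

The cost of the first variant is pushed onto whoever produces the names,
which is the point of requiring normalized data on the wire.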

However, since normalization needs to work well with legacy encodings on
the web, the normalized form for the web will be (largely) precomposed. At
the same time, it is necessary to keep the normalized form stable in the
face of character additions. Therefore, no characters added *after* Unicode
Version 3.0 will be precomposed in the normalized form (older systems
would lack the knowledge of how to relate the new character to the existing
character sequence it is equivalent to).
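One way to see why a precomposed normalized form suits legacy encodings,
sketched in Python with Latin-1 (ISO 8859-1) standing in as the assumed
legacy character set: Latin-1 has a code for é but none for a bare combining
acute accent, so only the composed spelling maps back onto it.

```python
import unicodedata

text = "cafe\u0301"  # "café" spelled with a combining accent

nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

# The composed form maps one-to-one onto the legacy Latin-1 repertoire
print(nfc.encode("latin-1"))  # b'caf\xe9'

# The decomposed form contains U+0301, which Latin-1 cannot represent
try:
    nfd.encode("latin-1")
except UnicodeEncodeError as err:
    print("not representable:", err.reason)
```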

As a result of all of this, any precomposed character added to Unicode from
now on would have to be decomposed whenever it is sent on the web in
normalized form. There is therefore much less benefit in adding a precomposed
character from now on: it would spend much of its life in decomposed form.
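A small Python sketch of this effect, assuming the character database
shipped with Python's `unicodedata` module (which follows a much later
Unicode version than this message). U+2ADC FORKING, added in Unicode 3.2,
has a canonical decomposition but is listed as a composition exclusion, so
NFC never produces it; an older precomposed letter such as é passes through
unchanged. The helper `survives_nfc` is hypothetical:

```python
import unicodedata

def survives_nfc(ch: str) -> bool:
    """True if NFC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFC", ch) == ch

forking = "\u2adc"  # U+2ADC FORKING, added after Unicode 3.0

print(unicodedata.decomposition(forking))  # 2ADD 0338
print(survives_nfc("\u00e9"))  # True: older precomposed letter is kept
print(survives_nfc(forking))   # False: it is a composition exclusion

# What actually goes over the wire in NFC: two code points, not one
sent = unicodedata.normalize("NFC", forking)
print([f"U+{ord(c):04X}" for c in sent])  # ['U+2ADD', 'U+0338']
```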

UTR#15 represents the line where legacy support ends and what John Cowan
called the "True Unicode Way" picks up. It's not about first- or
second-class status for characters but about creating a scheme ("universal
early normalization") that sets additional ground rules for interoperating
from now into the future.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT