Re: compatibility between unicode 2.0 and 3.0

From: Asmus Freytag
Date: Tue Feb 04 2003 - 05:46:00 EST

    At 10:49 PM 2/3/03 -0800, Doug Ewell wrote:
    > > Can you please explain what is the best practice to handle unassigned
    > > code points so that applications can easily become forward compatible?
    > > If we just ignore unassigned code points, will that make it
    > > easier for applications to migrate to later versions of Unicode?

    In many circumstances, the best approach for unassigned character
    codes is to treat them like the characters around them.

    An implementation might choose to interpolate the property values
    of assigned characters bordering a range of unassigned characters,
    using the following rules:

    * Look at the nearest assigned characters in both directions.
    If they are in the same block, and have the same property value,
    then use that value.
    * From any block boundary, extending to the nearest assigned
    character inside the block, use the property value of that character.
    * For all code points entirely in empty or unassigned blocks, use the
    default property value for that property as given in the Unicode Character
    Database.
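
    The three rules above can be sketched roughly as follows. This is only
    an illustration, not code from any actual implementation: the function
    name, the dict/list data layout, and the sample code points are all
    made up. The rules do not say what to do when the two neighbours in a
    block disagree, so the sketch assumes the default value in that case.

```python
from bisect import bisect_left


def interpolate_property(cp, assigned, blocks, default):
    """Guess a property value for code point `cp` (hypothetical helper).

    assigned: dict {code point: property value} for assigned characters
    blocks:   list of (first, last) inclusive block ranges
    default:  the default value of the property
    """
    if cp in assigned:
        return assigned[cp]

    # Find the block containing cp; outside every block, use the default.
    block = next(((lo, hi) for lo, hi in blocks if lo <= cp <= hi), None)
    if block is None:
        return default
    lo, hi = block

    # Nearest assigned neighbours, restricted to the same block.
    points = sorted(assigned)
    i = bisect_left(points, cp)
    below = points[i - 1] if i > 0 and points[i - 1] >= lo else None
    above = points[i] if i < len(points) and points[i] <= hi else None

    if below is not None and above is not None:
        # Rule 1: both neighbours are in the block and must agree.
        # Assumption: if they disagree, fall back to the default.
        if assigned[below] == assigned[above]:
            return assigned[below]
        return default
    if below is not None:
        return assigned[below]  # Rule 2: gap runs to the end of the block
    if above is not None:
        return assigned[above]  # Rule 2: gap runs from the start of the block
    return default              # Rule 3: the block has no assigned characters


# Made-up blocks and assignments, for illustration only.
BLOCKS = [(0x100, 0x10F), (0x110, 0x11F), (0x120, 0x12F)]
ASSIGNED = {0x102: "L", 0x105: "L", 0x108: "N", 0x11A: "R"}

print(interpolate_property(0x103, ASSIGNED, BLOCKS, "X"))  # L
print(interpolate_property(0x10C, ASSIGNED, BLOCKS, "X"))  # N
print(interpolate_property(0x125, ASSIGNED, BLOCKS, "X"))  # X
```

    A real implementation would presumably precompute these values once
    when building its property tables rather than search per lookup.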

    There are two important benefits of using that approach in implementations.
    Property values become much more contiguous, allowing better compaction of
    property tables. Furthermore, because similar characters are often
    encoded in proximity, chances are good that the interpolated values
    will match the actual property values when characters are assigned
    to a given code point later.
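
    To illustrate the compaction point, here is a small sketch that
    assumes the property table is stored as runs of equal values. That
    storage scheme is only one possibility and is an assumption here, as
    are the helper name and the eight-entry sample data.

```python
from itertools import groupby


def to_runs(values):
    """Collapse a per-code-point list into (start, length, value) runs."""
    runs, start = [], 0
    for value, group in groupby(values):
        length = len(list(group))
        runs.append((start, length, value))
        start += length
    return runs


# Made-up stretch of 8 code points; None marks an unassigned code point.
sparse = ["L", None, "L", "L", None, None, "L", "L"]
# The same stretch after the gaps are filled from their neighbours.
filled = ["L"] * 8

print(len(to_runs(sparse)))  # 5
print(len(to_runs(filled)))  # 1
```

    With the gaps filled, the whole stretch collapses into a single run,
    which is what makes the table smaller.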

    Of course, many important properties may well not be predictable, but on
    the whole, the approach has proven successful.

