L2/99-201

Re: Comments on the working draft of ISO/IEC 10646 Part 2 (WG2 N 2012R2)

From: US

Date: 1999-06-11

The US requests that the following language be added to Part 2, at the end of clause 8 (General Purpose Plane), for compatibility with the Unicode Standard:

"To allow a greater degree of compatibility across versions of the standard, the range U-000E0000..U-000E1000 is reserved for future alternative format characters."

The US suggests that the following note be added after this clause.

"Unassigned code points in this range should be ignored in normal processing and display."

The US requests that the following language be added to Part 2, as an amendment to Part 1, clause 8 (Basic Multilingual Plane):

"To allow a greater degree of compatibility across versions of the standard, the range U-00002060..U-00002069 is reserved for future format characters."

The US suggests that the following note be added after this clause.

"Unassigned code points in this range should be ignored in normal processing and display."

___________________________________________________________________________

For information, the following text is being published in the Unicode Standard, Version 3.0.

Unassigned Characters

In practice, applications must deal with unassigned code points. This may occur, for example, when the application is handling text that originated on a system implementing a later release of Unicode with additional assigned characters. To work properly in implementations, unassigned code points must be given default properties as if they were characters, since various algorithms require properties to be assigned to every character in order to function at all. These properties are not uniform across all unassigned code points, since certain ranges of code points need different properties to maximize compatibility.

The Unicode Bidirectional Algorithm assigns directional properties based on the expected direction of characters to be added in the future. All unassigned code points in Hebrew, Arabic, Thaana, and Syriac blocks are given the bidirectional property R (right-to-left). These are the ranges 0590-05FF, FB1D-FB4F, 0600-07BF, FB50-FDFF, and FE70-FEFF. All other unassigned code points are given the bidirectional property L (left-to-right).
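For illustration only, the default assignment described above can be sketched in Python. The names default_bidi_class and RTL_DEFAULT_RANGES are hypothetical and do not come from the standard; the ranges are the ones listed in the preceding paragraph. The sketch assumes the caller has already determined that the code point is unassigned.

```python
# Inclusive ranges whose unassigned code points default to R,
# copied from the paragraph above (hypothetical table name).
RTL_DEFAULT_RANGES = (
    (0x0590, 0x05FF),
    (0x0600, 0x07BF),
    (0xFB1D, 0xFB4F),
    (0xFB50, 0xFDFF),
    (0xFE70, 0xFEFF),
)


def default_bidi_class(code_point: int) -> str:
    """Return the default bidirectional property for an unassigned code point.

    Assumption: the caller has already established that code_point is
    unassigned; this function does not check assignment status.
    """
    for first, last in RTL_DEFAULT_RANGES:
        if first <= code_point <= last:
            return "R"  # right-to-left default
    return "L"  # left-to-right default for everything else
```

Under these assumptions, default_bidi_class(0x05FF) yields "R" and default_bidi_class(0x07C0) yields "L", since 07C0 lies just past the end of the 0600-07BF range.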

Normally, code points outside the repertoire of supported characters would be displayed with a fall-back glyph, such as a black box. However, format and control characters must not have visible glyphs (although they may have an effect on other characters in display). These characters are also ignored except with respect to specific, defined processes: for example, ZERO WIDTH NON-JOINER is ignored in collation. To allow a greater degree of compatibility across versions of the standard, the ranges 2060-2069 and 000E0000-000E1000 are reserved for future format and control characters. Unassigned code points in these ranges should be ignored in processing and display.
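As a rough sketch of the last recommendation, the following Python fragment drops unassigned code points that fall in the two reserved ranges. All names here (RESERVED_FORMAT_RANGES, in_reserved_format_range, strip_ignorable_unassigned) are hypothetical. The default assignment test relies on Python's unicodedata module, which reflects whatever Unicode version ships with the interpreter rather than Version 3.0, so a different predicate can be passed in where that matters.

```python
import unicodedata

# Inclusive ranges reserved for future format and control characters,
# as given in the paragraph above (hypothetical table name).
RESERVED_FORMAT_RANGES = (
    (0x2060, 0x2069),
    (0xE0000, 0xE1000),
)


def in_reserved_format_range(code_point: int) -> bool:
    """True if code_point lies in one of the reserved ranges."""
    return any(first <= code_point <= last
               for first, last in RESERVED_FORMAT_RANGES)


def is_unassigned(ch: str) -> bool:
    """Assumption: general category 'Cn' in the interpreter's Unicode
    data is taken to mean 'unassigned'."""
    return unicodedata.category(ch) == "Cn"


def strip_ignorable_unassigned(text: str, unassigned=is_unassigned) -> str:
    """Remove unassigned code points in the reserved ranges.

    Assigned characters, and unassigned code points outside the
    reserved ranges, are left in place.
    """
    return "".join(
        ch for ch in text
        if not (in_reserved_format_range(ord(ch)) and unassigned(ch))
    )
```

A display layer would apply the same test when choosing glyphs: a code point that passes both checks gets no glyph at all, while any other unsupported code point still receives the fall-back glyph.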