Restriction to 10FFFF

UTC/2000-006

Re: Proposal to restrict the range of code positions to the values up to U-0010FFFF

Date: 2000-01-13

From: Mark Davis

Draft for Discussion at UTC meeting

ISO
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set
(U C S)

ISO/IEC JTC1/SC2/WG2 N----
Date: 2000-01-13

Title: Proposal to restrict the range of code positions to the values up to U-0010FFFF
Source: Unicode Technical Committee
Status: Liaison
Action: For consideration by JTC1/SC2/WG2

The design of UTF-16 permits addressing code positions up to 10FFFF₁₆, a range of over 1,000,000 code positions. It has become clear that this range is sufficient for all foreseeable character allocations. Yet the difference between the range representable in UTF-16 and the range representable in the other encoding forms, UTF-8 and UCS-4, causes continued confusion among developers and users, and creates the unnecessary appearance of a possible schism between Unicode and 10646.
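
The upper limit of 10FFFF₁₆ follows from the surrogate pair mechanism. As a minimal sketch in Python, assuming the usual surrogate ranges D800..DBFF (high) and DC00..DFFF (low), which this proposal does not itself spell out, and using a hypothetical helper name:

```python
# Assumed surrogate ranges (not stated in this proposal): 1,024 high
# surrogates (D800..DBFF) and 1,024 low surrogates (DC00..DFFF).
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def from_surrogates(high: int, low: int) -> int:
    """Combine one high and one low surrogate into a code position."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest possible pair yields the upper limit named in this proposal.
MAX_CODE_POSITION = from_surrogates(HIGH_END, LOW_END)

# Code positions run from 0 through the maximum, inclusive.
TOTAL_CODE_POSITIONS = MAX_CODE_POSITION + 1
```

Under these assumptions, `MAX_CODE_POSITION` is 10FFFF₁₆ and `TOTAL_CODE_POSITIONS` is 1,114,112, consistent with the "over 1,000,000" figure above.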

This situation presents unnecessary interoperability problems for implementers. If it continues, there is little recourse but to define special versions of UTF-8 and UCS-4 that are restricted to the same domain as UTF-16, for interoperability with UTF-16. (See UTR #19: http://www.unicode.org/unicode/reports/tr19/.)

This proposal is to remedy the situation by amending 10646 to exclude values above U-0010FFFF, much as values above U-7FFFFFFF are currently excluded. The private use characters from U-60000000 to U-7FFFFFFF will be deprecated; luckily there are no significant implementations using those characters.

More formally:

In Section 7, paragraph 1, the values of G-octets are restricted to being precisely zero, and the values of P-octets are restricted to the values from 00 to 10₁₆.
In Section 10.2, the statement reserving the code positions of the 32 groups from Group 60 to Group 7F for private use is withdrawn and replaced by a statement that the use of code positions of Group 60 through Group 7F is deprecated.
In Annex D (UTF-8) appropriate deletions are made to limit the format to 4 bytes, with a few additional edits to correct byte ranges.
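
The effect of these restrictions can be sketched as a simple range check. The Python below is illustrative only: the function names are hypothetical, and the octet layout (group, plane, row, cell, from most to least significant) and the UTF-8 length thresholds are the conventional ones, not text quoted from 10646. It does not treat the surrogate code positions specially.

```python
from typing import Tuple


def split_octets(value: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit UCS-4 value into its G, P, R and C octets."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    return (
        (value >> 24) & 0xFF,  # G-octet (group)
        (value >> 16) & 0xFF,  # P-octet (plane)
        (value >> 8) & 0xFF,   # R-octet (row)
        value & 0xFF,          # C-octet (cell)
    )


def is_in_restricted_range(value: int) -> bool:
    """True if the G-octet is zero and the P-octet is in 00..10 (hex)."""
    g, p, _r, _c = split_octets(value)
    return g == 0x00 and p <= 0x10


def utf8_length(value: int) -> int:
    """Number of UTF-8 bytes needed for a value in the restricted range."""
    if not is_in_restricted_range(value):
        raise ValueError("outside the range 0..10FFFF (hex)")
    if value < 0x80:
        return 1
    if value < 0x800:
        return 2
    if value < 0x10000:
        return 3
    return 4  # 10000..10FFFF (hex); longer forms are never needed
```

For example, `is_in_restricted_range(0x60000000)` is false because the G-octet is 60₁₆, and `utf8_length(0x10FFFF)` is 4, so the five- and six-byte UTF-8 forms never arise once the range is restricted.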
