UTC/2000-006

Re: Proposal to restrict the range of code positions to the values up to U-0010FFFF

Date: 2000-01-13

From: Mark Davis

Draft for Discussion at UTC meeting


ISO
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set
(U C S)

ISO/IEC JTC1/SC2/WG2 N----
Date: 2000-01-13

 

Title: Proposal to restrict the range of code positions to the values up to U-0010FFFF

Source: Unicode Technical Committee

Status: Liaison

Action: For consideration by JTC1/SC2/WG2

The design of UTF-16 permits addressing code positions up to 10FFFF (hexadecimal), a range of over 1,000,000 code positions. It has become clear that this range is sufficient for all foreseeable character allocations. Yet the difference between the range representable in UTF-16 and the range representable in the other encoding forms, UTF-8 and UCS-4, causes continued confusion among developers and users, and creates the unnecessary appearance of a possible schism between Unicode and 10646.
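As an illustration of where this limit comes from (a minimal sketch, not part of the proposal text): UTF-16 reaches positions beyond FFFF by pairing one of 1,024 high surrogates (D800 to DBFF) with one of 1,024 low surrogates (DC00 to DFFF). The function and constant names below are illustrative only.

```python
# Surrogate ranges used by UTF-16.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def decode_surrogate_pair(high, low):
    """Combine a high and a low surrogate into one code position."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("not a high surrogate: %04X" % high)
    if not LOW_START <= low <= LOW_END:
        raise ValueError("not a low surrogate: %04X" % low)
    # Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest pair yields the highest position UTF-16 can address.
max_code_position = decode_surrogate_pair(HIGH_END, LOW_END)
# Number of positions from 0 through that maximum, inclusive.
total_positions = max_code_position + 1

print("%X" % max_code_position)
print(total_positions)
```

The largest pair, DBFF followed by DFFF, decodes to 10FFFF, so the positions 0 through 10FFFF number 1,114,112 in all.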

This situation presents unnecessary interoperability problems for implementers. If it continues, there is little recourse but to define special versions of UTF-8 and UCS-4 that are restricted to the same range as UTF-16, so that they interoperate with it. (See UTR #19: http://www.unicode.org/unicode/reports/tr19/.)
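To make the mismatch concrete, the following sketch compares the ranges involved. It assumes the UTF-8 definition in 10646, which encodes values up to 7FFFFFFF in as many as six octets. The helper names are hypothetical, and the sketch does not reproduce the definitions in UTR #19.

```python
UTF16_MAX = 0x10FFFF    # highest position addressable by UTF-16
UCS4_MAX = 0x7FFFFFFF   # highest position permitted in UCS-4

# Upper bound of each UTF-8 sequence length, as (limit, octet count).
_UTF8_LIMITS = [
    (0x7F, 1),
    (0x7FF, 2),
    (0xFFFF, 3),
    (0x1FFFFF, 4),
    (0x3FFFFFF, 5),
    (0x7FFFFFFF, 6),
]


def utf8_length(code_position):
    """Return the number of octets UTF-8 needs for a code position."""
    if code_position < 0:
        raise ValueError("negative code position")
    for limit, octets in _UTF8_LIMITS:
        if code_position <= limit:
            return octets
    raise ValueError("beyond the UCS-4 range: %X" % code_position)


def is_utf16_addressable(code_position):
    """True if the position can be represented in UTF-16."""
    return 0 <= code_position <= UTF16_MAX


print(utf8_length(UTF16_MAX))                # octets for 10FFFF
print(utf8_length(UCS4_MAX))                 # octets for 7FFFFFFF
print(is_utf16_addressable(UTF16_MAX + 1))   # first position UTF-16 cannot reach
```

Under this definition, a UTF-8 restricted to the UTF-16 range never needs more than four octets, while the unrestricted form can carry five- and six-octet values that have no UTF-16 representation.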

This proposal is to remedy the situation by amending 10646 to exclude values above U-0010FFFF, much as values above U-7FFFFFFF are currently excluded. The private use characters from U-60000000 to U-7F000000 will be deprecated; fortunately, there are no significant implementations using those characters.

More formally: