Re: Surrogate space in Unicode

From: DougEwell2@cs.com
Date: Fri Feb 16 2001 - 02:55:37 EST


In a message dated 2001-02-15 23:15:23 Pacific Standard Time, lord@emf.net
writes:

> It has proven difficult to come up with convenient terms for
> the Unicode characters encoded at U+10000 and beyond.
> [....]
> 2. A 'basic' code point, which may represent a 'basic
> character', can range from U+0000 through U+FFFF.
>
> For what purpose is such a distinction needed?

It is needed because of UTF-16, which requires two 16-bit code units (a
surrogate pair) to represent a character with a value of U+10000 or higher
(a supplementary character) but only one 16-bit code unit to represent a
basic character.
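
To make the difference concrete, here is a rough sketch of the UTF-16
arithmetic in Python. The function name to_utf16_units is made up for this
illustration and is not taken from the Standard or from any library; the
constants are the usual surrogate ranges (D800..DBFF for the first unit,
DC00..DFFF for the second).

```python
def to_utf16_units(code_point):
    """Return the 16-bit UTF-16 units for one Unicode code point.

    Illustrative helper only; the name is hypothetical.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values do not represent characters")

    # Basic character: a single 16-bit unit holds the value directly.
    if code_point < 0x10000:
        return [code_point]

    # Supplementary character: subtract 0x10000 to get a 20-bit offset,
    # then split it into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits
    return [high, low]
```

With this sketch, U+0041 comes back as the single unit 0041, while U+10000
comes back as the two units D800 DC00. Software that assumes one 16-bit unit
per character handles the first case and mishandles the second.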

Many descriptions on the Web erroneously claim that Unicode contains only the
first 64K characters of ISO 10646. Even the Unicode Standard Version 3.0
states, "Plain Unicode text consists of sequences of 16-bit character codes."
To me this sentence is very misleading, and it means that special attention
must be paid to the nature of supplementary characters, both those to be
assigned in Unicode 3.1 and those to be assigned in future versions.

Because of the widespread belief that Unicode stops at U+FFFF, many fonts and
applications that claim to support Unicode can only handle basic characters,
not supplementary characters.

-Doug Ewell
 Fullerton, California


