RE: Chapter on character sets

From: Doug Ewell (dewell@compuserve.com)
Date: Fri Jun 16 2000 - 02:05:13 EDT


Kenneth Whistler <kenw@sybase.com> wrote:

> However, as we roll out Unicode 3.1, containing many new characters for
> Plane 1, Plane 2, (and those pesky tag characters on Plane 14), you
> will see that the defining data files *will* be referring to characters
> by their scalar values, rather than by the surrogate pairs required for
> representing them in the UTF-16 encoding form.
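To make the distinction concrete, here is a minimal sketch (my own illustration, not anything from the Unicode data files) of the standard UTF-16 arithmetic relating a scalar value beyond Plane 0 to its surrogate pair. The function names are hypothetical. For example, the Plane 14 tag character U+E0001 becomes the pair D840-style high/low code units DB40 DC01.

```python
def to_surrogates(scalar: int) -> tuple[int, int]:
    """Split a supplementary-plane scalar value (U+10000..U+10FFFF)
    into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("not a supplementary-plane scalar value")
    offset = scalar - 0x10000          # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into its scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    for cp in (0x1D400, 0xE0001):
        hi, lo = to_surrogates(cp)
        print(f"U+{cp:04X} -> {hi:04X} {lo:04X}")
```

The scalar value is the single number that names the character; the pair is only an artifact of how UTF-16 stores it, which is why the data files can refer to U+E0001 directly.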

Two questions:

1. What is the projected timetable for the first version of Unicode that
    contains character assignments beyond Plane 0? I'm just wondering,
    not trying to seem impatient. (Really.)

2. How will UnicodeData.txt in particular be modified to represent the
    scalar values of characters beyond Plane 0? Will the first column
    use 5, 6, or 8 hex digits? Will the scalar values of Plane 0
    characters continue to use only 4 hex digits? What compatibility
    problems might be introduced?
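One way to see the compatibility risk: suppose (an assumption for this sketch, not an announced decision) that the first field uses at least 4 hex digits with no extra leading zeros, so Plane 0 stays "0041" while U+1D400 becomes "1D400". A parser that slices a fixed 4-character prefix would then silently misread supplementary entries, whereas one that splits on ';' would not. The sample line and function names below are hypothetical and abridged.

```python
def format_code_point(scalar: int) -> str:
    # Assumed convention: minimum 4 uppercase hex digits, widening to 5 or 6 as needed.
    return f"{scalar:04X}"


def naive_parse(line: str) -> int:
    # Fragile: assumes the code point is always exactly 4 hex digits.
    return int(line[:4], 16)


def robust_parse(line: str) -> int:
    # Takes everything before the first ';', whatever its width.
    return int(line.split(";", 1)[0], 16)


if __name__ == "__main__":
    sample = "1D400;MATHEMATICAL BOLD CAPITAL A;Lu"  # abridged, hypothetical line
    print(format_code_point(0x41), format_code_point(0x10FFFD))
    print(hex(naive_parse(sample)), hex(robust_parse(sample)))
```

Under this assumption, existing files keep their 4-digit Plane 0 entries unchanged, so only tools that hard-code the field width would break.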

> You can see this trend already on the list when people are discussing
> characters under ballot for 10646-2. They are referred to by their
> scalar values, and not by surrogate pairs, except when something about
> the UTF-16 encoding form is what is at issue.

Perhaps you mean the unicore list, because on this list I (regrettably)
haven't seen any discussion of proposed characters beyond Plane 0 in
which actual code points are specified.

-Doug Ewell
 Fullerton, California


