> > We do need to clean up terminology, and we need to do so in a way that
> > incorporates understanding of UTR-17. I think we need:
> > - BMP characters: characters in the BMP; note that d800-dfff are not
> > characters; fffe and ffff are also not characters
> > - "astral"/supplementary/extended-plane/?? characters: everything in planes
> > 1 - 16 (excluding anything ending in fffe and ffff)
This is part of a discussion of terminology regarding surrogates
that has been ongoing among an ad hoc group working on the proposed
UTR on surrogate handling, and a separate but related discussion
among the editorial committee. Now it seems to have migrated out
to the general list.
> I can't stand "astral planes". The term suggests, to me at
> least, that these planes (and, hence, the characters in them)
> are not as "real" as the BMP.
> By contrast, "supplementary planes" is a factual description.
I'll repeat some of the consensus that seems to have emerged from
the discussions on the smaller lists.
1. The terminology used by 10646 and by the Unicode Standard should
be convergent in this area, to minimize the proliferation of
confusion. The FCD for 10646-2 already uses the term "supplementary
planes", and this seems perfectly acceptable for the Unicode
Standard as well.
For comparison, 10646 itself defines:
plane: A subdivision of a group; of 256 x 256 cells.
Suggested Unicode definitions that could be added to the Unicode
glossary, to cover this convergence:
plane: A subdivision of the encoding space; 64K code points starting
on an even 64K boundary. (Plane 0 is 0x0000..0xFFFF; Plane 1 is
0x10000..0x1FFFF; and so on, up to Plane 16, 0x100000..0x10FFFF.)
BMP: Basic Multilingual Plane, a synonym for Plane 0.
SMP: Supplementary Multilingual Plane, a synonym for Plane 1.
The Supplementary Planes: The collective term for Planes
1 through 16, considered as a group.
The Astral Planes: Jocular synonym for the Supplementary Planes.
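Under the plane definition above, the plane of a code point is simply its value divided by 64K (0x10000), so "BMP vs. supplementary" reduces to one comparison. The following is a minimal Python sketch of that arithmetic; the function names are illustrative only, not from any standard API. The last helper reflects the point made in the quoted message that code points ending in FFFE and FFFF are not characters, in any plane.

```python
PLANE_SIZE = 0x10000        # 64K code points per plane
MAX_CODE_POINT = 0x10FFFF   # last code point of Plane 16

def plane_of(cp: int) -> int:
    """Return the plane number (0..16) that contains code point cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a code point: {cp:#x}")
    return cp // PLANE_SIZE  # same as cp >> 16

def plane_range(plane: int) -> tuple[int, int]:
    """Return the first and last code points of the given plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"no such plane: {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1

def is_supplementary(cp: int) -> bool:
    """True if cp lies in the Supplementary Planes (Planes 1 through 16)."""
    return plane_of(cp) >= 1

def ends_in_fffe_or_ffff(cp: int) -> bool:
    """True for the last two code points of any plane (xxFFFE, xxFFFF)."""
    plane_of(cp)  # validates the range; raises ValueError otherwise
    return (cp & 0xFFFF) in (0xFFFE, 0xFFFF)
```

For example, plane_of(0x1D11E) is 1, so U+1D11E is a supplementary character, and plane_range(16) spans 0x100000..0x10FFFF.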
2. The plane names in the FCD for 10646-2 should be modified just
slightly to tie together the terminology better. The best
suggestion to date is:
>Plane 1: Supplementary Multilingual Plane for scripts and symbols (SMP)
>Plane 2: Supplementary Ideographic Plane (SIP)
>Plane 14: Supplementary Special-purpose Plane (SSP)
This makes consistent use of "supplementary plane", and ties the
plane names and acronyms together in a way which can actually be
remembered without having to look up the TLAs.
3. The term "surrogate character" should be eschewed altogether, because
of the confusion it causes. "Surrogate code point" can continue to
be used as it currently is, and the term "surrogate pair" is also
useful. But the other terminology related to characters should be
coordinated with establishing "supplementary planes" as the way to
refer to Planes 1-16. Some text I wrote earlier about this topic,
in response to a suggestion to use the terms "extended character"
and "basic character":
I don't like "extended character", because of the cognitive dissonance
over whether the character is an ordinary character that extends the
set located elsewhere, or whether the character itself is extended in
some way. That is bound to cause confusion, since UTF-16 encodes these
"extended characters" with 2 wydes (a surrogate pair) rather than one,
extending the encoding form as well as the character set.
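To make the "2 wydes" point concrete: UTF-16 represents a character from Planes 1-16 as a surrogate pair, a high-surrogate code unit (D800..DBFF) followed by a low-surrogate code unit (DC00..DFFF). Here is a minimal Python sketch of that arithmetic; `to_surrogate_pair` and `from_surrogate_pair` are illustrative names, and the last lines cross-check the result against Python's built-in UTF-16 codec.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (Planes 1-16) into two UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    offset = cp - 0x10000            # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# Cross-check against the built-in big-endian UTF-16 codec.
high, low = to_surrogate_pair(0x1D11E)
assert "\U0001D11E".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

For example, U+1D11E comes out as D834 DD1E: one supplementary character, two wydes. The surrogate code points themselves are only ever halves of such a pair, which is why "surrogate code point" and "surrogate pair" remain useful terms while "surrogate character" does not.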
Because of that, I think "supplementary character" is a far better choice
for talking about characters on Planes 1-16. There can be no confusion
there with the mechanics of the encoding form, and there is no artificial
discrimination in that term regarding the status of the good characters
we like in the Supplementary Planes versus the bad characters we don't like
in the Supplementary Planes -- just as for characters in the BMP.
And I would prefer not to start talking about characters in the BMP
as "basic characters", since, as we know, there are many thousands of
them that aren't particularly basic (or useful for implementation).
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT