Re: Hypersurrogates: a proposed convention for ISO 10646 ->

From: Kenneth Whistler
Date: Wed Nov 17 1999 - 15:46:50 EST

Markus responded to John Cowan's suggestion:

> someone suggested something more utf-16-centric earlier on this list for
> this purpose.
> i don't like it because it makes it ambiguous what code points in the
> private use planes mean.

It is already the case that private use code points are, well, ambiguous
as to their meaning. They can be used to represent anything, from
terminal glyph sets, to Tengwar text, to hieroglyphics, to Bliss. I don't
see what is stopping anyone from treating the entire set of U-000F0000 ..
U-0010FFFD as a set of metacharacters for representing (private use)
characters encoded out in some hypercodespace ("hyper" from the point
of view of the Unicode Standard).

Note that there is one little technical glitch: as it stands now, 10646
specifies that U-000FFFFE, U-000FFFFF, U-0010FFFE, U-0010FFFF (as well
as any value on planes 1..14 ending in FFFE or FFFF) are not legal
for characters -- so John will have to adjust his hypersurrogates proposal
a bit to take that into account.
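
As a rough illustration of that adjustment, here is a minimal Python sketch
of the check a hypersurrogate scheme would need. The function names are
hypothetical and are not taken from John's proposal; the ranges are the ones
given above.

```python
def is_noncharacter(cp: int) -> bool:
    """True if the low 16 bits of cp are FFFE or FFFF (not legal for characters)."""
    return (cp & 0xFFFE) == 0xFFFE


def usable_private_use(cp: int) -> bool:
    """True if cp lies in U-000F0000..U-0010FFFD and is legal for characters.

    Hypothetical helper: it only encodes the range and the FFFE/FFFF
    exclusion described in this message.
    """
    return 0xF0000 <= cp <= 0x10FFFD and not is_noncharacter(cp)


# Count the values actually available as metacharacters.
usable = sum(1 for cp in range(0xF0000, 0x10FFFE) if usable_private_use(cp))
print(usable)  # 131068, i.e. 65534 per private use plane
```

Under these assumptions the two private use planes offer 131068 usable
values, not the 131070 that the raw range would suggest.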

> someone who uses these beyond-unicode code points should use ucs-4 or
> utf-8. conversion into a variation of utf-16 should not be attempted.
> we should get iso to recommend or force any codes beyond U-0010ffff not to
> be used at all (is there not already a proposal for this?).

This is, indeed, in the works. UTF-32 (see Unicode Technical Report #19)
is an attempt to specify a 32-bit form of Unicode that is restricted
to U-00000000..U-0010FFFF. That TR is not yet approved, however, since
the UTC (and L2) have decided instead to take the route of making a
contribution to SC2/WG2 "to re-examine and redefine the UCS-4
encoding space to include only Planes 0..16 of 10646." This will
be pursued as an NP for a Technical Corrigendum to 10646.

That route would further seal the technical relation between the Unicode
Standard and ISO/IEC 10646 and would remove the single most nagging
interoperability problem between the two specifications.
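
To make the interoperability problem concrete, here is a small Python sketch,
assuming the restricted space is exactly U-00000000..U-0010FFFF. The names
are illustrative only. It shows that any value in Planes 0..16 (other than a
surrogate code) has a UTF-16 form, while a value beyond Plane 16 has none.

```python
UCS4_MAX = 0x7FFFFFFF     # 31-bit space originally defined for UCS-4
UNICODE_MAX = 0x10FFFF    # last code position of Plane 16


def plane(cp: int) -> int:
    """Plane number of a code position (the bits above the low 16)."""
    return cp >> 16


def in_restricted_space(cp: int) -> bool:
    """True if cp falls within Planes 0..16."""
    return 0 <= cp <= UNICODE_MAX


def to_utf16(cp: int) -> list[int]:
    """Return the UTF-16 code units for cp, or raise ValueError."""
    if not in_restricted_space(cp):
        raise ValueError(f"U-{cp:08X} is beyond Plane 16; no UTF-16 form")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U-{cp:08X} is a surrogate code, not a character")
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


print([f"{u:04X}" for u in to_utf16(0xF0000)])   # ['DB80', 'DC00']
print(plane(0x10FFFF), plane(0x110000))          # 16 17
```

A value such as U-00110000 is perfectly representable in a 31-bit UCS-4 but
is rejected by to_utf16 here, which is the mismatch that restricting UCS-4 to
Planes 0..16 would remove.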


> of course, everyone is free to create his or her own encoding... (i
> suggested one before :-)
> markus
