RE: Surrogate support in *ML?

From: Brendan Murray/DUB/Lotus
Date: Thu Sep 07 2000 - 10:47:33 EDT

Karlsson Kent - keka wrote:
> At the level of XML the number of bits is irrelevant.
> The "high and low surrogate" code points are excluded
> from being used as NCRs. A character (not a UTF-16 code
> unit) can be referenced by an NCR. See XML production 66
> (CharRef) and its well-formedness constraint (and
> production 2 (Char), though they neglected to exclude a
> number of other non-character code points in that production).
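To make the quoted point concrete, here is a minimal sketch (not from either poster) of the two XML 1.0 productions cited: [66] CharRef for syntax and [2] Char for the "Legal Character" constraint. The function names `is_xml_char`, `charref_is_legal` and `to_charref` are hypothetical helpers for illustration only. The key behaviour is that a non-BMP character is referenced once by its code point, while a reference to a lone surrogate code point is rejected.

```python
import re

# XML 1.0 production [2] Char, written as inclusive code point ranges.
# The surrogate block U+D800..U+DFFF and U+FFFE/U+FFFF fall outside every range.
_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)

# XML 1.0 production [66] CharRef: decimal '&#...;' or hexadecimal '&#x...;'.
_CHARREF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char."""
    return any(lo <= cp <= hi for lo, hi in _CHAR_RANGES)


def charref_is_legal(ref: str) -> bool:
    """Check CharRef syntax, then the 'Legal Character' well-formedness constraint."""
    m = _CHARREF.fullmatch(ref)
    if m is None:
        return False
    if m.group(1) is not None:
        cp = int(m.group(1), 10)
    else:
        cp = int(m.group(2), 16)
    return is_xml_char(cp)


def to_charref(ch: str) -> str:
    """Reference one character (not a UTF-16 code unit) as a hex NCR."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"&#x{ord(ch):X};"
```

For example, `charref_is_legal("&#x20000;")` is true, while `charref_is_legal("&#xD840;")` is false. Because the sketch mirrors production 2 as written, it also accepts a non-character such as U+1FFFE, which is exactly the gap noted in the quote.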

I know that XML explicitly excludes surrogates. My question is really about
how one can encode the non-BMP characters in the new unified Han data that
will become part of ISO/IEC 10646 and Unicode in the not-too-distant
future: is this huge block of characters regarded as irrelevant, or has
anyone proposed an encoding that can be used?
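For context on where surrogates do appear, here is a minimal sketch of the standard UTF-16 arithmetic that maps a supplementary-plane code point to a high/low surrogate pair and back. The helper names `to_surrogates` and `from_surrogates` are hypothetical, and U+20000 is used only as an illustrative Plane 2 code point. It is an assumption, not something stated in this thread, that any particular new Han character lives there.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000               # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Here `to_surrogates(0x20000)` yields `(0xD840, 0xDC00)`, the same code units Python's own `utf-16-be` codec produces. Those values exist only in a UTF-16 byte stream. At the XML level the text still holds one character, which UTF-8 carries directly as four bytes.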


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT