Re: XML and ISO 10646 planes beyond the BMP

From: Martin J. Dürst
Date: Tue Aug 19 1997 - 06:20:34 EDT

Misha has asked me to also send this to the Unicode list:

On Sun, 17 Aug 1997, Misha Wolf wrote:

> > > In that case, we have a choice of two numbers for the slot after
> > > the "160" below:
> > > [...]
> > > They are:
> > > 99999999, the highest integer that may be expressed using eight
> > > decimal digits, and
> > > 1113952, which allows us to utilise the entire available range
> > > of code points defined by The Unicode Standard.
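A small Python sketch can make the arithmetic behind the two candidates concrete. It assumes, as in the usual SGML declaration character-set description, that the slot after "160" holds the number of characters starting at code point 160. The helper names (`last_code_point`, `count_up_to`) are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers for checking the two candidate values.
# Assumption: the slot after "160" counts characters starting at code point 160.

START = 160
UTF16_MAX = 0x10FFFF  # highest code point UTF-16 can encode

def last_code_point(start: int, count: int) -> int:
    """Last code point covered by `count` characters beginning at `start`."""
    return start + count - 1

def count_up_to(start: int, last: int) -> int:
    """Number of characters needed to cover `start`..`last` inclusive."""
    return last - start + 1

# Second candidate: exactly the UTF-16 range, U+00A0 .. U+10FFFF.
print(count_up_to(START, UTF16_MAX))       # 1113952

# First candidate: the largest count expressible in eight decimal digits.
eight_digit_max = 10**8 - 1
print(eight_digit_max)                     # 99999999

# The full 31-bit range (up to 0x7FFFFFFF) would need a ten-digit count,
# which is why eight digits is "as far as we could get".
full_31_bit = count_up_to(START, 2**31 - 1)
print(full_31_bit, len(str(full_31_bit)))  # 2147483488 10
```

So 1113952 is not arbitrary: it is exactly the count that ends on U+10FFFF, while 99999999 overshoots the UTF-16 range but still falls short of 31 bits.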

These are indeed the two possibilities, and they carry a slight difference
in meaning. The first basically says: we tried as hard as we could,
we wanted the full 31 bits, and that's as far as we could get. The second
says: we don't think anything will/should go beyond UTF-16 anytime
soon, and we want to show explicitly that UTF-16 implementations are
fully conforming. I guess that the UTC and practically oriented
people would be happier with the second one, while principle-oriented
people would prefer the first one. Chances are that the HTML WG has
more practically oriented people, while the XML WG has more
principle-oriented people.

Nevertheless, we should make both SGML declarations the same in this
respect (XML has all the case-sensitivity settings, which I wouldn't
want for HTML, so the declarations will differ in other ways).

As the allocation of code points beyond the UTF-16 range will most
probably happen later than the extension of the SGML NAMELEN limit,
which in turn will probably happen later than the next HTML revision,
I do not think that using 1113952 now for both HTML and XML will
cause problems.
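The claim that the second value shows UTF-16 implementations to be fully conforming rests on UTF-16's ceiling: a surrogate pair carries 10 bits in each half, offset by 0x10000, so the highest pair decodes to U+10FFFF. A minimal Python sketch of that decoding follows; `decode_surrogate_pair` is a hypothetical helper written for illustration, cross-checked against Python's built-in `utf-16-be` codec.

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid UTF-16 surrogate pair")
    # 10 bits from each surrogate, offset by the start of plane 1 (0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The highest possible pair lands exactly on U+10FFFF.
top = decode_surrogate_pair(0xDBFF, 0xDFFF)
print(hex(top))  # 0x10ffff

# Cross-check against Python's built-in UTF-16 codec.
assert chr(top).encode("utf-16-be") == b"\xdb\xff\xdf\xff"

# A count of 1113952 starting at 160 ends on that same code point.
assert 160 + 1113952 - 1 == top
```

Any code point above U+10FFFF has no UTF-16 representation, so a declaration that stops there describes exactly the repertoire a UTF-16 implementation can handle.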

So please let's go for it and finish this discussion.

Regards, Martin.

