Re: XML and ISO 10646 planes beyond the BMP

From: Misha Wolf (misha.wolf@reuters.com)
Date: Sun Aug 17 1997 - 11:28:12 EDT


Liam Quin wrote (to the xml sig list):

> On Sat, 16 Aug 1997, Misha Wolf wrote:
> >
> > Though SGML's 8-digit limit may be under review, I don't think
> > we can wait for that process to run to completion, and so believe
> > we must treat it as an absolute constraint.
>
> I think that's certainly true for XML 1.0; for a future release the
> limit can probably be relaxed.
>
> > In that case, we have a choice of two numbers for the slot after
> > the "160" below:
> > [...]
> > They are:
> > 99999999, the highest integer that may be expressed using eight
> > decimal digits, and
> > 1113952, which allows us to utilise the entire available range
> > of code points defined by The Unicode Standard.
>
> If ISO 10646 is the base character set, the higher number must be
> used. If 16-bit Unicode is all that is used, a lower number is fine.
> Remember, though, that the number represents a contiguous range.
> If full 32-bit characters are needed, the largest numerical
> character reference will be �, and that could not
> be expressed in an SGML declaration today.

Well ...

- While Unicode characters are 16 bits wide when encoded using the
   native Unicode encoding scheme (aka UTF-16), the Unicode coding
   space is, as of Unicode 2.0, somewhat wider. It covers 17 planes
   of 64K characters each.

- While the ISO 10646 coding space is theoretically 31 (not 32) bits
   wide, it is my understanding that ISO/IEC JTC1/SC2/WG2 has decided
   not to encode characters beyond the 17 planes covered by Unicode.

So it makes no difference whether we quote ISO/IEC 10646 or Unicode :~)
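
For anyone who wants to check the arithmetic behind the 17-plane figure
and the number 1113952, here is a rough Python sketch. It assumes that the
slot after the "160" holds a count of characters in a contiguous range
starting at position 160; the constant names are my own and come from
no standard.

```python
# Sketch only: names are illustrative; the starting position of 160 and the
# "count of characters" reading of the slot are assumptions.
PLANES = 17            # planes in the Unicode coding space as of Unicode 2.0
PLANE_SIZE = 0x10000   # 64K code points per plane
RANGE_START = 160      # first position of the contiguous range (assumed)

total_code_points = PLANES * PLANE_SIZE        # 1114112
highest_code_point = total_code_points - 1     # 0x10FFFF
range_count = total_code_points - RANGE_START  # 1113952

# The theoretical ISO 10646 coding space is 31 bits wide.
iso_10646_space = 2 ** 31

print(total_code_points, hex(highest_code_point), range_count, iso_10646_space)
```

On that reading, 1113952 is simply the 17 planes minus the first 160
positions.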

> > BTW, the value 99999999 may not be absolutely accurate. It may be
> > that it has to be reduced [...] so that the highest numeric
> > character reference (NCR) does not exceed 99999999. Please could
> > one of the SGML experts advise on this.
>
> No, that's not correct. Within the document, the SGML NAMELEN has
> been raised, so larger numbers can be used. The 8-digit limit is only
> within the SGML declaration. It's also only the length of literal
> numbers, not the values -- e.g. you could not write
> 160 1000 0000000000160
> because the leading 0s make the 3rd number too long.
>
> So 99999999 is OK.

Thanks for explaining that.
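
To make sure I have understood the rule, here is a small Python sketch of
it: the limit applies to the length of each number literal in the
declaration, not to its value. The helper names literal_ok and
check_fragment are hypothetical, and splitting on whitespace is a
simplification of mine, not real SGML parsing.

```python
MAX_DIGITS = 8  # length limit on number literals in the SGML declaration


def literal_ok(literal):
    """Return True if the literal is all digits and at most 8 digits long."""
    return literal.isdigit() and len(literal) <= MAX_DIGITS


def check_fragment(fragment):
    """Check each whitespace-separated number literal in a fragment."""
    return [literal_ok(token) for token in fragment.split()]


# The leading zeros make the third literal too long, although its value is 160.
print(check_fragment("160 1000 0000000000160"))

# 99999999 is exactly eight digits, so it passes.
print(check_fragment("160 99999999 160"))

# If 99999999 is a count starting at position 160 (my assumption), the highest
# reference in the document has nine digits, which is where my worry came from.
highest_ncr = 160 + 99999999 - 1
print(highest_ncr, len(str(highest_ncr)))
```

Since the limit applies only within the declaration, the nine-digit
reference is not a problem in the document itself.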

------------------------------------------------------------------------
Misha Wolf Email: misha.wolf@reuters.com 85 Fleet Street
Standards Manager Voice: +44 171 542 6722 London EC4P 4AJ
Reuters Limited Fax : +44 171 542 8314 UK
------------------------------------------------------------------------
Eleventh International Unicode Conference, Sep 2-5 1997, www.unicode.org

------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.


