RE: Latin w/ diacritics (was Re: benefits of unicode)

From: Peter_Constable@sil.org
Date: Tue May 22 2001 - 10:01:47 EDT


>11 Digit Boy asked:
>> Why does Unicode only have space for 1114112 glyphs?
>
>BMP = 256 × 256 = 65536
>HI_SURROGS = 1024
>LO_SURROGS = 1024
>
>UNICODE = BMP + HI_SURROGS × LO_SURROGS = 1114112

There are other ways to calculate it:

17 * 65536 = 1,114,112
0x10FFFF + 1 = 1,114,112 (decimal)
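
For anyone who wants to check that these all agree, here is a minimal Python sketch. The variable names are just mine for illustration; the surrogate ranges in the comments are the usual D800 - DBFF and DC00 - DFFF.

```python
# Three ways to count the code points in the Unicode codespace.
BMP = 256 * 256           # 65,536 code points in plane 0
HI_SURROGATES = 1024      # D800..DBFF
LO_SURROGATES = 1024      # DC00..DFFF

# Plane 0, plus every code point reachable through a surrogate pair.
via_surrogates = BMP + HI_SURROGATES * LO_SURROGATES

# 17 planes of 65,536 code points each.
via_planes = 17 * 65536

# Code points run from 0 through 0x10FFFF inclusive.
via_max_code_point = 0x10FFFF + 1

print(via_surrogates, via_planes, via_max_code_point)
```

All three values should come out the same.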

But we really should do a little extra arithmetic to arrive at a more
useful number:

     65,536
 *       17
-----------
  1,114,112
-     2,048 (surrogate code points D800 - DFFF, reserved for UTF-16)
-        34 (non-characters nFFFE and nFFFF for 0 <= n <= 16)
-        32 (non-characters FDD0 - FDEF)
-----------
  1,111,998

That's the current number of usable code points in the Unicode codespace.
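
The same subtraction can be checked by brute force. This is only a rough sketch: the helper names is_surrogate and is_noncharacter are made up for this example and do not come from any library.

```python
def is_surrogate(cp):
    # D800..DFFF are set aside for UTF-16 surrogate code units.
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp):
    # FDD0..FDEF, plus the last two code points of each of the 17 planes
    # (nFFFE and nFFFF for 0 <= n <= 16).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

CODESPACE = range(0x110000)   # 0 through 0x10FFFF

surrogates = sum(1 for cp in CODESPACE if is_surrogate(cp))
noncharacters = sum(1 for cp in CODESPACE if is_noncharacter(cp))
usable = sum(
    1 for cp in CODESPACE
    if not is_surrogate(cp) and not is_noncharacter(cp)
)

print(surrogates, noncharacters, usable)
```

Walking the whole codespace is slower than doing the subtraction by hand, but it does not depend on getting the three deductions right ahead of time.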

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


