Re: How many characters?

From: Kenneth Whistler
Date: Wed Nov 23 2005 - 13:30:23 CST

    Andrew noted:

    > By my calculations the correct values for 4.1 are:

    After double-checking all my sources and Peter's figures
    as well, I concur with Andrew's correction, and have updated
    all the master sources accordingly. You can take the
    following figures as correct for Unicode 4.1:

    > Unicode 4.1:
    > 51640 graphic characters assigned (BMP)
    > 35 format control characters assigned (BMP)
    > 65 control characters assigned (BMP)
    > 6400 private use characters assigned (BMP)
    > 2048 surrogate code points designated (BMP)
    > 34 noncharacter code points designated (BMP)
    > 5314 reserved code points (BMP)
    > 45875 graphic characters assigned (supplementary planes)
    > 105 format characters assigned (supplementary planes)
    > 131068 private use characters assigned (supplementary planes)
    > 32 noncharacter code points designated (supplementary planes)
    > 871496 reserved code points (supplementary planes)
    > ------------------------------------------------------------------
    > 1114112 code points altogether
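
A quick way to sanity-check a tally like this is to confirm that each group sums to a whole number of planes. The following is only a minimal sketch: the dictionary names and keys are made up for illustration, and the numbers are copied from the 4.1 list above.

```python
# Unicode 4.1 tallies, copied from the list above.
# The names BMP_4_1 / SUPPLEMENTARY_4_1 and the keys are illustrative only.
BMP_4_1 = {
    "graphic": 51640,
    "format": 35,
    "control": 65,
    "private_use": 6400,
    "surrogate": 2048,
    "noncharacter": 34,
    "reserved": 5314,
}
SUPPLEMENTARY_4_1 = {
    "graphic": 45875,
    "format": 105,
    "private_use": 131068,
    "noncharacter": 32,
    "reserved": 871496,
}

PLANE_SIZE = 0x10000          # 65536 code points per plane
TOTAL_CODE_POINTS = 0x110000  # U+0000..U+10FFFF, i.e. 17 planes

bmp_total = sum(BMP_4_1.values())
supp_total = sum(SUPPLEMENTARY_4_1.values())

# Each comparison should print True if the tally is internally consistent.
print(bmp_total == PLANE_SIZE)                      # BMP is exactly one plane
print(supp_total == 16 * PLANE_SIZE)                # 16 supplementary planes
print(bmp_total + supp_total == TOTAL_CODE_POINTS)  # 1114112 altogether
```

This only checks that the categories partition the code space; it does not confirm that any individual category count is right.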

    Regarding 5.0:

    > Based on the latest publicly available version of the 5.0 UCD data, I
    > get the following figures for 5.0. My figures have two less BMP and
    > two more SMP characters than Ken's figures, but I haven't
    > cross-checked with N2991 yet (N2991 states there are 1,359 new
    > characters, but this must be a typo for 1,369), so I'm not sure who's
    > correct.

    I had misapportioned 2 character additions between the BMP and SMP.
    As for N2991, the *actual* count in that document is 1365. The
    difference from the 1369 total for Unicode 5.0 comes from 4 additions
    that are based on PDAM 3 rather than FDAM 2. The correct deltas are:

    BMP Graphic additions from FDAM 2: 336
    SMP Graphic additions from FDAM 2: 1029
    BMP Graphic additions from PDAM 3: 4
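
The arithmetic behind these deltas can be spelled out as follows. This is a sketch with hypothetical variable names, and it assumes that the 1365 counted in N2991 corresponds to the two FDAM 2 lines above, which is what the figures suggest.

```python
# Deltas as listed above (variable names are illustrative only).
fdam2_bmp = 336    # BMP graphic additions from FDAM 2
fdam2_smp = 1029   # SMP graphic additions from FDAM 2
pdam3_bmp = 4      # BMP graphic additions from PDAM 3

# Assumption: the N2991 count covers the FDAM 2 additions only.
n2991_count = fdam2_bmp + fdam2_smp

# Totals for Unicode 5.0 relative to 4.1.
bmp_new = fdam2_bmp + pdam3_bmp
smp_new = fdam2_smp
total_new = bmp_new + smp_new

print(n2991_count)  # 1365
print(bmp_new)      # 340
print(smp_new)      # 1029
print(total_new)    # 1369
```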

    Given that understanding, Andrew's breakdown for Unicode 5.0
    (which is still pre-release, so could conceivably change in
    some respects yet) is currently correct as listed below.


    > Unicode 5.0:
    > 51980 graphic characters assigned (BMP)
    > 35 format control characters assigned (BMP)
    > 65 control characters assigned (BMP)
    > 6400 private use characters assigned (BMP)
    > 2048 surrogate code points designated (BMP)
    > 34 noncharacter code points designated (BMP)
    > 4974 reserved code points (BMP)
    > 46904 graphic characters assigned (supplementary planes)
    > 105 format characters assigned (supplementary planes)
    > 131068 private use characters assigned (supplementary planes)
    > 32 noncharacter code points designated (supplementary planes)
    > 870467 reserved code points (supplementary planes)
    > ------------------------------------------------------------------
    > 1114112 code points altogether
    > Andrew
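
To see where the two breakdowns differ, one can subtract them category by category. The sketch below uses made-up dictionary names and (region, category) keys; the numbers are copied from the 4.1 and 5.0 lists in this message, and the 5.0 figures are pre-release as noted.

```python
# Tallies keyed by (region, category); U41 / U50 are illustrative names.
U41 = {
    ("BMP", "graphic"): 51640,
    ("BMP", "format"): 35,
    ("BMP", "control"): 65,
    ("BMP", "private use"): 6400,
    ("BMP", "surrogate"): 2048,
    ("BMP", "noncharacter"): 34,
    ("BMP", "reserved"): 5314,
    ("supplementary", "graphic"): 45875,
    ("supplementary", "format"): 105,
    ("supplementary", "private use"): 131068,
    ("supplementary", "noncharacter"): 32,
    ("supplementary", "reserved"): 871496,
}
U50 = {
    ("BMP", "graphic"): 51980,
    ("BMP", "format"): 35,
    ("BMP", "control"): 65,
    ("BMP", "private use"): 6400,
    ("BMP", "surrogate"): 2048,
    ("BMP", "noncharacter"): 34,
    ("BMP", "reserved"): 4974,
    ("supplementary", "graphic"): 46904,
    ("supplementary", "format"): 105,
    ("supplementary", "private use"): 131068,
    ("supplementary", "noncharacter"): 32,
    ("supplementary", "reserved"): 870467,
}

# Keep only the categories whose count changed between the two versions.
changes = {key: U50[key] - U41[key] for key in U41 if U50[key] != U41[key]}

for (region, category), delta in changes.items():
    print(f"{region} {category}: {delta:+d}")

# The total is fixed, so every newly assigned character must come out of
# the reserved pool in the same region.
print(sum(changes.values()) == 0)
```

Only the graphic and reserved rows change: +340 and -340 in the BMP, +1029 and -1029 in the supplementary planes, which matches the FDAM 2 and PDAM 3 deltas given earlier.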
