Re: What's the BMP being saved for?

Date: Thu Mar 18 2004 - 11:58:24 EST


    Arcane Jill scripsit:

    > Why are characters being assigned codepoints > U+FFFF, when
    > there is still loads and loads of unused empty space below that point.

    In fact the BMP is currently 87.5% full. When the 32 remaining blocks
    currently shown on the Roadmap are completed, it will be almost 99% full.

    > Is the BMP being saved for something? Are codepoints < U+010000
    > reserved for something of which I am unaware? If so, what? If not, why
    > are assignments being made up there in the astral planes?

    Supplementary codepoints are used for characters that are judged to be
    of very low overall frequency, or that are part of very large character
    repertoires, or that are used only in obsolete scripts.

    > By my calculations, the total number of currently existent Unicode
    > characters is < 0x10000, which means that - currently - ALL existent
    > Unicode characters could have been encoded in 16-bits.

    Actually not. Unicode 4.0 contains 96447 graphic, format, and control
    characters, almost half again as large as the BMP. In addition, there
    are currently 139582 private-use characters, reserved noncharacters,
    and surrogate codepoints, of which 8482 are on the BMP.
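
    The private-use, noncharacter, and surrogate figures above can be
    rechecked from the fixed code point ranges alone. The following is
    a minimal sketch, not part of the original message; the names
    PRIVATE_USE, SURROGATES, NONCHARACTERS, and count() are made up for
    illustration, and the ranges are the ones defined as of Unicode 4.0.

```python
# Recount private-use, surrogate, and noncharacter code points from
# their fixed ranges. All names here are illustrative only.

BMP_END = 0xFFFF

# BMP Private Use Area plus the two private-use planes (15 and 16).
# The last two code points of each plane are noncharacters, so the
# supplementary private-use ranges stop at xFFFD.
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

# High and low surrogates together.
SURROGATES = [(0xD800, 0xDFFF)]

# U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF in each of the 17 planes.
NONCHARACTERS = [(0xFDD0, 0xFDEF)] + [
    (plane * 0x10000 + 0xFFFE, plane * 0x10000 + 0xFFFF)
    for plane in range(17)
]


def count(ranges, limit=0x10FFFF):
    """Count code points in the inclusive ranges that are <= limit."""
    return sum(max(0, min(hi, limit) - lo + 1) for lo, hi in ranges)


ALL_RANGES = PRIVATE_USE + SURROGATES + NONCHARACTERS
total = count(ALL_RANGES)
on_bmp = count(ALL_RANGES, BMP_END)

print(total, on_bmp)
```

    Under those assumptions the two totals come out to 139582 and 8482,
    matching the figures quoted above.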

    > I don't understand why they don't just get
    > assigned in ascending numerical order on a first-come first-served basis.

    Partly for administrative convenience, partly because keeping related
    characters together allows character-property tables to be cleverly

    The Imperials are decadent, 300 pound   John Cowan
    free-range chickens (except they have
    teeth, arms instead of wings and
    dinosaurlike tails).                        --Elyse Grasso
