Re: What's the BMP being saved for?

From: Doug Ewell
Date: Thu Mar 18 2004 - 11:33:27 EST


    Arcane Jill <arcanejill at ramonsky dot com> wrote:

    > This probably is going to sound like a really dumb question, but ...
    > I'm curious. Why are characters being assigned codepoints > U+FFFF,
    > when there is still loads and loads of unused empty space below that
    > point. Is the BMP being saved for something? Are codepoints < U+010000
    > reserved for something of which I am unaware? If so, what? If not, why
    > are assignments being made up there in the astral planes?
    > By my calculations, the total number of currently existent Unicode
    > characters is < 0x10000, which means that - currently - ALL existent
    > Unicode characters could have been encoded in 16-bits. I can
    > understand wanting to make room for more than 0x10000 characters for
    > the future, but we don't need to be there yet (unless there's some
    > explanation with which I am unfamiliar), so I don't understand why
    > they don't just get assigned in ascending numerical order on a first-
    > come first-served basis.

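    First, there are more than 96,000 characters assigned in Unicode 4.0, so
    they wouldn't all fit in the BMP's 65,536 code points even if densely
    packed. You may be forgetting to count all the Han extensions; CJK
    Unified Ideographs Extension B alone, in Plane 2, holds 42,711
    ideographs.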
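    The arithmetic can be rechecked with a short Python sketch (not part of
    the original post). It counts assigned code points per plane using the
    Unicode data bundled with Python's unicodedata module, which will be a
    newer version than 4.0, so the totals are larger than the figure above.
    Treating categories Cn (unassigned), Cs (surrogates) and Co (private use)
    as "not assigned characters" is an assumption of this sketch, and the
    helper name count_assigned_by_plane is hypothetical.

```python
import sys
import unicodedata

# Assumption: these general categories are not counted as assigned characters.
# Cn = unassigned (including noncharacters), Cs = surrogates, Co = private use.
EXCLUDED = {"Cn", "Cs", "Co"}

def count_assigned_by_plane():
    """Hypothetical helper: count assigned code points in each plane."""
    counts = {}
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) not in EXCLUDED:
            plane = cp >> 16  # each plane spans 0x10000 code points
            counts[plane] = counts.get(plane, 0) + 1
    return counts

counts = count_assigned_by_plane()
total = sum(counts.values())
print("Unicode data version:", unicodedata.unidata_version)
print("BMP capacity:", 0x10000)
print("assigned in the BMP:", counts.get(0, 0))
print("assigned outside the BMP:", total - counts.get(0, 0))
print("total assigned:", total)
```

    Whatever version your Python ships, the total exceeds 0x10000, and
    Plane 2 (the supplementary ideographic plane) alone accounts for tens of
    thousands of Han characters.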

    Second, there are indeed scripts for which space is reserved in the BMP.
    Take a look at the Roadmaps, available on the Unicode Web site. You
    will see that most of the BMP is already spoken for, even if the
    relevant scripts have not yet been approved or even formally proposed.

    Third, there is a benefit to trying to organize scripts into
    more-or-less contiguous blocks, and that benefit is generally considered
    to outweigh the advantages of densely packing the code space.

    Looking through the history pages on the Unicode site, one can find
    references to an early philosophy of "start with 0 and add the next
    character." It sounds like a neat idea, but in practice it is more
    useful to add Latin characters to a Latin block, Thaana characters to a
    Thaana block, and so on, rather than assigning code points strictly in
    the order in which the characters were approved.
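    One practical consequence of contiguous blocks is that a code point's
    block can be found with a single range lookup; with strictly
    chronological assignment, a script's characters would be scattered and
    would need a per-character table. The Python sketch below (not from the
    post) shows the lookup with a binary search over a sorted range table.
    The table is a small hand-picked excerpt of ranges from the Unicode
    Character Database file Blocks.txt, and the name block_of is
    hypothetical; Python's unicodedata module does not expose block names.

```python
import bisect

# Excerpt of block ranges (start, end, name) from the UCD's Blocks.txt.
# A complete table would list every block, sorted by starting code point.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0780, 0x07BF, "Thaana"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]
STARTS = [start for start, _, _ in BLOCKS]

def block_of(cp):
    """Return the block containing cp, or None if cp is outside this excerpt."""
    # Find the last block whose start is <= cp, then check its end.
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0:
        start, end, name = BLOCKS[i]
        if start <= cp <= end:
            return name
    return None

print(block_of(0x0041))  # Basic Latin
print(block_of(0x0786))  # Thaana
print(block_of(0x0800))  # None (not covered by this excerpt)
```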

    -Doug Ewell
     Fullerton, California
