From: Doug Ewell (firstname.lastname@example.org)
Date: Thu Mar 18 2004 - 11:33:27 EST
Arcane Jill <arcanejill at ramonsky dot com> wrote:
> This probably is going to sound like a really dumb question, but ...
> I'm curious. Why are characters being assigned codepoints > U+FFFF,
> when there is still loads and loads of unused empty space below that
> point. Is the BMP being saved for something? Are codepoints < U+010000
> reserved for something of which I am unaware? If so, what? If not, why
> are assignments being made up there in the astral planes?
> By my calculations, the total number of currently existent Unicode
> characters is < 0x10000, which means that - currently - ALL existent
> Unicode characters could have been encoded in 16-bits. I can
> understand wanting to make room for more than 0x10000 characters for
> the future, but we don't need to be there yet (unless there's some
> explanation with which I am unfamiliar), so I don't understand why
> they don't just get assigned in ascending numerical order on a first-
> come first-served basis.
First, there are more than 96,000 characters assigned in Unicode 4.0, so
they wouldn't all fit in the BMP even if dense-packed. You may be
forgetting to count the Han extensions: Extension B alone puts more than
42,000 ideographs in Plane 2.
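As a rough check of that arithmetic, here is a minimal Python sketch. It
assumes only the "more than 96,000" lower bound quoted above (not an exact
count) and subtracts the BMP ranges that can never hold ordinary character
assignments: the surrogates, the Private Use Area, and the noncharacters.
The constant names are made up for illustration.

```python
# Capacity of the BMP versus the number of assigned characters.
BMP_SIZE = 0x10000                          # U+0000..U+FFFF
SURROGATES = 0xDFFF - 0xD800 + 1            # U+D800..U+DFFF
PRIVATE_USE = 0xF8FF - 0xE000 + 1           # U+E000..U+F8FF
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2   # U+FDD0..U+FDEF, U+FFFE, U+FFFF

# Assumption: lower bound only, taken from the figure quoted in this message.
ASSIGNED_LOWER_BOUND = 96_000

# Code points in the BMP that could hold ordinary character assignments.
available = BMP_SIZE - SURROGATES - PRIVATE_USE - NONCHARACTERS

# Characters left over even if every available BMP slot were filled.
overflow = ASSIGNED_LOWER_BOUND - available

print(f"available BMP code points: {available}")
print(f"characters that would not fit: at least {overflow}")
```

Even ignoring the reserved ranges entirely, 96,000 is already larger than
0x10000 (65,536), so the conclusion does not depend on how those ranges
are counted.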
Second, there are indeed scripts for which space is reserved in the BMP.
Take a look at the Roadmaps, available on the Unicode Web site. You
will see that most of the BMP is already spoken for, even if the
relevant scripts have not yet been approved or even formally proposed.
Third, there is a benefit to attempting to organize scripts into
more-or-less contiguous blocks, and that benefit is generally considered
to outweigh the advantages of dense-packing the code space.
Looking through the history pages on the Unicode site, one can find
references to an early philosophy of "start with 0 and add the next
character." Sounds like a neat idea, but in practice it is more useful
to add Latin characters to a Latin block, Thaana characters to a Thaana
block, etc., instead of assigning code points strictly in the order in
which the characters were approved.
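One way to see the practical benefit of contiguous blocks is a small
Python sketch. The Thaana block range (U+0780..U+07BF) is real; the
"scattered" table is purely hypothetical, invented here to show what a
lookup might look like if code points had been handed out in approval
order. The function names are also made up for illustration. Note that
block membership is only an approximation of script membership, since a
block may contain unassigned code points.

```python
# Contiguous block: membership is a single range comparison.
THAANA_BLOCK = range(0x0780, 0x07C0)  # U+0780..U+07BF inclusive


def in_thaana_block(ch: str) -> bool:
    """Return True if the character's code point lies in the Thaana block."""
    return ord(ch) in THAANA_BLOCK


# Hypothetical: code points for one script scattered across the code space,
# as they might be under strict first-come, first-served assignment.
# These values are invented and do not correspond to any real script.
HYPOTHETICAL_SCATTERED = frozenset({0x0123, 0x0456, 0x1ABC, 0x2DEF})


def in_scattered_script(ch: str) -> bool:
    """Membership needs an explicit table that grows with every addition."""
    return ord(ch) in HYPOTHETICAL_SCATTERED


print(in_thaana_block("\u0780"))  # THAANA LETTER HAA
print(in_thaana_block("A"))
```

With a block, a later addition to the script lands inside the same range
and the check above keeps working unchanged; with the scattered table,
every new character means updating the table everywhere it is used.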