RE: holes (unassigned code points) in the code charts

From: Whistler, Ken
Date: Fri, 4 Jan 2013 18:24:23 +0000

Stephan Stiller continued:

> Occasionally the question is asked how many characters Unicode has. This
> question has an answer in section D.1 of the Unicode Standard. I
> suspect, however, that once in a while the motivation for asking this
> question is to find out how much of Unicode has been "used up". As
> filling in holes would be dispreferred, it might be interesting to know
> how much of Unicode has been filled if one counts partially filled
> blocks as full. I have no reason to disagree with the (stated and
> reiterated) opinion that our codespace won't be used up in the
> foreseeable future, but it's simply a fun question to ask.

The editors maintain some statistical information relevant to this fun question at:

Feel free to reference those fun facts the next time Unicode comes up in conversation at the bar. ;-)

There have been a few notable cases where particularly egregious holes in blocks, ones that seemed unlikely ever to be filled with like material, were "reprogrammed", as it were, and grabbed for the encoding of unrelated sets of characters. The most notable of these is the range U+FDD0..U+FDEF in the middle of the Arabic Presentation Forms-A block. There was a clear consensus in both committees that nobody wanted to add any more encodings for presentation forms of Arabic ligatures. So, when a need arose to add another range of noncharacters, the UTC simply decided that the otherwise unused range U+FDD0..U+FDEF could serve for that, without requiring a new two-column block, which would have consumed BMP space that could otherwise be used for something else. There are several symbol blocks on the BMP which have also had a somewhat colorful and creative history of "hole-filling" over time.
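
To make the noncharacter example concrete, here is a minimal Python sketch of a noncharacter test. It assumes the usual definition of the set: the 32 code points U+FDD0..U+FDEF discussed above, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF). The function name is_noncharacter is made up for illustration and is not taken from any library.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is a Unicode noncharacter.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # The two-column hole in Arabic Presentation Forms-A that was
    # reassigned to noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: low 16 bits are FFFE or FFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Count all noncharacters across the whole codespace: 32 + 2 * 17.
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))
print(noncharacter_count)  # 66
```

Note that the code points on either side of the range, U+FDCF and U+FDF0, are not noncharacters, so the test has to check both bounds rather than the block as a whole.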

Received on Fri Jan 04 2013 - 12:28:56 CST
