holes (unassigned code points) in the code charts

From: Stephan Stiller <stephan.stiller_at_gmail.com>
Date: Fri, 04 Jan 2013 02:36:33 -0800


There are plenty of unassigned code points within blocks that are in
use; these often come at the end of a block, but holes in the middle of
a block are common as well.
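A minimal Python sketch that lists the holes inside one block, using only the standard `unicodedata` module. It assumes the Greek and Coptic block spans U+0370..U+03FF and treats General_Category `Cn` as "unassigned". The exact result depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`).

```python
import unicodedata

# Block range assumed for illustration: Greek and Coptic, U+0370..U+03FF.
GREEK_START, GREEK_END = 0x0370, 0x03FF

def unassigned_in_range(start: int, end: int) -> list[int]:
    """Return the code points in [start, end] whose General_Category is Cn.

    Cn covers unassigned code points (and noncharacters, none of which
    fall inside this particular block)."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == "Cn"]

if __name__ == "__main__":
    holes = unassigned_in_range(GREEK_START, GREEK_END)
    print("Unicode", unicodedata.unidata_version)
    print(" ".join(f"U+{cp:04X}" for cp in holes))
```

Running the same function over other blocks shows how holes are scattered through the middle of blocks, not only at their ends.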

I have a cluster of interrelated questions:
1. What sorts of reasons are there (or have there been) for leaving
holes? Easy conversion from legacy code pages, and changing case by
simple arithmetic? What else?
1.1 The rationale for particular holes is not documented in the code
charts I looked at; is there documentation? (Yes, in some instances the
answer can be guessed.)
1.2 How is the number of holes determined? It seems like multiples of 16
are used for block sizes merely for practical reasons.
2. I notice that ranges are often used to describe where scripts are
found. Do holes have properties? Are there other block-related policies
that give holes certain semantics?
2.1 If not, how likely is it that Unicode assigns script-external
characters to holes?
2.2 If yes, how much does the count of assigned code points change when
holes that are assumed to be filled only by certain types of characters
are taken into account?
2.2.1 Would this make much of a difference wrt the question (this comes
up from time to time it seems) of how much of the Unicode code space
will eventually fill up?
3. Have there been "mistakes" wrt hole assignment?
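To illustrate the "casing by simple arithmetic" idea from question 1, here is a minimal Python sketch using only the standard library. In ASCII, capital and small letters differ by 0x20, so case can be flipped by toggling one bit. The core Greek capitals U+0391..U+03A9 also sit exactly 0x20 below their small forms. The one exception in that range is the hole at U+03A2, which lines up with final sigma U+03C2 minus 0x20. This is offered as an observation consistent with the casing hypothesis, not as documented rationale. The helper names are hypothetical.

```python
import unicodedata

CASE_OFFSET = 0x20  # capital/small distance in ASCII and in core Greek

def ascii_toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by toggling bit 0x20 (letters only)."""
    if not ch.isascii() or not ch.isalpha():
        raise ValueError("ASCII letters only")
    return chr(ord(ch) ^ CASE_OFFSET)

def greek_offset_report(start: int = 0x0391, end: int = 0x03A9) -> dict[int, str]:
    """Classify each code point from GREEK CAPITAL ALPHA to OMEGA:
    'hole'   - unassigned (General_Category Cn)
    'offset' - its lowercase mapping is exactly cp + CASE_OFFSET
    'other'  - assigned, but the offset rule does not hold
    """
    report = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":
            report[cp] = "hole"
        elif ch.lower() == chr(cp + CASE_OFFSET):
            report[cp] = "offset"
        else:
            report[cp] = "other"
    return report

if __name__ == "__main__":
    for cp, kind in greek_offset_report().items():
        if kind != "offset":
            print(f"U+{cp:04X}: {kind}")
    # Final sigma (U+03C2) uppercases to SIGMA U+03A3, so no capital
    # occupies U+03C2 - 0x20 = U+03A2, and the slot is left empty.
    print(f"U+03C2 upper -> U+{ord(chr(0x03C2).upper()):04X}")
```

Leaving U+03A2 empty keeps every other capital in the range exactly 0x20 away from its small form, so a fixed-offset case conversion needs no special case for sigma's position.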

Received on Fri Jan 04 2013 - 04:41:43 CST
