Re: Corrigendum #9

From: Philippe Verdy <>
Date: Sun, 1 Jun 2014 10:28:29 +0200

OK then, the definitions still do not say that blocks cannot be split (in
fact this has already occurred many times across versions, when the need
for new blocks was reevaluated and the BMP was densified, up to the point
that sometimes a single addition to the same script required allocating
columns in multiple subblocks as small as a column of 16 code points).

Blocks are in fact artefacts of the encoding process: they are provisional
until the needed characters are actually allocated. Later, any unused
area may be reallocated to another block.

On the BMP for example, there remains a fairly large free area in a block
initially described for supplemental arrows. It could host a new full
alphabetic script (most probably one of the remaining modern Indic or
African scripts still to be encoded), or symbols used in common software or
devices for their UI and its documentation (such as the window
minimize/maximize/close buttons or resize corner, a refresh button, a
microphone symbol to start a voice call, or the radio wave symbol for
accessing a wireless network), or conventional symbols for accessibility
devices, or marks of dangers/hazards or restrictions/prohibitions that
could be used as widely as currency symbols. (Currency symbols are often
encoded urgently but in isolation, unlike other symbols that come in small
related groups; if such collections are large, like emoticons/emojis, they
will go directly into the SMP.)

Blocks are not immutable in size, even if they keep their initial position
(because allocations in a block start from the leading position, skipping
only a few entries: those reserved for possible later allocation to the
same script, those left by former proposals of characters that were
balloted in favor of unification with another character, those skipped
just to align the block with the layout of another legacy encoding chart,
or those left because the initial beta fonts submitted to support the
script allocated other characters that were not approved and the fonts
were not updated to use a new layout).

Maybe in the future we will see a few more allocations made in the BMP
using half columns (this is *already* the case at the end of the BMP, where
a single column is split in two parts, containing Armenian presentation
forms and Hebrew presentation forms for Yiddish...), or filling some
random holes for which it is definitively decided that the initial
reservations in the roadmap will never be used for the initially intended

2014-06-01 8:20 GMT+02:00 Asmus Freytag <>:

> On 5/31/2014 10:06 PM, Philippe Verdy wrote:
>> I've not proposed to move these characters elsewhere (or reencode
>> them), why do you think that?
>> I just challenge your statement that a block cannot be discontinuous,
> Well, go ahead and challenge that.
> As implemented in the current nameslist and file blocks.txt a block would
> have this definition. "A block is a uniquely named, continuous,
> non-overlapping range of code points, containing a multiple of 16 code
> points, and starting at a location that is a multiple of 16."
> Per chapter 3 the definition of the property block is given in Section
> 17.1 (Code Charts) - which contains no actual definition, only tells you
> how they are used in organizing the code charts, so, effectively, a block
> is what blocks.txt (and therefore the names list) says it is. The way
> blocks are assigned has followed the empirically derived definition I gave
> above, and at this point, the production process for the code charts has
> some of these restrictions built in.
> Chapter 3 calls blocks an enumerated property, meaning that the names must
> be unique, and blocks.txt associates a single range with a name, in
> concurrence with the glossary, which says blocks represent a range of
> characters (not a collection of ranges). Likewise, changing blocks to not
> starting at or containing multiples of 16 code points (sometimes called a
> "column") is equally not in the cards - it would break the very production
> process for chart production. The description of how blocks are used does
> not contemplate that they can be mutually overlapping, so that becomes part
> of their implicit definition as well.
> There's reason behind the madness of not providing an explicit definition
> of "block" in the standard. It has to do with discouraging people from
> relying on what is largely an editorial device (headers on charts).
> However, it does not mean that arbitrary redefinition of a block from a
> single to multiple ranges is something that can or should be contemplated.
> So, the chances that UTC would agree to such changes, even if not formally
> guaranteed, are de facto nil.
> A./
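
The empirically derived definition quoted above (uniquely named,
non-overlapping ranges that start at a multiple of 16 and contain a
multiple of 16 code points) can be checked mechanically. What follows is
only a minimal sketch, not anything taken from the UCD tooling: the
function names parse_blocks and check_blocks are hypothetical, and the
sample input is just a few entries written in the "start..end; name"
format of blocks.txt rather than the full file.

```python
import re

# A few entries in the "start..end; name" format of blocks.txt.
# This is sample input for illustration, not the full file.
SAMPLE = """\
# comment lines and blank lines are ignored
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
FB00..FB4F; Alphabetic Presentation Forms
"""

LINE = re.compile(r"^([0-9A-F]{4,6})\.\.([0-9A-F]{4,6});\s*(.+?)\s*$")


def parse_blocks(text):
    """Return a list of (start, end, name) tuples from blocks.txt-style text."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        blocks.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return blocks


def check_blocks(blocks):
    """Return a list of violations of the quoted definition (empty if none)."""
    problems = []
    seen_names = set()
    prev_end = -1
    for start, end, name in sorted(blocks):
        # A block split into several ranges would show up as a repeated name.
        if name in seen_names:
            problems.append(f"duplicate name: {name}")
        seen_names.add(name)
        if start % 16 != 0:
            problems.append(f"{name}: start {start:04X} is not a multiple of 16")
        if (end - start + 1) % 16 != 0:
            problems.append(f"{name}: size is not a multiple of 16")
        if start <= prev_end:
            problems.append(f"{name}: overlaps the preceding range")
        prev_end = max(prev_end, end)
    return problems


if __name__ == "__main__":
    for problem in check_blocks(parse_blocks(SAMPLE)):
        print(problem)
```

Because each entry carries a single start..end pair, contiguity is implied
by the file format itself; a discontinuous block could only be expressed by
repeating a name, which the duplicate-name check would report.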

Unicode mailing list
Received on Sun Jun 01 2014 - 03:31:59 CDT
