L2/04-366

Erratum for Unicode 4.0 and ISO/IEC 10646:2003

Source: François Yergeau and Asmus Freytag
Date: October 26, 2004

The Problem

The glyphs for U+2B00 NORTH EAST WHITE ARROW and U+2B08 NORTH EAST BLACK ARROW both point to the upper left, whereas the glyphs for:
 
 2197    NORTH EAST ARROW
 21D7    NORTH EAST DOUBLE ARROW
 279A    HEAVY NORTH EAST ARROW
 27B6    BLACK-FEATHERED NORTH EAST ARROW
 2924    NORTH EAST ARROW WITH HOOK

all point to the upper right (and similarly for their NORTH WEST counterparts). Corresponding issues exist for 2B01 and 2B09.
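As a quick cross-check of the names cited above, the following minimal sketch reads each character name from Python's standard `unicodedata` module and extracts the direction the name states. The helper name `direction_in_name` is hypothetical and used only for illustration. The sketch inspects names only; it cannot examine glyphs, which depend on the chart font in question.

```python
import unicodedata

# Code points cited in this document.
CODE_POINTS = [0x2197, 0x21D7, 0x279A, 0x27B6, 0x2924,
               0x2B00, 0x2B08, 0x2B01, 0x2B09]

DIRECTIONS = ("NORTH EAST", "NORTH WEST", "SOUTH EAST", "SOUTH WEST")

def direction_in_name(cp):
    """Return the diagonal direction stated in the character name, or None.

    Hypothetical helper for illustration; it looks at the name only.
    """
    name = unicodedata.name(chr(cp))
    for direction in DIRECTIONS:
        if direction in name:
            return direction
    return None

for cp in CODE_POINTS:
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}  ->  {direction_in_name(cp)}")
```

Any glyph whose visual direction differs from the direction reported here is a candidate for the discrepancy described in this document.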

All other arrow names are consistently descriptive when it comes to directionality.

Background:

The names have been the same as published since the very early drafts. The glyph arrangement changed when a switch was made from a DPRK-submitted ad hoc font to a more manageable font containing a larger glyph collection. This coalescing of fonts is a necessary evil: it can introduce errors of this kind, but without coalescing some of the small additions, the process would become immensely fragile. The current glyph arrangement matches that of the 2190 block, which may be the source of the error.
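To illustrate how an arrangement carried over from the Arrows block could produce the mismatch, the sketch below compares the order of the diagonal directions, by name, in the two runs of four consecutive code points starting at U+2196 and at U+2B00. The function name `diagonal_order` is hypothetical, and the sketch assumes the direction is given by the first two words of each name, which holds for these eight characters.

```python
import unicodedata

def diagonal_order(first_cp):
    """List the direction (first two words of the name) for four
    consecutive code points starting at first_cp.

    Hypothetical helper for illustration only.
    """
    names = [unicodedata.name(chr(cp)) for cp in range(first_cp, first_cp + 4)]
    return [" ".join(name.split()[:2]) for name in names]

print("U+2196..U+2199:", diagonal_order(0x2196))
print("U+2B00..U+2B03:", diagonal_order(0x2B00))
```

By name, the first run begins with NORTH WEST followed by NORTH EAST, while the second begins with NORTH EAST followed by NORTH WEST. Glyphs laid out in the order of the first run would therefore be swapped relative to the names in the second.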

The FPDAM contained the information as published; no one detected the discrepancy until the preparation of the French translation, which is now underway.

There are strict limits on what can be done to correct these discrepancies:

Stability principle 2 prevents a change of character name.

Stability principle 4 prevents a change of glyph so as to change the identity of the character.

However, where name and glyph disagree, as they do here, it is not clear whether the normative name or the informative glyph is the primary source of the 'identity' of the character.

Therefore, UTC has two options:

Option 1:

Swap the glyphs of 2B00/2B01 and 2B08/2B09

Option 2:

Annotate the affected names as being 'non-descriptive'

Option 1 has the advantage that, after making the change, the standard is self-contained and consistent (except for the relative arrangement of the arrows within each block, which would then be different for the 2190 block and the 2B00 block). This reduces the chance of user errors over the life of the standard; note that name annotations are available only in the names list.

Option 2 has the advantage that any fonts and mappings to other standards that were done 'visually' would remain correct. It also would not 'change' the standard, merely annotate it. However, as a consequence, the precedent could be taken to mean that names have less weight than glyphs in determining the identity of characters, even though glyphs are deliberately not normative.


Precedents

There are precedents for both options in existing (and recent) errata. Where names are clearly incorrect (as in the case of typos), they have been annotated. Where names had a history of being constructed arbitrarily, such as for 2118, they have been annotated. However, the Tai Xuan Jing symbols 1D301 and 1D303 were swapped in a recent erratum, based on the fact that the wrong symbol was shown for the name (and that the glyphs were at variance with the original proposal).

Conclusion

The choice between these options depends on whether the committee thinks that the name or the glyph better defines the 'identity' of the character. In principle, that decision can and should depend on the nature of the discrepancy between character image and name.

Proposed Action

UTC should decide on its preferred option and communicate it to WG2 as a Corrigendum to be addressed in AMD1. (Given the nature of the discrepancy, any annotation must be carried by 10646 as well as by Unicode.)