L2/04-219

Registry for Ideographic Variation Sequences

Eric Muller, Adobe Systems Inc.
June 6, 2004

1.  Motivation
2.  Overview
3.  Ideographic Variation Database
4.  Review process
5.  Registration of collections
6.  Registration of variation sequences
Document History

1. Motivation

Characters in the Unicode Standard can be represented by a wide variety of glyphs. Occasionally the need arises in text processing to restrict or change the set of glyphs that are to be used to represent a character. In special circumstances, this restriction needs to be expressed in plain text rather than by font selection or some other rich text mechanism. The Unicode Standard accommodates those circumstances with variation selectors: a scalar value for a graphic character can be followed by the scalar value of a variation selector to identify a restriction on the graphic character. The combination of a graphic character and a variation selector is known as a variation sequence (see TUS 4.0, Section 15.6).

In the case of Han ideographs, it is difficult to build a single collection of variation sequences that can satisfy the needs of all users. The requirements of scholars, governments and publishers are too different to be accommodated by a single collection. Nevertheless, it is desirable that the meaning of a given variation sequence be unique, to avoid confusion. To accomplish this, the Ideographic Variation Database is used to register collections of variation sequences, and to ensure that different collections do not use the same variation sequences.

2. Overview

Registration is limited to sequences made of a base character with the Ideographic property and one of the variation selectors in the range U+E0100 to U+E01EF. There is no guarantee that two sequences involving the same variation selector on different base characters have any relationship, nor will a variation selector be designated, independently of any base, for any purpose. Should there be requests to register more than 240 sequences involving the same base character, the Unicode Consortium will seek the encoding of additional variation selectors, and make those available for registration of Ideographic Variation Sequences.
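The eligibility rule above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the Ideographic test is passed in by the caller because the Python standard library does not expose that property; the toy predicate shown covers only the block U+4E00 to U+9FFF and is an assumption for illustration, not the real property.

```python
# Range of variation selectors open to registration, as stated in the text.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF

# Number of selectors available per base character (the "240" in the text).
SELECTORS_PER_BASE = IVS_SELECTOR_LAST - IVS_SELECTOR_FIRST + 1


def is_ivs_selector(code_point: int) -> bool:
    """True if code_point lies in U+E0100..U+E01EF."""
    return IVS_SELECTOR_FIRST <= code_point <= IVS_SELECTOR_LAST


def is_candidate_ivs(text: str, is_ideographic) -> bool:
    """True if text is exactly one Ideographic base plus one eligible selector.

    is_ideographic is a caller-supplied predicate on code points (hypothetical;
    a real check would consult the Ideographic property in the UCD).
    """
    if len(text) != 2:
        return False
    base, selector = ord(text[0]), ord(text[1])
    return is_ideographic(base) and is_ivs_selector(selector)


def toy_is_ideographic(code_point: int) -> bool:
    """Stand-in predicate: only U+4E00..U+9FFF (an assumption, not the UCD)."""
    return 0x4E00 <= code_point <= 0x9FFF
```

With these definitions, is_candidate_ivs("\u4e00\U000E0100", toy_is_ideographic) is true, while a sequence using a selector outside the range, such as U+FE00, is rejected.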

To guarantee the stability of data encoded using registered variation sequences, those sequences are never removed or reassigned.

Ideographic Variation Sequences are subject to the usual rules for variation sequences: unregistered sequences should not be used, and registered sequences should be used according to their intent. Furthermore, variation selectors are default ignorable, so a process may treat a variation sequence as if it were the base character alone. Registrants should therefore carefully consider whether the glyphs intended for a variation sequence are indeed a subset of the glyphs that are acceptable for the base character alone.
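As a rough illustration of what "default ignorable" means for a registrant, the following Python sketch shows the text as seen by a process that ignores the selectors in U+E0100 to U+E01EF. The function name is hypothetical, and deleting the selectors from the string is only a model of being ignored; an actual process would typically keep them in the text and skip them at display time.

```python
def strip_ivs_selectors(text: str) -> str:
    """Return text with every code point in U+E0100..U+E01EF removed.

    Models a process that ignores these variation selectors: what remains
    is the sequence of base characters alone.
    """
    return "".join(ch for ch in text if not 0xE0100 <= ord(ch) <= 0xE01EF)
```

If the glyph a registrant has in mind for a sequence would be unacceptable for the stripped result, the sequence is probably not a restriction of the base character in the sense described above.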

Registration of a collection does not imply suitability for any purpose. The usefulness of a given variation sequence, and of a collection as a whole, depends to a large extent on how they are used. Registrants are encouraged to describe the intent and modalities of their collections.

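The registration process has two main steps: registration of a collection (section 5), and registration of variation sequences in that collection (section 6).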

While the process imposes few strict requirements, it also strongly encourages interested parties to provide additional information and to engage in a review process with the user community.

The procedures presented in this document attempt to cover the most usual situations. However, the Unicode Consortium reserves the right to clarify or otherwise change the procedures at any time.

3. Ideographic Variation Database

The Ideographic Variation Database is part of the Unicode Character Database, and is made of two files, IVD_Collections.txt and IVD_Sequences.txt:

IVD_Collections.txt is a semicolon-delimited file with one line per collection. The fields are:

IVD_Sequences.txt is a semicolon-delimited file with one line per variation sequence. The fields are:

Both files follow the usual convention for comments and whitespace: the portion of a line starting at any '#' is ignored, blank lines are ignored, and spaces are not significant.
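The comment and whitespace conventions just described can be sketched as a small Python reader. The function name is hypothetical, and it returns plain lists of field strings without naming them. Reading "spaces are not significant" as "trim whitespace around each field" is an assumption made for this sketch.

```python
def parse_ivd_lines(lines):
    """Split semicolon-delimited lines into lists of trimmed fields.

    Everything from the first '#' on a line is dropped as a comment, and
    lines that are empty after that are skipped.
    """
    records = []
    for line in lines:
        content = line.split("#", 1)[0]  # discard the comment portion
        if not content.strip():
            continue  # blank line, or a line holding only a comment
        records.append([field.strip() for field in content.split(";")])
    return records
```

Note that, under the stated convention, a '#' anywhere on a line starts a comment, so a field value containing '#' would be cut short by this reader.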

The only purpose of the collection identifier is to tie the two files together, and to avoid repeating the full URL in IVD_Sequences.txt. It is internal to the IVD.

4. Review process

Prior to the registration of a collection or of a set of variation sequences, registrants are strongly encouraged to engage with the user community in a review process.

The registrant should post on its web site the description of the collection and/or sequences and announce its availability on the unicode@unicode.org mailing list, indicating where feedback can be sent. The registrant is strongly encouraged to consider this feedback and make appropriate changes to the description or content of the collection.

The suggested review period is one month, after which the registrant can submit the application.

5. Registration of collections

Any party can register a collection. The application for registration must have:

Option:

Registration of a collection is subject to a fee. Full members of the Consortium can register xx [suggestion: 2] collections per membership year at no fee. For additional collections, and for other parties, the registration fee is $xx [suggestion: $100].

Upon receipt of a complete application and of the applicable fee, the registrar will assign a collection identifier (respecting as much as possible the suggested identifier), and add the collection to the Ideographic Variation Database.

Owners of collections can change the designated representative at any time by notifying the registrar. They can also change the URL of the web site they maintain by notifying the registrar. Ownership can be transferred to another party by notifying the registrar.

Owners of collections are strongly encouraged to make the description of their collection as complete as possible and widely available, via the URL given in the registration.

The Unicode Consortium may register collections.

6. Registration of variation sequences

Only the representative for a collection can register variation sequences in that collection. The application must have:

Option:

Registration of a variation sequence is subject to a fee. Full members of the Consortium can register xx [suggestion: 2,000] sequences (in any combination of collections they have registered) per membership year at no fee. For additional sequences, and for other parties, the registration fee is $xx [suggestion: $1].

Owners of collections are strongly encouraged to make the description of their variation sequences as complete as possible and widely available, via the URL given in the registration. In particular, the URL should give access to representative glyphs for each variation sequence.

Upon receipt of a complete application and of the applicable fee, the registrar will assign a variation sequence, and update the IVD, for inclusion in the next release of the Unicode Character Database.
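The assignment step, together with the rules that a sequence belongs to a single collection and is never removed or reassigned, can be modeled with a toy in-memory registry in Python. The class and method names are hypothetical, and picking the lowest unused selector for a base is an assumed policy; this document does not say how the registrar chooses.

```python
class ToyRegistry:
    """Toy model of sequence assignment; not the registrar's actual procedure."""

    FIRST, LAST = 0xE0100, 0xE01EF  # selectors open to registration

    def __init__(self):
        # (base code point, selector code point) -> collection identifier.
        # There is deliberately no method to delete or overwrite an entry.
        self._owner = {}

    def register(self, base: int, collection: str) -> int:
        """Assign the lowest unused selector for base and return it."""
        used = {sel for (b, sel) in self._owner if b == base}
        for selector in range(self.FIRST, self.LAST + 1):
            if selector not in used:
                self._owner[(base, selector)] = collection
                return selector
        raise RuntimeError("all 240 selectors are in use for this base")

    def owner(self, base: int, selector: int):
        """Return the owning collection, or None if unregistered."""
        return self._owner.get((base, selector))
```

Because each (base, selector) pair is handed out at most once, two collections can never end up sharing a sequence, which is the uniqueness property the database is meant to ensure.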


Document History

Author: Eric Muller

Revision  Date          Comments
1         June 6, 2004  Initial version