L2/98-291
Title: US recommendations regarding procedures for character set registration
Source: NCITS/L2
Date: August 25, 1998
Action: Forward to SC2
Status: For the consideration of SC2
As NCITS/L2 considered the Applications
for Registration in SC2 documents N 3113-15, N 3125-38, and N
3140-41, it became apparent that many of these applications had
the same defects. The defects could be eliminated by defining
the requirements for character set registration more precisely,
and by strengthening the review process.
Recommendation 1:
The application for registration must
include an exact copy of the character set as it was originally
published, which will be retained by the registration authority
as its reference copy. A new version with redrawn glyphs, different
names and/or missing text may not be substituted.
Rationale for Recommendation:
The registration authority must possess
a true and complete copy of what is being registered, and
maintain it for reference purposes. (The reference copy may be
the actual publication, or a digitized version of it.)
The information available in a true
and complete copy of the character set being registered is vital
in determining ISO/IEC 10646 equivalents for its characters.
The redrawing of glyphs may introduce errors and, in the worst case, create spurious characters.
Example:
N 3126 includes a sideways Q identified as a signature mark (a
feature of manuscript and early printed books). The image for
this character is different from that in ISO 5426-2:1996, the
standard being registered. The image in ISO 5426-2:1996 itself
is different from that in the British Library character set from
which the sideways Q came. (As the source, the British Library
form should be considered the authoritative image.)
Renaming of characters with ISO/IEC
10646 names introduces two problems. Firstly, it introduces an
implied mapping of characters in the set being registered to ISO/IEC
10646 characters. (Mapping to ISO/IEC 10646 characters should
be an explicit process, as recommended below.)
Secondly, if characters are renamed,
the name of a character as published is unavailable in
the reference copy. In the case of ISO or national standards,
the name as published is the name that was approved under
ISO or NSB procedures. If a character name needs to be changed,
this should be done by revising the standard, and not by unauthorized
renaming of characters during registration.
Redrawn versions of four character sets
(N 3125, N 3127-29) caused unnecessary work for the SC2 Secretariat
and national standards bodies. The character sets in question
are already registered, and the official versions published by
ISO include the assigned G0-G3 escape sequences. But because redrawn
versions were submitted, and the clause dealing with escape sequences
was omitted, the SC2 Secretariat had no way of knowing that the
four character sets were already registered.
Recommendation 2:
Mapping of characters in a particular
character set to ISO/IEC 10646 characters should be an explicit
process, and must be subject to review by qualified experts.
Rationale for Recommendation:
Our concern here is to establish correct
and consistent correspondences between characters from source
character sets (those being registered) and ISO/IEC 10646. If
mapping is done by people who lack the appropriate expertise,
the result can be mappings with erroneous and contentious content,
as evidenced by many of the proposed registrations being reviewed.
Incorrect mappings can introduce havoc into the exchange of data.
Example:
N 3134 gives ISO/IEC 10646 mappings for Armenian punctuation marks
which do not agree with the mappings authorized by WG2 and published
in SC2/WG2 N1616.
Incorrect mappings or insufficient knowledge about the characters being mapped can result in unnecessary proposals.
Example:
SC2/WG2 N1745 is a proposal for the addition of six characters
that already exist in ISO/IEC 10646.
Many character sets have characters in common. When a character occurs in more than one character set, it is essential that its ISO/IEC 10646 mapping always be the same.
Example:
ISO 5426:1980 and ANSI/NISO Z39.47:1993 both contain the registered
trademark sign ®, but the registration applications for these
character sets (N 3125 and N 3138 respectively) map this character
differently: to U+2122 TRADE MARK SIGN and to U+00AE REGISTERED SIGN.
Only the latter is correct.
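The consistency requirement above lends itself to a mechanical check. The following is only a minimal sketch in Python, not part of any registration procedure: the table layout, the shared character label, and the function name find_conflicts are assumptions made for illustration. The two mappings are the ones cited in the example above.

```python
from collections import defaultdict

# Hypothetical input: proposed ISO/IEC 10646 mappings, keyed first by the
# registration application and then by a shared character label chosen for
# this sketch. The two entries are the mappings cited in the example above.
proposed = {
    "N 3125": {"registered trademark sign": 0x2122},
    "N 3138": {"registered trademark sign": 0x00AE},
}


def find_conflicts(proposed):
    """Return {label: {code_point: [applications]}} for every character
    label that is mapped to more than one code point."""
    by_label = defaultdict(lambda: defaultdict(list))
    for application, mapping in proposed.items():
        for label, code_point in mapping.items():
            by_label[label][code_point].append(application)
    return {
        label: dict(targets)
        for label, targets in by_label.items()
        if len(targets) > 1
    }


conflicts = find_conflicts(proposed)
for label, targets in conflicts.items():
    for code_point, applications in sorted(targets.items()):
        print(f"{label}: U+{code_point:04X} in {', '.join(applications)}")
```

A check of this kind only reports that two applications disagree. Deciding which mapping is correct remains a matter for review by qualified experts.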
Recommendation 3:
To implement the preceding recommendations,
NCITS/L2 recommends that SC2 reactivate or re-establish the REGISTRATION
ADVISORY GROUP (RAG) charged with examining proposed registrations,
reviewing and revising existing procedures for registration, advising
submitters of registrations on improvements of submissions, etc.
The RAG will operate under an SC2 charter, as an advisory group
to the Registration Authority. The RAG will consist of experts
nominated by SC2 member bodies and its liaisons.
Rationale for Recommendation:
There are formal registration
procedures in place based on ISO 2375. However, these procedures
were established before the publication of ISO/IEC 10646, so they
do not address the issue of including mapping information in a
registration.
The revival of the RAG
would ensure that the reference material obtained in the registration
process serves the needs of the global computer industry as fully
as possible. The RAG will also ensure the quality and reliability
of mappings established as part of the registration process.
NCITS/L2 and the Unicode Consortium urge SC2 to give this proposal immediate consideration, so that current flaws in the registration process may be remedied.