17 September 2003
To: Dave Michael, Chairman of the INCITS Standards Policy Board
CC: Jennifer Garner, Associate Director, Standards Programs, INCITS
Reference: Letter from Elaine Keown to ANSI
Thank you for forwarding Elaine Keown’s letter of 5 August 2003.
Ms. Keown raises two major concerns: the procedure by which characters are encoded in ISO/IEC 10646, and the appropriateness of the stakeholders involved in the encoding process. I’d like to clarify a few points that Ms. Keown may not be aware of, in the hope that this addresses both concerns to everyone’s satisfaction.
1. General procedure.
INCITS/L2 (and the Unicode Technical Committee or UTC) strives to have an open yet rigorous procedure for character encoding. It is our goal to serve the various linguistic and cultural communities with an appropriate character repertoire in ISO/IEC 10646; however, there is a process by which these repertoires are developed, both at the national level (L2) and at the international level (SC2/WG2). All are welcome to contribute, provided they follow these procedures.
The process for encoding characters is documented on the Unicode website (http://www.unicode.org/pending/proposals.html) and in the Principles and Procedures document on the SC2/WG2 website (http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/principles.html). It is in place to ensure technical and linguistic continuity with the rest of the standard, and it reflects years of experience working with proposals from numerous communities.
To date, neither SC2/WG2 nor L2 has received an encoding proposal or contribution from Ms. Keown. She did communicate with Arnold Winkler, former L2 chair, participated occasionally on the Unicode mail list, and presented a paper on Hebrew at the International Unicode Conference in
2. Specific procedural issues with the Hebrew block.
Ms. Keown expressed several concerns about the construction and content of the Hebrew character block.
While developing the Hebrew repertoire, SC2/WG2 received contributions from Hebrew academicians and linguists. The initial Hebrew block was based on ISO/IEC 8859-8, and other characters have been added since then, following the character encoding procedure.
With regard to her other specific statements on Hebrew:
a. Coptic was moved. As Ms. Keown rightly comments, L2/UTC and SC2/WG2 policy is against moving characters once they are encoded (see http://www.unicode.org/standard/stability_policy.html). She then states that the Coptic block was moved. However, for Coptic, no characters were moved. Rather, 58 characters were added for Coptic at positions 2C80-2CBF (reference document WG2 N2611).
b. The Hebrew repertoire is not contiguous. This is not unique to Hebrew. The repertoires for Latin, Cyrillic and Khmer, for example, are broken into several non-contiguous blocks. The ideographs needed for Chinese, Japanese and Korean are also spread across multiple planes. The placement of a character repertoire in the standard, however, has no impact on software implementation. Future character additions will be allocated as appropriately as possible, but the standard gives no guarantee that the characters of a particular writing system will be co-located.
c. Collation is broken by the repertoire. Unicode and ISO/IEC 10646 are encoding standards, not collation standards. The location of the characters in the repertoire does not determine or affect collation order for Hebrew or any other language or writing system; sorting is determined by the implementation. There are related standards that define collation over the repertoire of Unicode and 10646, but they are not part of the encoding standard. Ms. Keown should review the Unicode Collation Algorithm (Unicode Technical Standard #10, http://www.unicode.org/reports/tr10/) and ISO/IEC 14651 (International String Ordering) for more information.
d. Hebrew subsets are poorly grouped. The current subsets in 10646 were developed based on input from user communities. There is a process by which new subsets can be defined. Again, SC2/WG2 has yet to receive a formal proposal from Ms. Keown, and welcomes any contributions concerning Hebrew subsetting.
e. Only 3 Hebrew script languages are partially covered. As noted earlier, there is a formal process for encoding characters. If Ms. Keown has knowledge of additional scripts needed for encoding Hebrew, we welcome her contributions.
f. The block is missing symbols needed for
g. Some symbols are conflated and need semantic differentiation. We have yet to receive any formal proposals on the need to differentiate these symbols from Ms. Keown; again, a proposal which follows the submission guidelines is welcome.
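As an illustration of points b and c above, the following minimal sketch shows that sort order depends on the key an implementation chooses, not on where characters sit in the code space. The key function `toy_collation_key` is a hypothetical stand-in introduced only for this example; it is not the Unicode Collation Algorithm or ISO/IEC 14651, and a real implementation would use a conformant collation library instead.

```python
import unicodedata

# Two letters from the basic Hebrew block and one precomposed letter that
# lives far away, in the Alphabetic Presentation Forms block.
ALEF = "\u05d0"           # HEBREW LETTER ALEF
TAV = "\u05ea"            # HEBREW LETTER TAV
SHIN_WITH_DOT = "\ufb2a"  # HEBREW LETTER SHIN WITH SHIN DOT

letters = [TAV, SHIN_WITH_DOT, ALEF]

# Default string comparison in Python orders by code point, so the
# presentation form lands after every letter of the basic block.
by_code_point = sorted(letters)


def toy_collation_key(text):
    """Hypothetical key: compare canonically decomposed text (NFD).

    U+FB2A decomposes to U+05E9 (shin) + U+05C1 (shin dot), so under
    this key it sorts next to shin, regardless of its block.
    """
    return unicodedata.normalize("NFD", text)


# The same characters, unchanged, ordered by an implementation-chosen key.
by_toy_key = sorted(letters, key=toy_collation_key)

print([f"U+{ord(c):04X}" for c in by_code_point])
print([f"U+{ord(c):04X}" for c in by_toy_key])
```

The characters and their code points are identical in both lists; only the comparison key differs, which is why the placement of a block does not by itself fix a collation order.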
Ms. Keown raised a concern about the decision makers in the character encoding process at the national level. She may find the following interesting:
I hope that it is clear from the above that INCITS/L2 engages in a character encoding process that is open to all interested stakeholders. In addition, this process is rigorous enough to meet the linguistic and cultural criteria of a community and provide an interoperable, international standard that works for global software.
Please feel free to contact me should there be any questions or comments.
With best regards
Chair, INCITS/L2 (Character Sets and Internationalization)