L2/99-056R

L2/99-056

L2/UTC response to JTC1 N5698:
Japanese National Body Recommendation to ISO/IEC JTC 1
Concerning the Activities of JTC 1/SC 2

February 11, 1999

Author: Ken Whistler

1. Japan’s concern seems predicated primarily on the market relevance of the ongoing work in SC2 on 10646. Their first point implies that working on dead or almost dead ancient character sets has no market relevance. However, the very fact that the Unicode Consortium has seen fit to pursue some of these scripts makes a prima facie case for market relevance, since the Consortium is a group of market-driven entities. There is market relevance in having complete solutions for customers, so that specialists in academia, libraries or government, and enthusiasts of various sorts with online presence can make use of off-the-shelf software, rather than having to depend on customized solutions that have interoperability problems.

2. There are a number of living scripts in use in Southeast Asia that are not yet included in 10646. However, it is incorrect to infer from the fact that they are not yet in the standard that they are not being given due attention and proper priority by SC2. WG2 has before it proposals for Cham, Philippine and various Tai scripts. It is aware of extensions to the Myanmar script required to handle Shan, Mon, Karen, and other minority languages. It has on its roadmap various other scripts such as Javanese, Lisu, Meitei, Kirat, and Lepcha. Most of these are simply awaiting enough expert input to ensure that WG2 does a proper job of encoding them. The block to progress is generally the availability of a sufficiently detailed proposal and information about the relevant experts and/or community to be consulted, rather than insufficient knowledge about how the scripts should be prioritized for encoding.

3. It is misleading to characterize the supply of ancient scripts as "virtually inexhaustible". WG2 also has a roadmap for Plane 1 of 10646-2, which catalogs and gives suggested positions for nearly all of the historic scripts that various experts are aware of (other than Han extensions, which are for Plane 2). The number of these historic scripts is fairly well-known, as is the approximate number of characters they will require for encoding. The overall scale of the encoding to be done is approximately the same as the number of characters that have already been encoded. While that is a large number, it is not an inexhaustible number. The main difficulty lies in gathering sufficient, detailed and authoritative information about historic scripts, rather than in the number of characters to be encoded.

4. As regards the issue of market relevance of historic scripts, or living minority scripts, for that matter, 10646, as the Universal Character Set, is in a unique position as a standard. Unlike the major world scripts (Latin, Cyrillic, Han, and so forth), which have obvious market relevance all around the world, historic and minority scripts often have more localized importance. While Japan, for instance, may find no market relevance for the pursuit of Glagolitic, it would be of obvious interest to Bulgaria or to many communities of scholars. Conversely, while Japan may view Cham, a minority script used in Vietnam, as of market and business relevance, Cham might be of no interest whatsoever to Brazil or Italy, for example. Since 10646 is intended as the one, universal international standard for characters, not merely as a standard that may only contain characters that everyone agrees are widely used everywhere, there seems little choice in the matter: 10646 must contain both Glagolitic and Cham under these circumstances. It is then up to market forces to drive the scope of implementations of the standard. The 10646 solution to a perceived lack of market relevance for some collection of characters in a particular part of the world is the subsetting mechanism—implementors can choose to implement only the parts of the standard that are relevant to their applications and markets. 10646 should be inclusive while allowing subsetting, rather than being viewed as a private club with an exclusionary bar to admission of more scripts, historic or living.
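
As a rough illustration of the subsetting idea in point 4, the following minimal sketch shows how an implementation might declare the parts of the character set it supports and flag anything outside them. The block ranges are those of the Unicode code charts; the particular selection of blocks and the names SUPPORTED_BLOCKS, in_subset and unsupported_characters are assumptions made for this example, not collections or interfaces defined by 10646.

```python
# Illustrative subset only: the block ranges follow the Unicode code charts,
# but the choice of blocks is an assumption made for this sketch.
SUPPORTED_BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Cyrillic": (0x0400, 0x04FF),
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
}


def in_subset(ch, blocks=SUPPORTED_BLOCKS):
    """Return True if the character's code point lies in a supported block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in blocks.values())


def unsupported_characters(text, blocks=SUPPORTED_BLOCKS):
    """List the characters of text that fall outside the implemented subset."""
    return [ch for ch in text if not in_subset(ch, blocks)]
```

An implementation serving a different market would simply declare a different table; characters outside the declared subset remain valid in the standard, but that implementation does not claim to handle them.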

5. Japan’s second point suggests that "it will be quite difficult, in terms of expertise and resources, for JTC 1 to develop and maintain all these standards." First of all, there really is only one standard at issue here: ISO/IEC 10646. The relevant working group, WG2, has demonstrated time and again that it can rise to the task of dealing with complex script encoding, by inviting the participation of experts and affected nationally based organizations, as well as by its long-standing cooperation with the Unicode Technical Committee, which has gathered much additional information and expertise about the unencoded scripts of the world. The perceived burden on national bodies that have little interest in some of the historic and minority scripts currently being proposed for addition to the standard might be addressed by making it procedurally less onerous for them to engage in all the review and balloting of amendments for additions they do not care about. But the suggestion to distribute the development and maintenance of 10646 outside of WG2 is more likely to cause interoperability problems than to avoid them.

6. Japan’s last point claims that the "basic concept of IS 10646 seems to be changing and ambiguous." This is not correct. From the start, the basic concept of IS 10646 has been as the Universal Character Set. What has been changing is the nature of the new proposals for scripts and other characters to be added to the Universal Character Set. Now that the major world scripts have been completed, WG2 is dealing with minority and historic scripts. And now that more and more implementations of 10646 are appearing, implementors have brought forward new requirements for new formatting or other special characters that were not part of the original set of such characters included in the first edition. This is exactly what should be expected of a Universal Character Set—it is the one character set that should include ALL characters needed in information technology. It is not proper for a group which got all of its characters encoded in the first round to turn around and say to latecomers, we’re sorry, but your format characters aren’t the kind that we want to use, so you’ll have to stay outside. WG2 works hard to ensure that new additions to the standard maintain technical consistency with the existing standard, so that existing data and implementations are not destabilized by additions; but within those constraints, WG2 must be open to new ideas and concepts that must be expressed as character codes for best implementation.

7. Finally, Japan claims that "the plane assignment is given for limited number of planes in rather ad hoc manner". This matter is a part of the scope and definition of Part 2 of ISO/IEC 10646, which is currently in working draft. It would seem that Japan has plenty of opportunity to question the assignment of those planes if it views them as ad hoc, since no characters have actually been assigned in them, and 10646-2 has not yet been balloted at all. There is plenty of room for specifying the scope of 10646-2 in more principled detail, if that is the issue. Japan states that "We must be provided with a clear guiding principles for the future extension of IS 10646." WG2 has already developed a quite detailed Principles and Procedures document (WG2 N 1502R, WG2 N 1876 and related documents) with a very clear roadmap for the future extension of IS 10646. This is in addition to the scope statement of the working draft for 10646-2 itself.