ISO/IEC JTC1/SC 2/N 3273
ISO/IEC JTC1/SC 2/WG 2 N 2009
Title: WG2 Charter Revisited
Action: For review and adoption by SC2
Distribution: Members of JTC1/SC2 and JTC1/SC2/WG2
At its meeting #36 in Fukuoka, WG2 created the attached draft of a charter document, which is intended to serve as an update to the WG2 program of work. WG2 requests SC2 to review and approve this document as an update to WG2's program of work and to forward it to JTC1 in support of the JTC1/SC2 business plan.
Coded character sets, especially ISO/IEC 10646, are the foundation on which all the modern text-based Internet and web protocols are built. They are also one of the fundamental building blocks of the cultural and linguistic adaptability strategy of JTC1. Coded character sets play an equally fundamental role in all other aspects of information technology, from data warehousing to electronic commerce and publishing. ISO/IEC 10646, as the Universal Character Set, takes a special place in this. This document tries to raise the visibility of coded character set standards and to illuminate the special role and importance of ISO/IEC 10646 by answering these questions about ISO/IEC 10646 and the program of work of WG2:
What constituents is ISO/IEC 10646 serving?
What are their requirements for ISO/IEC 10646?
What is the appropriate measure of market relevance of ISO/IEC 10646?
What has been achieved up to now and what is left to do?
What level of effort is required and when will the work be completed?
What development and maintenance procedures are most likely to be successful?
Constituents of ISO/IEC 10646
It is useful to distinguish between the direct users of the standard and its ultimate beneficiaries.
The direct users of the standard are
The ultimate beneficiaries of the standard will be end users able to access data from a variety of sources regardless of the location of source or receiver or the language that is employed.
Requirements for ISO/IEC 10646
The above-named constituents require a character set that is practical and implementable, flexible and expandable, and convertible to existing standards, and that fulfills these detailed requirements:
Universality: ISO/IEC 10646 must provide universal coverage
Single Coherent Architecture
Flexible and expandable
Supporting the preservation of world cultural history poses a special burden on ISO/IEC 10646 in that over a long time it must preserve users' investment in their data encoded using this standard. Among other things this means that the intended life span for 10646 is much longer than for other types of IT standards. It is therefore appropriate that the standard leaves ample room for future expansion, even though the identified needs at the moment only occupy a fraction of the available space. At the same time, centralized maintenance is required to preserve the coherence of the architecture.
Measurement of market relevance
ISO/IEC 10646 is called the Universal Character Set for a reason: it is intended to cover all the scripts of the world. This view is widely shared by the industry and the user communities, both of which participate directly and indirectly in the work of WG2.
The very fact that the Unicode Consortium has seen fit to pursue the creation of a universal character set in cooperation with JTC1/SC2/WG2 is a prima facie case for the market relevance of the continuing work of WG2, since the Consortium is a group of market-driven entities. There is market relevance in having complete solutions for customers, so that users in smaller language communities and specialists in academia, libraries, or government can use off-the-shelf software rather than having to depend on customized solutions that have interoperability problems.
Any wish to create rational limitations on additions of new characters or scripts must not lead to arbitrary barriers to entry. Market relevance cannot be determined on the basis of an individual script as long as the industry upholds the above-stated stand on universality.
ISO/IEC 10646-1, together with all its amendments and corrigenda (to be consolidated as the 2nd edition), deals with almost all the scripts that are in current use. It is referenced and used by standards used in key emerging technologies such as the World Wide Web, which is key to worldwide IT integration, with significant benefits to all aspects of world trade and people's lives.
ISO/IEC 10646 is in a unique position in that it is intended as the one universal international standard for characters, not merely as a standard that contains only characters that everyone agrees are used widely everywhere. The content of ISO/IEC 10646 therefore includes minor and historic scripts.
The possible content of ISO/IEC 10646 is large, but it is not unbounded. It can be grouped as follows into characters for
In the list above item 1 is largely completed, items 2 and 3 are partially completed with active progress being made on completion, while item 4 has seen some initial progress and a roadmap for further work is in place. Item 5 is being pursued on a case by case basis in response to specific requests from relevant user communities.
Specific work items may intermix characters from more than one group. A recent example is the proposed addition of CJK ideographs containing characters for modern use in Hong Kong and Taiwan combined with characters of literary and scholarly importance.
Since 1991 WG2 has actively worked with the Unicode Consortium to synchronize ISO/IEC 10646 with the Unicode Standard. This effort has been very successful, culminating in the current effort of synchronous publication of the Unicode Standard, Version 3.0, and the second edition of ISO/IEC 10646, containing the same repertoire. Without such synchronization there would have been two parallel standards, resulting in half the number of implementations overall.
ISO/IEC 10646 has been widely adopted since its first published edition in 1993. Implementations include operating systems, applications, Internet browsers, programming languages and development tools. In this context, RFC 2277 has established that all future Internet protocols must be able to support ISO/IEC 10646.
Anticipated effort and completion of remaining work
WG2 has made a survey of all known scripts and notational systems and created a roadmap for the completion of the basic and supplementary repertoire of characters in ISO/IEC 10646, including assigning a priority to each script. As a result of this initial work, it can be concluded that the scope of the project is not open ended, but bounded.
According to this survey, the remaining work breaks down as follows:
This contains the majority of items 1 and 2. Completion is anticipated by 2002.
This contains mainly items 3 and 4, but also those characters from items 1 and 2 that could not fit into the basic repertoire.
The supplementary repertoire is further divided by priority
The number of historic scripts contained in the supplementary repertoire is well known, as is the approximate number of characters they will require for encoding. The overall scale of the encoding still to be done is approximately the same as the number of characters that have already been encoded. While that is a large number, it is not an inexhaustible one. The main effort for historic scripts lies in gathering sufficient, detailed and authoritative information about them, rather than in the number of characters to be encoded.
Development and maintenance procedure
User communities such as IETF have time-sensitive requirements that are best met by an incremental development process, using amendments. Less time-critical work will be bundled into collections.
The development process of ISO/IEC 10646 is by nature additive. Characters will not be changed, renamed, or removed once they are coded. Doing otherwise would obsolete users' investment in existing data and in the implementations that support such data.
WG2 works hard to ensure that new additions to the standard maintain technical consistency with the already published edition of the standard, so that existing data and implementations are not destabilized by additions. Within these constraints, WG2 must be open to new characters and concepts. However, preserving the architectural coherence of the standard requires centralized ownership and maintenance. Implementers' experience in the past is that registration processes for coded character sets have led to technically incoherent results and fragmented solutions.
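The additive rule can be illustrated with a small check that compares two versions of a repertoire. This is only a sketch under stated assumptions: the function name stability_violations, the representation of a repertoire as a mapping from code position to character name, and the three-entry sample data are all hypothetical and are not part of the WG2 procedures. It detects removals and renames only; a change in a character's identity that keeps the same name would not be caught.

```python
from typing import Dict, List

# Hypothetical representation: code position -> character name.
Repertoire = Dict[int, str]


def stability_violations(old: Repertoire, new: Repertoire) -> List[str]:
    """List every character of `old` that `new` has removed or renamed."""
    problems: List[str] = []
    for code, name in sorted(old.items()):
        if code not in new:
            problems.append(f"removed: {code:04X} {name}")
        elif new[code] != name:
            problems.append(f"renamed: {code:04X} {name} -> {new[code]}")
    return problems


# Illustrative sample data only, not an excerpt of any published edition.
old_edition: Repertoire = {
    0x0041: "LATIN CAPITAL LETTER A",
    0x03A9: "GREEK CAPITAL LETTER OMEGA",
}

# An additive update keeps every existing entry and only adds new ones.
additive_update: Repertoire = dict(old_edition)
additive_update[0x20AC] = "EURO SIGN"

# A non-additive update drops an already coded character.
breaking_update: Repertoire = {0x0041: "LATIN CAPITAL LETTER A"}

print(stability_violations(old_edition, additive_update))
print(stability_violations(old_edition, breaking_update))
```

An empty result for the additive update and a reported removal for the second update reflect the distinction the paragraph draws: additions leave existing data valid, while any removal or rename would invalidate it.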
WG2 maintains a Procedures and Guidelines document that specifies the process used to encode characters and scripts. This process is one of distributed and cooperative development, where the original submitters, experts from interested national bodies and liaison organizations, particularly the Unicode Consortium, take an active role in resolving many of the technical issues in a given proposal and work on maturing it, before it is processed by WG2. During this phase of a proposal, input of outside experts from the relevant user communities is pursued.
The efficiency of this cooperative process has increased in recent years. As a result, once the current period of particularly heightened activity surrounding the second edition of ISO/IEC 10646 is past, it is expected that the frequency and length of WG meetings will be reduced, but work will continue at a steady rate.
JTC1 considers linguistic and cultural adaptability a strategic cornerstone of IT. The lack of such adaptability is recognized by JTC1 as the ultimate trade barrier. The universal character set, ISO/IEC 10646, is the essential fundamental building block of this cultural and linguistic adaptability.
Therefore, WG2 has a special responsibility in meeting the stated requirements of the constituents of this universal character set to enable them to meet that objective.