L2/99-382

Comments to accompany a U.S. NO vote on JTC1 N5999, SC2 N3393, New Work item proposal (NP) for an amendment of the Korean part of ISO/IEC 10646-1:1993.

December 9, 1999

 

1.   The U.S. strongly objects to this New Work item proposal (NP) because of its obvious potential to destabilize a major, implemented portion of International Standard ISO/IEC 10646.  The stated intent of the NP is to completely reorganize and recode the Korean characters in the standard. That intent runs directly counter to the stated policy of SC2/WG2 not to move, delete, or rename any character in the International Standard once it has been standardized. That policy is also strongly supported and maintained by the Unicode Technical Committee in its parallel development of the industry Unicode Standard, which is kept in synchronization with ISO/IEC 10646.

 

The last reorganization of Korean in ISO/IEC 10646 was the result of Amendment 5. That reorganization caused major disruptions to implementations of the standard, occasioned much controversy at the time, and has had lingering effects on current implementations.  The result of that amendment (which some in the character standards community still consider a “fiasco”) was to harden the resolve of all parties involved to never, ever allow such a reorganization of the standard to occur again. Another reorganization would be exceedingly damaging to the standard, and would easily cause damages running into many millions of dollars among the implementing vendors.  Accordingly, it is the firm view of the U.S. committee that the experience of Amendment 5 should not be taken as setting a precedent for periodic reevaluation and reorganization of the Korean encoding, but rather as the demonstration case that such reorganization is disruptive and damaging and should never be allowed to occur again in the standard.

 

Based on this consideration alone, the U.S. vote could not be turned to a YES by anything less than a near-complete withdrawal of all of the contents of this NP and its substitution by a request merely for the encoding of some repertoire of new characters not already encoded in the standard.

2.   Regarding the specific parts of the NP, the U.S. has the following comments.

 

2.1. The amendment of the Korean jamo proposed in Annex A of JTC1 N5999 (“Korean Character Combining Alphabet”) would constitute a rearrangement (and hence re-encoding) of existing encoded characters.  That is clearly unacceptable. The NP “proposed to add 8 characters in the character subset of Korean combining alphabets (1100-11FF)...” However, that claim of 8 additional characters is correct only in terms of the accounting of total assigned code points in the block.

 

An examination of Annex A in some detail shows that more than 8 new characters are being proposed, and that some characters already encoded in that block of ISO/IEC 10646-1 are not accounted for in the new proposal. Considering just the initial (choseong) part of the table, it appears that Annex A proposes 6 new characters (at positions 1113, 1114, 1117, 1120, 1128, and 112A), and proposes to omit two characters already encoded in 10646: U+1134 and U+1146.

The only way to salvage this part of the proposal would be to make a new proposal (accompanied by the WG2 Summary Proposal Form) containing the exact list of new conjoining jamo characters being proposed, along with their identities and exemplifications in print. In particular, there would need to be justification for why such proposed entities as Annex A 1113 ... INITIAL CONSONANT HYOBADAKSORI-NIUN should be considered a separate character from the existing U+1102 HANGUL CHOSEONG NIEUN, and so forth.

2.2. The proposal in JTC1 N5999 suggests a rearrangement of all of the 11172 Korean syllables already encoded in 10646, to follow the jamo order of KPS 9566-97. This suggestion is also completely unacceptable; it would destabilize existing implementations.  The suggestion to negotiate some kind of “third proposal” would make no difference—it would still represent a destabilization of already standardized characters.

 

IS 10646 has never been under any obligation to strictly follow the ordering of characters in any particular national standard.  Readily available transcoding technology exists for converting data expressed in IS 10646 for interchange purposes into any particular national standard, including KPS 9566-97. The U.S. committee would simply invite the Standardization Committee of the DPRK to publish, in machine-readable form, its transcoding table between the existing IS 10646 and KPS 9566-97. That would facilitate conversion and eliminate the “confusion and difficulty in the information interchange.”
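By way of illustration only, a machine-readable transcoding table of the kind invited above could be as simple as a text file of code pairs, consumed as in the following minimal sketch. The file layout, the function names, and in particular the local code values shown are assumptions made for this example; they are placeholders and are not actual KPS 9566-97 assignments.

```python
import io

# Hypothetical mapping file: one "local-code UCS-code" pair per line, in hex.
# The local code values are placeholders, NOT real KPS 9566-97 assignments.
SAMPLE_TABLE = """\
# local   UCS
0x3021    0xAC00
0x3022    0xAC01
0x3023    0xAC04
"""


def load_table(stream):
    """Parse the mapping file into a dict from UCS code point to local code."""
    ucs_to_local = {}
    for line in stream:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        local_hex, ucs_hex = line.split()
        ucs_to_local[int(ucs_hex, 16)] = int(local_hex, 16)
    return ucs_to_local


def to_local(text, ucs_to_local):
    """Convert a UCS string to a list of local code values."""
    return [ucs_to_local[ord(ch)] for ch in text]


def to_ucs(codes, ucs_to_local):
    """Convert a list of local code values back to a UCS string."""
    local_to_ucs = {local: ucs for ucs, local in ucs_to_local.items()}
    return "".join(chr(local_to_ucs[code]) for code in codes)


table = load_table(io.StringIO(SAMPLE_TABLE))
codes = to_local("\uAC00\uAC04", table)
round_trip = to_ucs(codes, table)
```

Because the table is data rather than part of either standard, it can be published and corrected without moving any encoded character.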

For ordering purposes, alternate technologies also exist. The appropriate way to adapt data represented in the UCS to local requirements for ordering is either to A) transcode the data into the local character encoding and sort conventionally using that local encoding, or B) make use of the collation key generation mechanisms described in the Unicode Collation Algorithm and in FCD3 14651 International String Ordering.
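The general idea behind approach B can be sketched as follows: the encoded data is left untouched, and a sort key reflecting the local order is derived from it. This is a deliberately simplified key function, not an implementation of the Unicode Collation Algorithm or of ISO/IEC 14651. The local ordering of initial consonants used here (plain consonants before doubled ones) is a hypothetical example and is not taken from KPS 9566-97; only the arithmetic decomposition of the 11172 encoded syllables (19 initials, 21 medials, 28 finals, starting at U+AC00) reflects the existing encoding.

```python
S_BASE = 0xAC00            # first encoded Hangul syllable
V_COUNT, T_COUNT = 21, 28  # medial vowels; finals (including "no final")
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 syllables

# Hypothetical local order of the 19 initials, given as UCS initial indices
# (0 = G, 1 = GG, 2 = N, ...). Illustrative only: plain consonants first,
# doubled consonants last. Not taken from any national standard.
LOCAL_INITIAL_ORDER = [0, 2, 3, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17, 18,
                       1, 4, 8, 10, 13]
INITIAL_RANK = {ucs_index: rank
                for rank, ucs_index in enumerate(LOCAL_INITIAL_ORDER)}


def collation_key(text):
    """Build a sort key that ranks Hangul syllables by the local initial order."""
    key = []
    for ch in text:
        offset = ord(ch) - S_BASE
        if 0 <= offset < S_COUNT:
            # Decompose the syllable into initial, medial, and final indices.
            initial, rest = divmod(offset, V_COUNT * T_COUNT)
            medial, final = divmod(rest, T_COUNT)
            key.append((0, INITIAL_RANK[initial], medial, final))
        else:
            # Anything else sorts after the syllables, by code point.
            key.append((1, ord(ch), 0, 0))
    return tuple(key)


words = ["\uAE4C", "\uB098", "\uAC00"]   # syllables GGA, NA, GA
by_code_point = sorted(words)            # GA, GGA, NA
by_local_order = sorted(words, key=collation_key)  # GA, NA, GGA
```

The two results differ only in the order of the output; the code points of the characters themselves are never changed.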

2.3. The proposal requests that an additional 80 symbolic characters be encoded in 10646. This set is claimed to consist of characters in KPS 9566-97 but not in ISO/IEC 10646-1.

 

The actual set is a very mixed collection, and includes some characters that are already encoded (U+2601, U+2607, fractions) with slightly different glyphs, some characters included in other proposals under consideration by WG2 (arrows, circled numbers), some “characters” that would probably not meet the criteria of WG2’s character/glyph model for appropriateness as encoded characters (apostrophe off centre), emphasized Korean Hangul syllables intended to spell out the particular persons’ names Kim Il Sung and Kim Jong Il (“Kim” encoded twice, and “Il” encoded twice), and yet another representation of the Korean jamo alphabet (Korean Compatibility Alphabet XYZ) with no particular justification presented.

While a number of these 80 symbols may in fact be acceptable characters for inclusion in 10646, the appropriate mechanism to use for their consideration would, once again, be submission of the repertoire, along with a WG2 Summary Proposal Form, with detailed explication and justification for the proposed characters, including citations in printed documents. Based on that information, other National Bodies could then provide feedback and commentary regarding which, if any, of these characters meet the technical criteria for inclusion in 10646. For this kind of small repertoire, no NP is really necessary.

2.4. The proposal in JTC1 N5999 proposes adding the encoding representations of each of the ideographs in KPS 9566-97 to the published code table of CJK Unified Ideographs in 10646. The U.S. considers this unnecessary. It is clearly not feasible for the publication of the second edition of ISO/IEC 10646-1, which is nearly complete. But even in the future, the main function of printing the encodings for various CJK sources in the standard itself is to establish the normative identity of characters by their values in the official source sets for the unification. Among other things, this clarifies the instances where the source set separation rule was invoked, thereby preventing the unification of a character that would otherwise have been unified on shape and semantic criteria.

 

For all other purposes, the transcoding tables between 10646 and various Asian national character encoding standards are best maintained as separate, machine-readable files—not printed in the standard itself. KPS 9566-97 is just one of many other national and vendor Asian character encodings whose values are not printed in the 10646 CJK Unified Ideographs code tables. Its inclusion is not necessary, and would in fact detract from the usefulness of the normative encodings from the source sets which are currently printed there.

2.5. The proposal in JTC1 N5999 suggests the renaming of all the Korean characters in 10646, to replace the term “HANGUL” with the term “KOREAN CHARACTER”. The U.S. firmly opposes the changing of any character names in the standard. The names are a normative part of the standard, referred to in many implementations. Changing them would cause major disruptions. The concern expressed in the proposal about misunderstanding of the term “Hangul” seems misplaced to us. The U.S. sees no indication that implementors are confused about the use of the term “HANGUL” in the normative names as referring to Korean characters. In any case, if there is any misunderstanding about terminology or confusion with competing terms such as “Hanmal”, “Hansik”, or “Hanminjok”, etc., this concern could be addressed by a minor editorial note in 10646 explaining the intent of the usage of “HANGUL”, rather than by a wholesale replacement of normative character names.

 

 

In addition, the U.S. approved Resolution M37.12 (Feedback to D.P.R of Korea) at the last meeting of JTC1 SC2/WG2 in Copenhagen.  We attach the text of this resolution, which the SC2 secretariat has undertaken to communicate to the D.P.R.K.

 

  RESOLUTION M37.12 (Feedback to D.P.R of Korea):       Unanimous

With reference to the NP in document N2056 to amend the Korean encoding of Amendment 5, WG2 instructs its convener to inform SC2 to respond to the Committee for Standardization of the D.P.R. of Korea:

 

a) that WG2 cannot support this NP because any reordering of the standardized Korean Hangul characters would harm existing implementations that are using the standard including its Amendment 5,

b) that existing standardized character names cannot be changed because character names are normative in the standard and changing them would harm existing users of these standardized character names,

c) invite them to make concrete proposals to add any missing characters following the existing WG2 Procedures and Guidelines document (JTC1/SC2/WG2 N2002), and the conventions for naming of characters in the standard, for future consideration by WG2,

d) invite them to participate in the IRG regarding any Hanja character requirements they may have, and

e) draw their attention to FCD-3 of ISO/IEC 14651, International String Ordering, under ballot in SC22.