L2/03-014

From: Ken Whistler
Date: 2003-01-16 18:43:44 -0800
Subject: WG2 Tokyo Results

Unicadetti,

As you no doubt know, WG2 and SC2 met last month in Tokyo, December 9-12, 2002. Usually, I send around a report on the meeting fairly quickly, but this time holidays intervened, and, well... here it is the middle of January, gosh!

At any rate, here is a short report of the highlights of the meeting, as they pertain to UTC business.

--Ken

===========================================================

The meeting was *extremely* well-attended, with one of the higher attendances recorded for a WG2 meeting. Attending national bodies and observer/guests included:

Canada (1)
China (7)
DPRK (2)
Finland (1)
Poland (1)
Lithuania (1)
Iran (2)
Ireland (1)
Japan (9) -- including SC2 chair and secretary
ROK (2)
Singapore (1)
USA (5)

The resolutions can be found online as WG2 N2554 (for the WG2 resolutions) and SC2 N3668 (for the SC2 resolutions).

I'll work through the significant decisions by topic.

1. 10646-1 Amendment 2 (resolutions M43.1, M43.4, M43.5)

The comments on the FPDAM ballot were all resolved, and this amendment was progressed for a (short) FDAM ballot. This closes out the technical changes for this amendment, and serves to define the repertoire of additions for the BMP for Unicode 4.0. The accommodations were sufficient to turn the negative votes from Iran, Ireland, and the U.S. to yes. Only the ROK vote stayed no.

From the UTC point of view, the good news is that all of the U.S. ballot comments were accepted, more or less. This included requests for a few Greek characters, U+23D0 VERTICAL LINE EXTENSION, a character name change for ARABIC MARK NOON GHUNNA, and miscellaneous other small fixes. Most important, the proposed collection of DPRK compatibility CJK characters was *removed* from the amendment for further review and study (see below).

There were a few other changes to the amendment, in response to other NB comments.
In particular, a few of the Arabic character additions were repositioned in the charts, and there was another character name change. I'll send around a separate document with the details, as a "WG2 Consent Docket" document, so the UTC can formally go on record to synch things back up for its character approvals at the March meeting.

2. 10646-2 Amendment 1 (resolution M43.7)

This amendment was much easier. There were only a few issues noted, in the Ireland and U.S. comments -- all of which were easily fixed. This resulted in unanimous approval, and the amendment was progressed for its FDAM ballot. In turn, this defines the repertoire of additions for the supplementary planes for Unicode 4.0.

3. Republication of 10646 as a single standard (resolution M43.11)

Because of the good work that Michel did in preparing for this, the republication of the two parts of 10646 as a single standard is proceeding apace. The editor got a lot of specific feedback on the draft document. Now the committee is on record, with SC2 confirming, to republish 10646-1:2000 and 10646-2:2001 and all the amendments and corrigenda to date as 10646:2003 -- a single standard.

Because of this, 10646-1 Amd 2 and 10646-2 Amd 1 will likely not actually be published by ITTF; they will simply be incorporated directly in 10646:2003. (We are hoping that the publication can be accomplished this year, to synchronize it with Unicode 4.0.)

4. Character Name Rule for Loose Matching (resolution M43.2)

This was the result of a UTC request, presented as a U.S. position paper, to add a simple constraint on 10646 character names (in Annex L) to make it easier to ensure that loose matching on names will continue to work into the future.

This proved to be *waaaay* more problematical than it should have been. Unfortunately, the UTC position document on this was formulated in terms of an algorithm which would use the constraint, rather than as a textual edit to 10646.
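For anyone who has not seen the idea before, here is a rough sketch of what loose matching on names amounts to, and why a naming constraint helps it. Everything in it is an assumption made for illustration: the folding rule is simplified (ignore case, spaces, underscores, and hyphens sitting directly between two letters or digits), the function names loose_key and find_collisions are hypothetical, and none of it is the Annex L text or the algorithm from the position paper, neither of which is reproduced in this report.

```python
import re


def loose_key(name: str) -> str:
    """Fold a character name for loose comparison (illustrative rule only)."""
    # Drop "medial" hyphens, i.e. hyphens directly between two letters/digits.
    # This must happen before spaces are removed, so that a hyphen following
    # a space (as in "LETTER -A") is still recognizably non-medial.
    folded = re.sub(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])", "", name)
    # Drop spaces and underscores, then ignore case.
    return re.sub(r"[ _]", "", folded).upper()


def find_collisions(names):
    """Return groups of names that become identical under loose_key()."""
    groups = {}
    for name in names:
        groups.setdefault(loose_key(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    candidates = [
        "ZERO WIDTH SPACE",
        "zero-width space",   # a user's loose spelling of the same name
        "TIBETAN LETTER A",
        "TIBETAN LETTER -A",  # hyphen after a space is kept, so no collision
    ]
    print(find_collisions(candidates))
```

Under these assumptions, the constraint being asked for is just that no two distinct character names ever land in the same group, which is a statement about the names themselves and not about any algorithm in the standard.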
The net result was mass confusion in the WG2 committee discussion. And yours truly and Asmus (the UTC liaison to WG2) had to do a bunch of FUD management and hand-holding to get people to understand the request and to make it clear that nobody was trying to put an *algorithm* into 10646, nor were the constraints going to cripple future character naming options. As it was, the resolution elicited an abstention from Japan and a no vote from ROK.

In the end, the UTC position prevailed. You can see the text added to Annex L in the resolution in WG2 N2554. But something to think about for the future is that any such "great ideas" coming from the UTC for improving 10646 need to be carefully constructed as editorial changes to the 10646 text (with simple and clear justifications), since it is basically just impossible to discuss algorithms productively in the context of a WG2 plenary meeting.

5. Korean Compatibility Characters (resolution M43.3)

Korean issues once again seemed to dominate the agenda for WG2. The proposed repertoire of 122 compatibility ideographs that the UTC objected to, since our experts had turned up so many problems with the set -- and particularly with the mappings -- was discussed at tedious and excruciating length. Effectively for much of the WG2 meeting there was an ad hoc meeting running on the side where each of the characters was examined one-by-one again.

The ad hoc came up with a report on the last day which did, in fact, confirm many of the issues raised by the U.S. expert reviewers, although a couple of the reported issues were themselves identified as erroneous. The problems were grouped into various categories. Importantly, the ad hoc did come to agreement on the principle that adding a compatibility character for a character in the DPRK standard which could/should instead be mapped to a Plane 2 character would be a bad idea, and that resulted in removing a number of characters from the requested set of additions.
WG2 crafted a specific resolution which removed the set of 122 from the FPDAM, and instead requested DPRK and other interested parties to review the refined replacement set of 107 which resulted from the ad hoc's analysis.

Ironically, DPRK ended up voting *for* this resolution which removed the set of 122 from the amendment, whereas Japan voted *against* it. Go figure. China voted emphatically for the resolution, and in fact wanted to go on record to make a statement (presumably condemning all additions of compatibility CJK ideographs), but in the end settled for just voting "YES!" on the resolution.

The UTC experts on CJK mapping should take another look at the now smaller set of 107, but we now have some breathing room for verifying that the set is exactly accurate before it is encoded -- since it will have to be presented again for a future amendment. This narrowly averted the prospect of more normalization fixes in the future, with attendant thundering disapprobation from the IETF. :-)

And the DPRK delegation went away reasonably happy, apparently -- presumably because everyone had paid so much technical attention to their character set request, and because the resolution was stated in terms of a provisional acceptance of the revised set of 107 for the future (pending further review).

6. Other Korean Issues

In keeping with the ability of Korean to throw a spanner in the works for any 10646 discussion, there were other Korean issues which kept the committee busy.

In particular, there was a rather painful discussion about what to do with Annexes Q and R in 10646. Annex Q is the "Code Mapping Table for Hangul Syllables", which does the mapping between the pre-Amendment 5 Hangul (in 10646-1:1993) and the post-Amendment 5 Hangul (in Unicode 2.0 and subsequent, including 10646-1:2000). It is a big formatted mapping table, which is unusable as is, because it is not machine readable.
Annex R is the "Names of Hangul syllables", another big formatted table which lists out all the algorithmically derivable names for the 11,172 Hangul syllables.

The ROK balked at Michel's proposal to replace these annexes by very short descriptive text and separate machine-readable tables with the information in them. After a long discussion in which it was repeatedly pointed out that the entire standard is only published on CD-ROM now anyway, and that anyone could take the machine-readable data files and print them out however they saw fit, and after various expressions of disgust, confusion, adamancy, and exasperated amusement, it was finally decided to keep the current tables *and* to add the machine-readable tables.

And for yet *more* Korean issues, the request for adding DPRK Hanja character source references as normative source references for the Unified Han characters was sent back to the DPRK and the IRG for more cooking.

7. Tibetan BrdaRten proposal (resolution M43.9)

This is the proposal for 956 precomposed Tibetan stacks, which has seen a lot of discussion on the Unicode lists. Essentially, WG2 took no action, other than to invite other NB's to review and provide feedback on the proposal, and for China to come up with a revised proposal based on the feedback.

I think it would be a good idea for the UTC to provide such feedback, based on all the discussion which we've already had on the topic, to ensure that there is a strong counter-position on the table going into the next WG2 meeting in October.

8. Hong Kong character requests (resolution M43.8)

This was the late proposal which came in (see WG2 N2513) asking for some more precomposed Latin characters for pinyin and for two poorly defined electrical circuit symbols. The precomposed characters were rejected, and HKSAR was invited to provide better information about the identity and usage of the two symbols.

9. Other Item of Interest: 8859-7

This was not a WG2 item of business, but rather SC2.
The last open project for WG3 was the long-delayed finishing of the revision of the Greek part of 8859. This has basically been sitting unresolved for a couple of years. To resolve this, SC2 summarily transferred the editorship to Michael Everson, who agreed to have the whole thing finished in a couple of weeks. At this point, WG3 should have no further work, and SC2 should be out of the business of developing and promulgating more 8-bit character standards. (Yay!)

Note: spontaneous expostulations, such as that noted at the end of the previous paragraph, are not necessarily the opinion of this reporter's employer, nor of WG2 or WG3 or any other committee, living or dead. ;-)