L2/03-014

From: Ken Whistler
Date: 2003-01-16 18:43:44 -0800
Subject: WG2 Tokyo Results

Unicadetti,

As you no doubt know, WG2 and SC2 met last month in Tokyo,
December 9-12, 2002. Usually, I send around a report on the
meeting fairly quickly, but this time holidays intervened,
and, well... here it is the middle of January, gosh!

At any rate, here is a short report of the highlights
of the meeting, as they pertain to UTC business.

--Ken

===========================================================

The meeting was *extremely* well-attended, with one of the
higher attendances recorded for a WG2 meeting. Attending
national bodies and observer/guests included:

  Canada (1)
  China (7)
  DPRK (2)
  Finland (1)
  Poland (1)
  Lithuania (1)
  Iran (2)
  Ireland (1)
  Japan (9) -- including SC2 chair and secretary
  ROK (2)
  Singapore (1)
  USA (5)
  
The resolutions can be found online as WG2 N2554 (for the
WG2 resolutions) and SC2 N3668 (for the SC2 resolutions).
I'll work through the significant decisions by topic.

1. 10646-1 Amendment 2 (resolutions M43.1, M43.4, M43.5)

The comments on the FPDAM ballot were all resolved, and this
amendment was progressed for a (short) FDAM ballot. This
closes out the technical changes for this amendment, and serves
to define the repertoire of additions for the BMP for Unicode
4.0. The accommodations were sufficient to turn the negative
votes from Iran, Ireland, and the U.S. to yes. Only the ROK
vote stayed no.

From the UTC point of view, the good news is that all of the
U.S. ballot comments were accepted, more or less. This
included requests for a few Greek characters, U+23D0
VERTICAL LINE EXTENSION, a character name change for ARABIC MARK
NOON GHUNNA, and miscellaneous other small fixes. Most
important, the proposed collection of DPRK compatibility
CJK characters was *removed* from the amendment for further
review and study (see below).

There were a few other changes to the amendment, in response
to other NB comments. In particular, a few of the Arabic
character additions were repositioned in the charts, and
there was another character name change. I'll send around
a separate document with the details, as a "WG2 Consent
Docket" document, so the UTC can formally go on record to
synch things back up for its character approvals at the
March meeting.

2. 10646-2 Amendment 1 (resolution M43.7)

This amendment was much easier. There were only a few
issues noted, in the Ireland and U.S. comments -- all of
which were easily fixed. This resulted in unanimous
approval, and the amendment was progressed for its FDAM
ballot. In turn, this defines the repertoire of additions
for the supplementary planes for Unicode 4.0.

3. Republication of 10646 as a single standard (resolution M43.11)

Because of the good work that Michel did in preparing for this,
the republication of the two parts of 10646 as a single
standard is proceeding apace. The editor got a lot of
specific feedback on the draft document. Now the
committee is on record, with SC2 confirming, to republish
10646-1:2000 and 10646-2:2001 and all the amendments and
corrigenda to date as 10646:2003 -- a single standard.
Because of this, 10646-1 Amd 2 and 10646-2 Amd 1 will likely
not actually be published by ITTF; they will simply be
incorporated directly in 10646:2003. (We are hoping that
the publication can be accomplished this year, to synchronize
it with Unicode 4.0.)

4. Character Name Rule for Loose Matching (resolution M43.2)

This was the result of a UTC request, presented as a U.S.
position paper, to add a simple constraint on 10646
character names (in Annex L) to make it easier to ensure
that loose matching on names will continue to work into
the future. This proved to be *waaaay* more problematical
than it should have been. Unfortunately, the UTC position
document on this was formulated in terms of an algorithm
which would use the constraint, rather than as a textual
edit to 10646. The net result was mass confusion in the
WG2 committee discussion. And yours truly and Asmus (the
UTC liaison to WG2) had to do a bunch of FUD management and
hand-holding to get people to understand the request and
to make it clear that nobody was trying to put an *algorithm*
into 10646, nor were the constraints going to cripple future
character naming options. As it was, the resolution
elicited an abstention from Japan and a no vote from ROK.

In the end, the UTC position prevailed. You can see the
text added to Annex L in the resolution in WG2 N2554. But
something to think about for the future is that any such
"great ideas" coming from the UTC for improving 10646
need to be carefully constructed as editorial changes to
the 10646 text (with simple and clear justifications),
since it is basically just impossible to discuss
algorithms productively in the context of a WG2 plenary 
meeting.
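
To make the point of the constraint concrete, here is a minimal
sketch of loose matching on character names. It is an
illustration under stated assumptions, not the text added to
Annex L: the folding rule used here (ignore case, spaces,
underscores, and any hyphen sitting between two letters or
digits) is an assumed shorthand, and loose_key and
find_collisions are hypothetical helper names. The two Hangul
jamo names in the usage note are existing names that collide
under this kind of folding, which is exactly what a naming
constraint is meant to rule out for future additions.

```python
import re
from collections import defaultdict


def loose_key(name: str) -> str:
    """Hypothetical helper: fold a character name to a loose-matching key.

    Assumed rule (not quoted from Annex L): ignore case, spaces,
    underscores, and hyphens that sit between two letters or digits.
    """
    # Drop medial hyphens only; a hyphen next to a space is kept.
    folded = re.sub(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])", "", name)
    # Ignore spaces and underscores, then ignore case.
    folded = re.sub(r"[ _]+", "", folded)
    return folded.upper()


def find_collisions(names):
    """Return {key: [names]} for every key shared by more than one name."""
    buckets = defaultdict(list)
    for name in names:
        buckets[loose_key(name)].append(name)
    return {key: group for key, group in buckets.items() if len(group) > 1}
```

Under these assumptions, loose_key("zero_width space") and
loose_key("ZERO WIDTH SPACE") give the same key, while
find_collisions(["HANGUL JUNGSEONG O-E", "HANGUL JUNGSEONG OE"])
reports both names under a single key.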

5. Korean Compatibility Characters (resolution M43.3)

Korean issues once again seemed to dominate the
agenda for WG2. The proposed repertoire of 122 compatibility
ideographs that the UTC objected to, since our experts had
turned up so many problems with the set -- and particularly
with the mappings -- was discussed at tedious and
excruciating length. Effectively for much of the WG2 meeting
there was an ad hoc meeting running on the side where each of
the characters was examined one-by-one again.

The ad hoc came up with a report on the last day which
did, in fact, confirm many of the issues raised by the
U.S. expert reviewers, although a couple of the reported
issues were themselves identified as erroneous. The
problems were grouped into various categories. Importantly,
the ad hoc did come to agreement on the principle that
adding a compatibility character for a character in the
DPRK standard which could/should instead be mapped to
a Plane 2 character would be a bad idea, and that resulted
in removing a number of characters from the requested set
of additions.

WG2 crafted a specific resolution which removed the set
of 122 from the FPDAM, and instead requested DPRK and
other interested parties to review the refined replacement
set of 107 which resulted from the ad hoc's analysis.
Ironically, DPRK ended up voting *for* this resolution
which removed the set of 122 from the amendment, whereas
Japan voted *against* it. Go figure. China voted emphatically
for the resolution, and in fact wanted to go on record to
make a statement (presumably condemning all additions of
compatibility CJK ideographs), but in the end settled for
just voting "YES!" on the resolution.

The UTC experts on CJK mapping should take another look
at the now smaller set of 107, but we now have some
breathing room for verifying that the set is exactly
accurate before it is encoded -- since it will have to
be presented again for a future amendment. This narrowly
averted the prospect of more normalization fixes in the
future, with attendant thundering disapprobation from 
the IETF. :-) And the DPRK delegation went away reasonably
happy, apparently -- presumably because everyone had
paid so much technical attention to their character set
request, and because the resolution was stated in terms
of a provisional acceptance of the revised set of 107
for the future (pending further review).
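
Why the mappings matter so much can be shown with a small
sketch. CJK compatibility ideographs carry a canonical
decomposition to a unified ideograph, so normalization replaces
them outright; a wrong mapping would therefore end up baked into
normalized data. The example uses an already-encoded
compatibility ideograph (U+F900), not one from the DPRK set, and
relies only on Python's standard unicodedata module.

```python
import unicodedata

# U+F900 is an existing CJK compatibility ideograph (chosen for
# illustration; it is not part of the DPRK proposal).
compat = "\uF900"

# The canonical decomposition mapping, as a hex string of code points.
decomp = unicodedata.decomposition(compat)

# Normalization replaces the compatibility ideograph with the
# unified ideograph it maps to; the original code point does not survive.
nfc = unicodedata.normalize("NFC", compat)
nfd = unicodedata.normalize("NFD", compat)

print(decomp, f"U+{ord(nfc):04X}", f"U+{ord(nfd):04X}")
```

Since both NFC and NFD yield the unified ideograph, any mistake
in the mapping of a newly encoded compatibility ideograph would
silently turn one character into a different one.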

6. Other Korean Issues

In keeping with the ability of Korean to throw a spanner
in the works for any 10646 discussion, there were other
Korean issues which kept the committee busy. In particular,
there was a rather painful discussion about what to do
with Annexes Q and R in 10646. 

Annex Q is the "Code Mapping Table for Hangul Syllables", which
does the mapping between the pre-Amendment 5 Hangul (in
10646-1:1993) and the post-Amendment 5 Hangul (in Unicode 2.0
and subsequent, including 10646-1:2000). It is a big formatted
mapping table, which is unusable as is, because it is not
machine readable.

Annex R is the "Names of Hangul syllables", another big
formatted table which lists out all the algorithmically
derivable names for the 11,172 Hangul syllables.
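
As a reminder of what "algorithmically derivable" means here,
the following is a minimal sketch of the name derivation, using
the arithmetic decomposition of a syllable code point and the
jamo short names described in the Unicode Standard. The function
name hangul_syllable_name is hypothetical, and nothing here is
taken from the Annex R table itself.

```python
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 syllables in total

# Jamo short names for leading consonants, vowels, and trailing consonants.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(code_point: int) -> str:
    """Hypothetical helper: derive the name of a precomposed Hangul syllable."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{code_point:04X} is not a Hangul syllable")
    l_index = s_index // N_COUNT              # leading consonant
    v_index = (s_index % N_COUNT) // T_COUNT  # vowel
    t_index = s_index % T_COUNT               # trailing consonant (0 = none)
    return ("HANGUL SYLLABLE "
            + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index])
```

A loop over range(S_BASE, S_BASE + S_COUNT) regenerates the
whole list of names, so the printed table carries no information
beyond what this arithmetic already provides.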

The ROK balked at Michel's proposal to replace these
annexes by very short descriptive text and separate
machine-readable tables with the information in them.
After a long discussion in which it was repeatedly pointed
out that the entire standard is only published on CD-ROM
now anyway, and that anyone could take the machine-readable
data files and print them out however they saw fit,
and after various expressions of disgust, confusion,
adamancy, and exasperated amusement, it was finally decided
to keep the current tables *and* to add the machine-readable
tables.

And for yet *more* Korean issues, the request for adding
DPRK Hanja character source references as normative source
references for the Unified Han characters was sent back
to the DPRK and the IRG for more cooking.

7. Tibetan BrdaRten proposal (resolution M43.9)

This is the proposal for 956 precomposed Tibetan stacks,
which has seen a lot of discussion on the Unicode lists.
Essentially, WG2 took no action, other than to invite other
NBs to review and provide feedback on the proposal, and
for China to come up with a revised proposal based on the
feedback. I think it would be a good idea for the UTC to
provide such feedback, based on all the discussion which
we've already had on the topic, to ensure that there is
a strong counter-position on the table going into the
next WG2 meeting in October.

8. Hong Kong character requests (resolution M43.8)

This was the late proposal which came in (see WG2 N2513)
asking for some more precomposed Latin characters for
pinyin and for two poorly defined electrical circuit
symbols. The precomposed characters were rejected, and
HKSAR was invited to provide better information about
the identity and usage of the two symbols.

9. Other Item of Interest: 8859-7

This was not a WG2 item of business, but rather SC2.
The last open project for WG3 was the long-delayed completion
of the revision of the Greek part of 8859. This has basically
been sitting unresolved for a couple of years. To resolve
this, SC2 summarily transferred the editorship to Michael
Everson, who agreed to have the whole thing finished in
a couple of weeks. At this point, WG3 should have no further
work, and SC2 should be out of the business of developing
and promulgating more 8-bit character standards. (Yay!)

Note: spontaneous expostulations, such as that noted
at the end of the previous paragraph, are not necessarily
the opinion of this reporter's employer, nor of WG2 or
WG3 or any other committee, living or dead. ;-)