L2/05-059

Date: February 3, 2005

Author: Ken Whistler

Title: WG2 Consent Docket, Part 2: Unicode 5.0 Issues


This document is a continuation of the WG2 consent docket
issues. In Part 2, I list all the issues related to the
disposition of comments on PDAM2 to ISO/IEC 10646:2003.

The situation here differs from that for FPDAM1, because there
will be another round of technical balloting on Amendment 2.
That means that any items which are problematic for the
UTC can still be raised in another round of comments and
consideration in that ballot.

However, my recommendation here is to simply approve all
of these changes, as well.


1. Phags-pa

The Phags-pa script in Amendment 2 was the subject of an
extended ad hoc discussion at the WG2 meeting in Xiamen,
involving participants from China, Mongolia, the U.K.,
Ireland, the U.S., Japan, and the Unicode Consortium.

The document which guided that discussion was WG2 N2912,
"Open Issues on Phags-pa Encoding", authored by Peter
Constable. The report of the ad hoc is WG2 N2922 (=L2/05-036).

Essentially *all* issues for Phags-pa encoding were settled
definitively by the ad hoc and the result was endorsed by
WG2. There were a number of significant changes involved,
including a reordering of about half of the characters in
the block, but the essential model of the script as approved
by the UTC was retained.

The resulting table can be seen in WG2 N2924, Asmus' "Summary
of repertoire for FDAM 1 and FPDAM 2 of ISO/IEC 10646," the
summary document for the new amendments. (=L2/05-050)

I won't duplicate the entire list here, but the essential
changes which the UTC needs to endorse are:

a. Encoding of 4 additional characters:

   PHAGS-PA LETTER ALTERNATE YA
   PHAGS-PA LETTER VOICELESS SHA
   PHAGS-PA LETTER VOICED HA
   PHAGS-PA LETTER ASPIRATED FA
   
   These 4 letters are alternate forms of the existing YA, SHA,
   HA, and FA, respectively, but the ad hoc agreed that these
   characters, used for spelling Chinese in Phags-pa, were used
   contrastively in the document they occur in, and that the
   simplest approach was simply to encode them, rather than
   to use variation selector sequences for them.
   
b. Corollary to (a):

   The UTC needs to rescind its approval of 4 variation selector
   sequences for the 4 characters in question, because they
   are now encoded as atomic characters, instead.
   
c. Change of one character name:

   U+A85A PHAGS-PA LETTER -A --> PHAGS-PA LETTER SMALL A
   
   This character is derived from the Tibetan character,
   U+0F60 TIBETAN LETTER -A.
   
   However, China objected to "-A" as the name, and the name made
   no sense to the Mongolian participants. The Chinese and
   Mongolian names for the letter in question mean "small a",
   and simply using "SMALL A" as the name was deemed an
   acceptable alternative. Doing so also removed the need
   for another exception to the general naming rules for
   characters.
   
d. Reordering of the Phags-pa block:

   The ad hoc and WG2 agreed to a reordering of Phags-pa that
   essentially moved the non-core letters to the end of the
   block, rather than interfiling them in Sanskrit order with
   the other letters.
   
   The UTC should simply approve the ordering as specified in
   WG2 N2924 (=L2/05-050).
   
e. Addition of 6 variation selector sequences:

   <A856 PHAGS-PA LETTER SMALL A, FE00>
   <A85C PHAGS-PA LETTER HA, FE00>
   <A85E PHAGS-PA LETTER I, FE00>
   <A85F PHAGS-PA LETTER U, FE00>
   <A860 PHAGS-PA LETTER E, FE00>
   <A868 PHAGS-PA LETTER SUBJOINED YA, FE00>
   
   The code points for the Phags-pa letters here reflect the
   WG2-approved reordered chart, rather than the code points
   as balloted in PDAM2.
   
   The descriptions for these six sequences are, respectively:
   
   phags-pa letter reversed shaping small a
   phags-pa letter reversed shaping ha
   phags-pa letter reversed shaping i
   phags-pa letter reversed shaping u
   phags-pa letter reversed shaping e
   phags-pa letter reversed shaping subjoined ya
   
   The issue here is as follows. Phags-pa has Arabic-style
   cursive joining between adjacent letters (although vertical
   in orientation, rather than horizontal). Ordinarily the
   forms used in joining are predictable from the shape of
   the preceding letter in context.
   
   However, there are six characters which may show up with
   a left/right reversed shape that is considered the opposite
   of the normally predicted form. This is not predictable
   from context, and appears to be a more-or-less freely
   occurring scribal variant.
   
   China and the other Phags-pa experts agreed that the reversed
   forms need to be representable in the encoding, and the use
   of variation selector sequences to do so was approved by
   consensus. The UTC should also approve these 6 sequences.
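
   As a rough illustration of what these six sequences look like
   as data, the sketch below pairs each base letter with U+FE00.
   The code points are copied from the list above and follow the
   reordered chart cited in this memo. The table and function
   names are hypothetical and are not part of any proposal.

```python
# Illustrative sketch only: base code points are copied from the list of
# sequences in this memo and follow the reordered chart it cites.
VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Hypothetical table: base letter code point -> sequence description.
REVERSED_SHAPING = {
    0xA856: "phags-pa letter reversed shaping small a",
    0xA85C: "phags-pa letter reversed shaping ha",
    0xA85E: "phags-pa letter reversed shaping i",
    0xA85F: "phags-pa letter reversed shaping u",
    0xA860: "phags-pa letter reversed shaping e",
    0xA868: "phags-pa letter reversed shaping subjoined ya",
}


def reversed_shaping_sequence(base: int) -> str:
    """Return base + VS1 for one of the six listed base letters."""
    if base not in REVERSED_SHAPING:
        raise ValueError(f"U+{base:04X} has no reversed-shaping sequence in this list")
    return chr(base) + VS1


# Print each sequence as space-separated hex code points with its description.
for base, description in REVERSED_SHAPING.items():
    seq = reversed_shaping_sequence(base)
    print(" ".join(f"{ord(ch):04X}" for ch in seq), description)
```

   Any base letter outside this list is rejected, which mirrors
   the fact that only these six sequences are being proposed.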
   
   Note that these variation selector sequences do not represent
   an absolute glyph shape, but rather a reversed form from the
   *expected* glyph shape. There are some consonant letters in
   Phags-pa which are formed by reversing the shape of other
   letters. When those letters are followed by any of these
   4 vowels (or ha or ya), the vowel shape is normally also
   reversed in orientation. The variation selector sequence in
   those instances would only be used in the anomalous cases
   where the vowel shape does *not* reverse. So the semantics
   of these variation selector sequences is not "reversed glyph
   shape", but rather "reverse shaping (from the normal shaping
   expected in context)".
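
   The "reverse from expected" semantics can be modelled as a
   simple toggle. The sketch below is a toy model, not a shaping
   algorithm: it assumes that some other step has already decided
   whether context calls for a reversed form, and the function
   name and its parameters are hypothetical.

```python
def rendered_reversed(expected_reversed: bool, has_variation_selector: bool) -> bool:
    """Toy model: the variation selector flips the contextually expected form.

    expected_reversed: whether normal shaping in context would reverse the
        letter (how that is determined is outside this sketch).
    has_variation_selector: whether the letter is followed by U+FE00.
    """
    # Exclusive-or: the selector means "the opposite of what context predicts",
    # not "always draw the reversed glyph".
    return expected_reversed != has_variation_selector


# Walk through all four combinations of context and selector.
for expected in (False, True):
    for selector in (False, True):
        print(expected, selector, rendered_reversed(expected, selector))
```

   In this model, a vowel after a reversed consonant with the
   selector present comes out unreversed, which corresponds to
   the anomalous case described above.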
   
   
2. Kannada additions

   WG2 approved the addition of 4 Kannada characters:
   
   U+0CE2 KANNADA VOWEL SIGN VOCALIC L
   U+0CE3 KANNADA VOWEL SIGN VOCALIC LL
   U+0CF1 KANNADA SIGN JIHVAMULIYA
   U+0CF2 KANNADA SIGN UPADHMANIYA
   
   These are documented in WG2 N2860 (=L2/04-364), which the UTC
   has already reviewed. However, the UTC has not yet approved any
   additions based on that document. Since the two danda
   characters in WG2 N2860 were not approved (in accordance with
   the UTC position that the question of disunification of dandas
   needs to be considered as a whole, instead of piece by piece),
   I believe the UTC should approve the addition of the remaining
   4 characters from that document.
   

3. Balinese

   WG2 approved the addition of the Balinese script for balloting
   in FPDAM2, based on WG2 N2908 (=L2/05-008).
   
   The UTC should take this up under a separate agenda item, since
   we need to review that document, and there is a separate
   contribution detailing the various equivalency issues which
   need to be dealt with first. (See Peter Constable's "Comments
   on Balinese Proposal, L2/05-008" -- L2/05-056.)
   
   The short summary to note here, however, is that approval of
   the Balinese script would imply the addition of 121 characters
   for Unicode 5.0, and a new block for Balinese: U+1B00..U+1B7F.
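
   For a sense of scale, the arithmetic behind those two figures
   can be sketched as follows. The range and the character count
   are taken from the paragraph above. The variable and function
   names are hypothetical.

```python
# Figures taken from this memo: proposed block range and character count.
BLOCK_START = 0x1B00
BLOCK_END = 0x1B7F
PROPOSED_CHARACTERS = 121

# Total code points in the block, and how many the proposal would leave open.
block_size = BLOCK_END - BLOCK_START + 1
unassigned = block_size - PROPOSED_CHARACTERS


def in_balinese_block(code_point: int) -> bool:
    """Return True if the code point lies in the proposed block range."""
    return BLOCK_START <= code_point <= BLOCK_END


print(f"block size: {block_size}, left unassigned: {unassigned}")
```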
   
   The UTC and L2 instructed their delegates to oppose the
   addition of Balinese to FPDAM2 based on the earlier proposal,
   "Preliminary proposal for Balinese," WG2 N2856 (=L2/04-357),
   which they deemed not yet mature for encoding. (Note that the
   author of that proposal, Michael Everson, concurred with that
   assessment.) However, the proposal in WG2 N2908 was the
   result of detailed technical work with Balinese experts in
   Bali, just prior to the WG2 meeting, and was a much more
   complete, mature, and well-documented proposal. It also clearly
   had strong community support from the Balinese. On that basis,
   the U.S. and UTC participants in WG2 joined the consensus
   to add Balinese to FPDAM2. And I believe that the UTC should
   now formally approve the proposal, after reviewing the various
   technical issues relevant specifically to the Unicode Standard,
   including specification of canonical equivalences.
   
   
All other changes to Amendment 2 approved by WG2 were either
character additions already approved by the UTC for the Unicode
Standard or textual changes relevant only to the text of
10646 itself.