L2/05-059
Date: February 3, 2005
Author: Ken Whistler
Title: WG2 Consent Docket, Part 2: Unicode 5.0 Issues

This document is a continuation of the WG2 consent docket issues. In Part 2, I list all the issues related to the disposition of comments on PDAM2 to ISO/IEC 10646:2003.

The situation here differs from that for FPDAM1, because there will be another round of technical balloting on Amendment 2. That means that any items which are problematic for the UTC can still be raised in another round of comments and consideration in that ballot. However, my recommendation here is simply to approve all of these changes as well.

1. Phags-pa

The Phags-pa script in Amendment 2 was the subject of an extended ad hoc discussion at the WG2 meeting in Xiamen, involving participants from China, Mongolia, the U.K., Ireland, the U.S., Japan, and the Unicode Consortium. The document which guided that discussion was WG2 N2912, "Open Issues on Phags-pa Encoding", authored by Peter Constable. The report of the ad hoc is WG2 N2922 (=L2/05-036).

Essentially *all* issues for Phags-pa encoding were settled definitively by the ad hoc, and the result was endorsed by WG2. There were a number of significant changes involved, including a reordering of about half of the characters in the block, but the essential model of the script as approved by the UTC was retained.

The resulting table can be seen in WG2 N2924 (=L2/05-050), Asmus' "Summary of repertoire for FDAM 1 and FPDAM 2 of ISO/IEC 10646," the summary document for the new amendments.

I won't duplicate the entire list here, but the essential changes which the UTC needs to endorse are:

a.
Encoding of 4 additional characters:

   PHAGS-PA LETTER ALTERNATE YA
   PHAGS-PA LETTER VOICELESS SHA
   PHAGS-PA LETTER VOICED HA
   PHAGS-PA LETTER ASPIRATED FA

These 4 letters are alternate forms of the existing YA, SHA, HA, and FA, respectively, but the ad hoc agreed that these characters, used for spelling Chinese in Phags-pa, were used contrastively in the document in which they occur, and that the simplest approach was to encode them as characters, rather than to use variation selector sequences for them.

b. Corollary to (a): The UTC needs to rescind its approval of 4 variation selector sequences for the 4 characters in question, because they are now encoded as atomic characters instead.

c. Change of one character name:

   U+A85A PHAGS-PA LETTER -A --> PHAGS-PA LETTER SMALL A

This character is derived from the Tibetan character U+0F60 TIBETAN LETTER -A. However, China objected to the name "-A", which made no sense to the Mongolian participants. The Chinese and Mongolian names for the letter in question mean "small a", and simply using "SMALL A" as the name was deemed an acceptable alternative. Doing so also removed the need for another exception to the general naming rules for characters.

d. Reordering of the Phags-pa block: The ad hoc and WG2 agreed to a reordering of Phags-pa that essentially moved the non-core letters to the end of the block, rather than interfiling them in Sanskrit order with the other letters. The UTC should simply approve the ordering as specified in WG2 N2924 (=L2/05-050).

e. Addition of 6 variation selector sequences: The code points for the Phags-pa letters here reflect the WG2-approved reordered chart, rather than the code points as balloted in PDAM1.
The descriptions for these six sequences are, respectively:

   phags-pa letter reversed shaping small a
   phags-pa letter reversed shaping ha
   phags-pa letter reversed shaping i
   phags-pa letter reversed shaping u
   phags-pa letter reversed shaping e
   phags-pa letter reversed shaping subjoined ya

The issue here is as follows. Phags-pa has Arabic-style cursive joining between adjacent letters (although vertical in orientation, rather than horizontal). Ordinarily the forms used in joining are predictable from the shape of the preceding letter in context. However, there are six characters which may show up with a left/right reversed shape that is the opposite of the normally predicted form. This reversal is not predictable from context, and appears to be a more-or-less freely occurring scribal variant. China and the other Phags-pa experts agreed that the reversed forms need to be representable in the encoding, and the use of variation selector sequences to do so was approved by consensus. The UTC should also approve these 6 sequences.

Note that these variation selector sequences do not represent an absolute glyph shape, but rather a reversal of the *expected* glyph shape. There are some consonant letters in Phags-pa which are formed by reversing the shape of other letters. When those letters are followed by any of these 4 vowels (or ha or ya), the vowel shape is normally also reversed in orientation. The variation selector sequence in those instances would only be used in the anomalous cases where the vowel shape does *not* reverse. So the semantics of these variation selector sequences is not "reversed glyph shape", but rather "reverse shaping (from the normal shaping expected in context)".

2.
Kannada additions

WG2 approved the addition of 4 Kannada characters:

   U+0CE2 KANNADA VOWEL SIGN VOCALIC L
   U+0CE3 KANNADA VOWEL SIGN VOCALIC LL
   U+0CF1 KANNADA SIGN JIHVAMULIYA
   U+0CF2 KANNADA SIGN UPADHMANIYA

These are documented in WG2 N2860 (=L2/04-364), which the UTC has already reviewed. However, the UTC has not yet approved any additions based on that document. Since the two danda characters in WG2 N2860 were not approved (in accordance with the UTC position that the question of disunification of dandas needs to be considered as a whole, instead of piece by piece), I believe the UTC should approve the addition of the remaining 4 characters from that document.

3. Balinese

WG2 approved the addition of the Balinese script for balloting in FPDAM2, based on WG2 N2908 (=L2/05-008). The UTC should take this up under a separate agenda item, since we need to review that document, and there is a separate contribution detailing the various equivalency issues which need to be dealt with first. (See Peter Constable's "Comments on Balinese Proposal, L2/05-008" -- L2/05-056.)

The short summary to note here, however, is that approval of the Balinese script would imply the addition of 121 characters for Unicode 5.0, and a new block for Balinese: U+1B00..U+1B7F.

The UTC and L2 instructed their delegates to oppose the addition of Balinese to FPDAM2 based on the earlier proposal, "Preliminary proposal for Balinese," WG2 N2856 (=L2/04-357), which they deemed not yet mature for encoding. (Note that the author of that proposal, Michael Everson, concurred with that assessment.) However, the proposal in WG2 N2908 was the result of detailed technical work with Balinese experts in Bali, just prior to the WG2 meeting, and was a much more complete, mature, and well-documented proposal. It also clearly had strong community support from the Balinese. On that basis, the U.S. and UTC participants in WG2 joined the consensus to add Balinese to FPDAM2.
And I believe that the UTC should now formally approve the proposal, after reviewing the various technical issues relevant specifically to the Unicode Standard, including specification of canonical equivalences.

All other changes to Amendment 2 approved by WG2 were either character additions already approved by the UTC for the Unicode Standard or textual changes relevant only to the text of 10646 itself.
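
Illustrative note on item 1(e): the "reverse shaping" semantics of the Phags-pa variation selector sequences can be sketched in a few lines of code. This is only a minimal sketch under stated assumptions, not a rendering implementation and not part of any WG2 or UTC document. The names `Letter`, `expected_reversed`, `has_selector`, and `is_rendered_reversed` are hypothetical, and the shaping context is reduced to a single boolean. The point it shows is that the selector toggles whatever orientation the context predicts, rather than requesting an absolute reversed glyph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    """Hypothetical model of one of the six affected letters in context."""
    label: str               # descriptive label only, not a formal character name
    expected_reversed: bool  # orientation predicted from the preceding letter
    has_selector: bool       # True if a variation selector follows the letter


def is_rendered_reversed(letter: Letter) -> bool:
    """Return the orientation actually displayed.

    The selector means "reverse shaping from what is expected in context",
    so it flips the predicted orientation (a boolean XOR).
    """
    return letter.expected_reversed != letter.has_selector


# The four possible combinations for a vowel such as "i" (assumed example).
cases = [
    Letter("i, ordinary context", expected_reversed=False, has_selector=False),
    Letter("i, ordinary context + selector", expected_reversed=False, has_selector=True),
    Letter("i, after reversed consonant", expected_reversed=True, has_selector=False),
    Letter("i, after reversed consonant + selector", expected_reversed=True, has_selector=True),
]

for case in cases:
    shape = "reversed" if is_rendered_reversed(case) else "not reversed"
    print(f"{case.label}: {shape}")
```

The last case is the anomalous one described in item 1(e): after a reversed consonant the vowel would normally reverse, and the sequence is used precisely where it does not.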