L2/07-150


Title: WG2 Consent Docket
Source: Ken Whistler
Date: May 10, 2007

Following my usual procedure, I have rolled up all items from the
latest WG2 meeting (WG2 #50, Frankfurt, Germany, April 23 - 27, 2007)
for which there is a synchronization issue that the UTC needs to
address.

This WG2 meeting progressed 3 amendments:

Amendment 3: The disposition of comments was completed for the FPDAM 3,
             and FDAM 3 will be issued soon.
             
Amendment 4: The disposition of comments was completed for the PDAM 4,
             and FPDAM 4 will be issued imminently.
             
Amendment 5: A new amendment was started, and PDAM 5 has already
             been issued.
             
In the consent docket this time, I will be organizing the issues
in part by which amendment they are associated with, to help
keep things straight. Note that as of current plans, the repertoires
for Amendments 3 and 4 together will be the eventual repertoire
added for Unicode 5.1. The new repertoire for Amendment 5 will
likely be targeted for a future version of Unicode past Version 5.1. 

Note also that for changes specific to FDAM 3, the UTC really
just has to approve things at this point, as there is no
chance now to reconsider the decisions by WG2; the FDAM vote
is an up or down vote, with no technical changes allowed.          

================================================================

A. Latin: Miscellaneous Name Change (FDAM 3)

2C78 LATIN SMALL LETTER E WITH TAIL

WG2 accepted a name change to:

2C78 LATIN SMALL LETTER E WITH NOTCH

This was the character with the controversy over the term
"FINIAL" in the name in the original proposal. UTC accepted
"WITH TAIL" as an alternate. The issue was ad hocced in
WG2, in response to Irish NB comments, and it was discovered
that the feature in common between this letter and others
which will eventually be proposed for the Landsmålsalfabet
is an inward notch diacritic on rounded portions of the
letters. So "WITH NOTCH" sets a better naming precedent for those
eventual characters.

Suggestion: Approve the revised name.

================================================================

B. Vai: New Characters Added, Plus Block Change (FDAM 3)

WG2 accepted two more characters in the Vai block, based on WG2 N3243:

A62A VAI SYLLABLE NDOLE MA
A62B VAI SYLLABLE NDOLE DO

In part because of the possibility of some additional historic
characters, and in part because of other rearrangement of the A6XX
row, WG2 also extended the Vai block from A500..A62F to A500..A63F.

Suggestion: Approve the two new characters and the revised block
range for Vai.

================================================================

C. Mirrored Math Arrows (FDAM 3)

Based on last-minute review of bidirectional mirroring issues
for asymmetric arrow operators with tildes, Asmus Freytag,
Barbara Beeton, and Murray Sargent asked (in WG2 N3259) for the
addition of 6 more arrow operators to complement the
already-approved:

2B41 REVERSE TILDE OPERATOR ABOVE LEFTWARDS ARROW
2B42 LEFTWARDS ARROW ABOVE REVERSE ALMOST EQUAL TO

The 6 new characters were approved by WG2 for FDAM 3. Their names
and code points are:

2B47 REVERSE TILDE OPERATOR ABOVE RIGHTWARDS ARROW
2B48 RIGHTWARDS ARROW ABOVE REVERSE ALMOST EQUAL TO
2B49 TILDE OPERATOR ABOVE LEFTWARDS ARROW
2B4A LEFTWARDS ARROW ABOVE ALMOST EQUAL TO
2B4B LEFTWARDS ARROW ABOVE REVERSE TILDE OPERATOR
2B4C RIGHTWARDS ARROW ABOVE REVERSE TILDE OPERATOR

The rationale is provided in the document. Ordinarily one
would expect more review time for such additions, but the
argument was that with two of these already going into Amd 3,
it was more important to stave off confusion about character
and glyph identity by encoding a complete set now,
rather than waiting to add them piecemeal later, after
problematical choices might have been made in rolling out
new math fonts.

Suggestion: Approve the 6 new math arrow characters.

================================================================

D. Lanna (FPDAM 4)

The UTC has not yet formally approved the encoding for Lanna script. The
UTC has reviewed the proposals a number of times, and requested
the removal of two characters from PDAM 4, as a result of that
review, but was effectively waiting for the results of this
WG2 meeting and the ballot disposition of comments to settle
on a consensus encoding to give final approval to.

Notable changes from PDAM 4 that were the result of the
disposition of comments:

  1. The two vowel signs AM and TALL AM were removed, as
     UTC requested.
     
  2. Two new characters were added:
  
     1A29 LANNA LETTER KHUEN HIGH CHA
     1AAD LANNA SIGN CAANG
     
  3. Various ranges of characters were moved by a few code
     points to accommodate the two removals and two additions.
     
  4. Several character names were updated, most notably
     involving the respelling of "KHUN" as "KHUEN". The
     other changes involved removing a "LOW" or "HIGH" in
     a name where no character of contrasting register occurs.
     
  5. At the request of the Chinese NB, the parenthetical
     alternative name "(Old Tai Lue)" is being added to
     the block name "Lanna" in Annex A.2.2 in 10646.
     
  6. The font style for the representative glyphs was changed
     from a Thai-style font to a Khün-style (Myanmar-like) font,
     again to accommodate the Chinese NB.
     
At this point, all of the issues raised by the U.S., Irish,
U.K., and Chinese NBs seem to have been resolved satisfactorily,
and I think the script should now be considered stable enough
for formal approval.

Suggestion: Approve the encoding of Lanna, as documented
in WG2 N3264 (Charts for FPDAM 4, = L2/07-131), with
block name "LANNA" and block range 1A20..1AAF.

================================================================

E. CJK: Disunification of U+4039 (FPDAM 4)

The proposal to disunify the unified CJK ideograph U+4039
was discussed at great length and in excruciating detail at
the WG2 meeting. All of the source reference mapping issues
seemed to have been resolved, based on the assumption that
a disunification was warranted.

The proposal document, WG2 N3196R2 (= L2/07-010) was revised
at the meeting to provide more details required for the
source mappings and other properties for both the original
character and the new character to be disunified from it.
See that document for details and justification.

WG2 decided on U+9FC3 as the code point for the new character.

Suggestion: Approve the disunification, the new character
at U+9FC3, and the source mapping and revised property data for
both U+4039 and U+9FC3.

================================================================

F. Latin: Capital Letter Sharp S (FPDAM 4)

WG2 approved the addition of a capital letter sharp S, based
on document WG2 N3227R (= L2/07-108). See the UTC agenda item
on this topic and the related feedback documents (L2/07-149,
L2/07-156, L2/07-157, ...)

The code point and name approved by WG2 for ballot are:

U+1E9E LATIN CAPITAL LETTER SHARP S

Note that a UTC decision to approve this character at this
point should be considered a formal decision to overturn
a precedent vote. The UTC discussed this topic in November, 2004,
on the basis of an earlier version of the proposal: L2/04-395.
(The proposal at that time was requesting a "Capital Double S",
but the intent was the same as in L2/07-108.) The UTC decided
then:

"101-C22 Consensus: The UTC concurs with Stoetzner that Capital
Double S is a typographical issue. Therefore the UTC believes it is
inappropriate to encode it as a separate character."

"101-A74 Action Item for Ken Whistler. Add Capital Double S to the
reject list."

And the "Capital Double S" (i.e. the capital sharp S) has been
on the list of rejected characters since that time. That fact
should not prejudice the current decision about the character
now under ballot, but I think it does mean that we are dealing
with reversing a precedent, rather than simply approving
a new character not formerly discussed and rejected.

Suggestion: UTC to examine the proposal and feedback documents
and decide to approve or not to approve.

================================================================

G. Combining Macrons for Coptic (FPDAM 4)

WG2 approved 3 combining marks, intended for use in Coptic text
to display macrons across ranges of two or more characters.
These were approved on the basis of WG2 N3222 (= L2/07-085),
but with an addition, with name changes and code point
changes.

The code points and names approved by WG2 for ballot are:

U+FE24 COMBINING MACRON LEFT HALF
U+FE25 COMBINING MACRON RIGHT HALF
U+FE26 COMBINING CONJOINING MACRON

Suggestion: Approve the three new characters, encoded in
the Combining Half Marks block.
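
As an illustration only, here is a minimal Python sketch of how these
three marks might be combined to span a run of letters. The convention
assumed here (LEFT HALF on the first letter, CONJOINING MACRON on each
interior letter, RIGHT HALF on the last) is my assumption and is not
spelled out in this docket; see WG2 N3222 for the intended usage. The
function name overline() is hypothetical.

```python
# Code points as approved by WG2 for ballot (see the list above).
MACRON_LEFT_HALF = "\uFE24"
MACRON_RIGHT_HALF = "\uFE25"
CONJOINING_MACRON = "\uFE26"


def overline(text: str) -> str:
    """Attach a continuous macron across a run of two or more letters.

    Assumed convention (not stated in the docket): left half on the
    first letter, conjoining macron on interior letters, right half
    on the last letter. Works per code point, so each element of
    `text` is expected to be a single base letter.
    """
    if len(text) < 2:
        raise ValueError("a spanning macron needs at least two letters")
    marks = (
        [MACRON_LEFT_HALF]
        + [CONJOINING_MACRON] * (len(text) - 2)
        + [MACRON_RIGHT_HALF]
    )
    # Each combining mark follows the base letter it applies to.
    return "".join(base + mark for base, mark in zip(text, marks))


# Three Coptic small letters (alfa, vida, gamma) under one macron.
sample = overline("\u2C81\u2C83\u2C85")
print([f"U+{ord(ch):04X}" for ch in sample])
```

For a two-letter run the sketch emits only the left and right halves;
the conjoining macron appears only when there are interior letters.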

================================================================

H. Oriya and Malayalam Letters for Vedic (FPDAM 4)

The Vedic proposal WG2 N3235R (= L2/07-095) contained much
that requires further discussion and updates, but amongst
the content there were 4 Oriya and Malayalam dependent
vowels needed to complete the set of Vedic Sanskrit vowels,
as written in those scripts. WG2 decided to add those
4 vowels to Amd 4:

0B44 ORIYA VOWEL SIGN VOCALIC RR
0B62 ORIYA VOWEL SIGN VOCALIC L
0B63 ORIYA VOWEL SIGN VOCALIC LL
0D63 MALAYALAM VOWEL SIGN VOCALIC LL

This follows on, for example, from the encoding of similar Malayalam
Vedic vowels in Amd 3.

Suggestion: Approve these four new characters.

================================================================

I. Old Cyrillic (FPDAM 4)

WG2 added all the Old Cyrillic characters approved by the UTC,
but in the process of working up the draft amendment documents,
the contributing editors noted the possibility for a significantly
better arrangement of Old Cyrillic and Cyrillic extensions,
by moving Bamum a couple of columns over and coalescing the
two Cyrillic extension blocks the UTC had previously approved.

Asmus Freytag wrote up the proposed movements and block
rearrangements in WG2 N3213 (= L2/07-105).

What the UTC approved for Cyrillic extensions:

A640..A67F Cyrillic Extended-B block
A8E0..A8FF Cyrillic Extended-C block

What the revised FPDAM 4 reflects as WG2 approved:

A640..A69F Cyrillic Extended-B block

Suggestion: Approve the merging of the two blocks, with the
attendant change in code points for the additional Cyrillic
letters for Abkhaz from A8E0..A8F7 to A680..A697.
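
For anyone maintaining draft data files, the move amounts to a fixed
offset applied to each affected code point. A minimal Python sketch
follows; the helper name remap() is hypothetical, and the ranges are
the ones quoted above.

```python
# Ranges taken from the text above.
OLD_START, OLD_END = 0xA8E0, 0xA8F7   # Abkhaz letters as approved by the UTC
NEW_START = 0xA680                     # where they begin in revised FPDAM 4
MERGED_BLOCK = (0xA640, 0xA69F)        # Cyrillic Extended-B after the merge


def remap(cp: int) -> int:
    """Map a code point in the old range to its new position."""
    if not OLD_START <= cp <= OLD_END:
        raise ValueError(f"U+{cp:04X} is outside the moved range")
    return cp - OLD_START + NEW_START


new_start, new_end = remap(OLD_START), remap(OLD_END)
print(f"U+{OLD_START:04X}..U+{OLD_END:04X} -> U+{new_start:04X}..U+{new_end:04X}")

# Sanity check: the moved letters land inside the merged block.
assert MERGED_BLOCK[0] <= new_start and new_end <= MERGED_BLOCK[1]
```

The relative order of the letters is preserved, since every code point
moves by the same amount.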

================================================================

J. Bamum (PDAM 5)

The situation for Bamum (in PDAM 5) reflects the approval
of the change in code points for Cyrillic extensions (in FPDAM 4).

What the UTC approved:

A680..A6DF Bamum

What the new PDAM 5 reflects as WG2 approved:

A6A0..A6FF Bamum

Except for moving the block over two columns, the characters
and their names are otherwise unchanged.

Suggestion: Approve the revised block and changed code points
for Bamum.
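
The two-column move can be checked with simple arithmetic, taking a
chart column to be 16 code points. A short Python sketch, in which the
helper name shift_columns() is hypothetical:

```python
COLUMN = 0x10  # one code chart column holds 16 code points


def shift_columns(start: int, end: int, columns: int) -> tuple[int, int]:
    """Shift a block range by a whole number of chart columns."""
    delta = columns * COLUMN
    return start + delta, end + delta


# Bamum as approved by the UTC, moved over two columns.
new_start, new_end = shift_columns(0xA680, 0xA6DF, 2)
print(f"Bamum: U+{new_start:04X}..U+{new_end:04X}")

# The moved block starts just past the merged Cyrillic Extended-B
# block (A640..A69F), so the two ranges do not overlap.
assert new_start == 0xA69F + 1
```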

================================================================

K. Coptic Additions (PDAM 5)

In addition to the special combining macrons for Coptic,
WG2 N3222 (= L2/07-085) also requested several other characters
for Coptic. These are four cryptogrammic letters and three
combining marks found in Coptic manuscripts. Since these
particular characters aren't needed with any urgency (as opposed
to the combining macrons, which were needed for imminently
shipping font implementations for generic Coptic use), WG2
accepted these 7 new characters for Amd 5, rather than accelerating
them into Amd 4.

2CEB COPTIC CAPITAL LETTER CRYPTOGRAMMIC SHEI
2CEC COPTIC SMALL LETTER CRYPTOGRAMMIC SHEI
2CED COPTIC CAPITAL LETTER CRYPTOGRAMMIC GANGIA
2CEE COPTIC SMALL LETTER CRYPTOGRAMMIC GANGIA

2CEF COPTIC COMBINING NI ABOVE
2CF0 COPTIC COMBINING SPIRITUS ASPER
2CF1 COPTIC COMBINING SPIRITUS LENIS

(Aside: In a commodious vicus of recirculation, 2CF0/2CF1,
the Coptic derivatives of the Greek rough and smooth
breathing marks, have the glyphs that we *used*
to show for 0485/0486, the Cyrillic derivatives of the Greek
rough and smooth breathing marks, but which have been subsequently
corrected in Unicode 5.0, by and for Cyrillicists, to look
more like the Greek rough and smooth breathing marks. *sigh*)

Suggestion: Approve the 7 characters for Coptic.

================================================================

L. Egyptian Hieroglyphs (PDAM 5)

Finally! 16 years after publication of Unicode 1.0 with
Egyptian hieroglyphs on the cover, a proposal for the
encoding of the basic set of Egyptian hieroglyphs has
advanced to acceptance for balloting.

WG2 approved 1063 basic Egyptian hieroglyphs, based on
WG2 N3237 (= L2/07-097, superseding earlier L2 documents
on the topic):

13000..13426 Egyptian Hieroglyphs (block: 13000..1342F)
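
The character count follows directly from the range. A quick Python
check, using only the figures quoted above (the variable names are
mine):

```python
# Range of assigned characters and enclosing block, from the text above.
first, last = 0x13000, 0x13426
block_first, block_last = 0x13000, 0x1342F

assigned = last - first + 1          # inclusive range
block_size = block_last - block_first + 1
unassigned = block_size - assigned   # positions left open at the end

print(f"assigned: {assigned}, block size: {block_size}, open: {unassigned}")
```

This gives 1063 assigned characters, matching the count WG2 approved,
with 9 positions left open at the end of the block.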

Most of the issues for encoding the Egyptian hieroglyphs have
been ironed out and have consensus among the participants
in drafting the proposal and as reviewed by the community
of professional Egyptologists. The one remaining important
area of controversy has to do with the representation and
encoding of Egyptian numerals. The UTC should review the
issue regarding the numerals, but in my personal opinion,
the stance taken in the proposal (and approved by WG2 for
ballot) is probably the best compromise.

Suggestion: Approve the encoding of the Egyptian hieroglyphs
characters and block, as shown in the PDAM 5 draft,
WG2 N3265 (= L2/07-132).

================================================================

M. Old Hangul Jamo Additions (PDAM 5)

After extended discussion through many consecutive WG2 meetings,
WG2 finally came to a compromise position to deal with the
persistent issue of representation of Old Hangul syllables,
as requested by the ROK delegation to WG2.

The ROK withdrew all requests for model changes to the representation
of Korean, in return for the agreement to encoding of the
additional set of 107 Old Hangul complex jamo letters that
complete the attested set of Old Hangul jamos. (These, in effect,
represent extensions to the already existing set of more common
Old Hangul complex jamo letters, and don't actually change the
model of representation of Korean at all.)

WG2 agreed to the allocation of these 107 jamos as proposed
by Ireland in WG2 N3242 (= L2/07-103). That proposal filled out
the existing 11XX Hangul Jamo block and then made good use of
existing crannies in the BMP in the vicinity of the Hangul
Syllables block for the remainder. The details are:

In the existing 1100..11FF Hangul Jamo block:

115A..115E Old Hangul initial consonants
11A3..11A7 Old Hangul medial vowels
11FA..11FF Old Hangul final consonants

(Those allocations fill the Hangul Jamo block.)

A960..A97F Hangul Jamo Extended-A block:

A960..A97C Old Hangul initial consonants

D7B0..D7FF Hangul Jamo Extended-B block:

D7B0..D7C6 Old Hangul medial vowels
D7CB..D7FB Old Hangul final consonants

Suggestion: Approve the additional 107 Old Hangul jamo characters
and the two new block definitions.

================================================================

N. Tai Viet (PDAM 5)

Removal of one character, change of script name.

The UTC approved the "Tay Viet" script, with corresponding
block and character names, based on L2/07-039.

WG2 saw a revised proposal, with the script renamed to "Tai Viet",
with corresponding block and character names. The revised
proposal (WG2 N3220, = L2/07-099) also removed one character
AAB2 TAI VIET VOWEL AA WITH CIRCUMFLEX, and moved up the following
vowels and tone marks to fill the gap at that position.

Suggestion: Approve the revised script and block name, with
revised code points and names, as shown in L2/07-099.

================================================================

O. Avestan Separation Point (PDAM 5)

The UTC approved the encoding of the Avestan script at the
last meeting.

WG2 approved Avestan, on the basis of the same proposal in
WG2 N3197 (= L2/07-006) for Amd 5, but with one more character
approved than the UTC approved. That character is:

10B38 AVESTAN SEPARATION POINT

This character was discussed at the last UTC meeting.

Suggestion: Discuss again and decide to approve or not to
approve encoding this separation point.

================================================================