L2/21-070

Editorial Committee Report and Recommendations for UTC #167 Meeting

Source: Editorial Committee

Date: April 26, 2021

A. Unicode Release Topics

A1. Unicode 14.0 Schedule and Planning

FYI: The significant milestones for the Unicode 14.0 release are:

The planned beta review and release dates are unchanged from those reported in the Editorial Committee Report and Recommendations for UTC #166 Meeting. The alpha review is now complete, with the close date moved to April 12, to match the close date for other PRIs for discussion at UTC #167.

Once the UTC has made all its decisions based on the alpha review feedback, the Editorial Committee plans to start coordination of the beta review preparation.


A2. Alpha Review for 14.0

FYI: Alpha Review for 14.0 closed on April 12. The review produced a significant amount of feedback, as noted below and in the reports from the other groups that considered the feedback. The good news is that we got a lot of feedback. The bad news is that we got a lot of feedback.

The upside is that quite a few small errors were noted, many of which have already been addressed in the data files or in other documents and drafts. This means that the quality of the beta review should be better than it otherwise might have been, with fewer errors to note and fix, particularly in the data files.

The problem, however, is that despite our attempts to focus reviewers on the actual repertoire proposed for encoding, and on the names and glyphs in the alpha charts, many reviewers treated the alpha review as if it were the beta review. They raised many questions about the details of the data files, many of which were not fully ready for prime time when the alpha review started. This created a significant chunk of additional early work for the folks responsible for preparing the UCD data files, because more iterations and adjustments were needed before the beta review, which is when that work used to start.

The voluminous feedback has also resulted in more churning of names list annotations than is usual for a release.

Altogether, the alpha review turned into way more work than was initially advertised. If we are going to continue doing both an alpha review and a beta review cycle for all future releases, then the alpha review effectively has to be planned (and staffed) in more detail, unless the UTC is willing to bin off-topic feedback during the alpha review period. But in our opinion, trying to train public reviewers to limit their feedback to certain topics is hopeless.

EC-UTC167-R1: The Editorial Committee recommends that
The UTC closes PRI #428, Unicode 14.0.0 Alpha Review.

Suggested associated action items:

AI Rick McGowan. Close PRI #428.

AI Ken Whistler. Update status notice on the Pipeline page.


A3. Beta Review for 14.0

FYI: We now proceed to beta review for Unicode 14.0. Presuming that all the pertinent technical decisions regarding approved repertoire have been recorded, including any name and code point changes, the UTC should go ahead and authorize the start of the 14.0 beta review, according to the schedule noted above.

EC-UTC167-R2: The Editorial Committee recommends that
The UTC authorizes a PRI for a beta review period for the Unicode 14.0 repertoire, to start June 4, 2021 and to close July 20, 2021.

Suggested associated action items:

AI Rick McGowan. Post a PRI for the Unicode 14.0 beta review, to close July 20, 2021.

AI Ken Whistler, Editorial Committee. Execute the beta review plan for 14.0.

Note that rather than recording a bunch of individual action items regarding the beta review, as we had to do for the alpha review last January, this can be kept to a single tracking action, as there already exists a detailed plan for beta review (the "Big Red Switch"), used by the Editorial Committee to handle this.


A4. Unicode 14.0 Core Specification and Other Editing

FYI: Meeting virtually like most everyone else, the Editorial Committee is continuing its work on Version 14.0 of the Unicode Standard, due for release in September. We'll be finalizing the text of the core specification in June and July. We are also continuing our work editing technical reports, and we'll be updating the summary information on the Version 14.0 web page in preparation for the beta.


B. Website Topics

B1. Website Status

FYI: The technical website has been stable since the last UTC meeting, with no access problems and few reports of issues with content on particular pages. The Editorial Committee has participated in minor maintenance on a few pages, including updates to some FAQ pages and ongoing minor re-templating of some pages. In particular, significant work was done on the pages listing Unicode Consortium officers, to reflect changes in personnel and organizational structure coming out of the latest BOD meetings. See:

Unicode Executive Officers

Chairs for Unicode Technical Committees and Subcommittees

Unicode Consortium Org Chart

B2. Website Content Maintenance

FYI: The Editorial Committee plans to work on a complete analysis of the technical website content, so that content ownership can be rationalized and a more systematic approach to ongoing maintenance can eventually be developed. There is nothing significant to report on this project right now. The Editorial Committee focus has been on ongoing work related to the 14.0 release, and there have been no cycles available for key participants to delve into the website maintenance planning.


C. Editorial Committee Process Issues

FYI: The Editorial Committee continues to meet approximately once a month via Zoom, with those monthly meetings now scheduled for 5 hours (with a lunch break), instead of the longer meetings we used to hold. A certain level of Zoom fatigue has set in among everyone, and the efficiency of the meetings has been dropping a bit, as all involved tend to be swamped with more and more virtual meetings. Given the growth of the organization, there is little likelihood that the number of meetings will decrease in the future, even after COVID-19 restrictions are relaxed again.

Part of the immediate problem for the Editorial Committee is that a number of the veteran editors on the committee are also increasingly involved in other aspects of the organization, including PR, governance, and infrastructure issues. This has significantly diluted the attention to the core editorial issues attended to by the Editorial Committee.


D. UTR Topics

FYI: The Editorial Committee has no new suggestions to bring up separately about the content of various UTRs at this time. Feedback on documents open for public review is covered below.


E. PRI Topics

E1. Overall Disposition of Open PRIs

To ensure that the UTC records explicit actions for all of the currently open PRIs, we have pulled together an omnibus recommendation for progression of each PRI, except PRI #428 (alpha review -- see above) and PRI #408 (QID).

EC-UTC167-R3: The Editorial Committee recommends that
The UTC extends the close dates for the following open PRIs to July 20, 2021:

Note that UTS #18 is not part of the Unicode 14.0 release, but it is separately recommended to extend the close date of the PRI for its proposed update. See recommendation PRI427b in L2/21-069. UTR #23 and UTR #53 are also not part of the Unicode 14.0 release, but there is no urgency to close the PRI for UTR #23 and publish that specification now. For UTR #53, because of the nature of the change in the document, that specification must wait until the release of Unicode 14.0 for publication, so its PRI should also just be extended now.

Suggested associated action item:

AI Rick McGowan. Extend the close dates for PRIs #416, #417, #419, #420, #421, #422, #424; #423, #425, #427; #415, #426. To close July 20, 2021.


E2. Editorial Feedback on PRI #417 for UAX #29

FYI: The following items are extracted from the feedback to PRI #417 for discussion and disposition by the Editorial Committee. The Editorial Committee considers the other feedback received on PRI #417 to cover technical issues that should be dealt with by the Properties & Algorithms Group, rather than the Editorial Committee.


Date/Time: Mon Mar 22 18:43:49 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: A small editorial issue on UAX #29

On the Comments column on the row second from the bottom (for "kʷ") in 
Table 1a, The annex says "sequence with letter modifier", though I believe 
the Unicode Standard uses a term "modifier letter" but "letter modifier" 
to describe a character like "ʷ".  It should be changed to read "sequence 
with modifier letter" for less confusion.

Discussion: The Editorial Committee considered this to be a good change. Chris Chapman has already implemented the change in the 4/15/2021 draft of the proposed update for UAX #29, so no action item needs to be recorded.


Date/Time: Sun Mar 28 06:31:11 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: UAX #29 contains a strange statement as an explanation

The second line in Section 4 (Word Boundaries) currently reads:

The most familiar ones are selection (double-click mouse selection or “move
to next word” control-arrow keys) and the dialog option “Whole Word Search”
for search and replace.

It implies that '"move to next word" control-arrow keys' is a "selection",
but I believe it is contrary to the common function; control-arrow key
usually instructs a movement of the cursor without selection, and if you
want to select to next word, you need to press control-shift-arrow keys.

Probably we should either change '"move to next word" control-arrow keys' to
'"select to next word" control-shift-arrow keys" or change the nearby
phrases to something like '... selection (double-click mouse selection),
cursor movement ("move to next word" control-arrow keys), and the dialog
...'

I hope this feedback helps.

Discussion: The Editorial Committee considered this to be a good recommendation. Chris Chapman has already implemented the change in the 4/15/2021 draft of the proposed update for UAX #29, so no action item needs to be recorded. The text improvement suggested by the Editorial Committee was:

The most familiar ones are selection (double-click mouse selection), cursor movement (“move to next word” control-arrow keys), and the dialog option “Whole Word Search” for search and replace.


Date/Time: Sun Apr 11 18:04:14 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: Inappropriate description in UAX #29

The 3rd paragraph of "7 Testing" in UAX #29 "Unicode Text Segmentation"
explains the format of the three auxiliary files (referred to as
[Charts29]), and I believe the current description is different from the
actual auxiliary files.  It says "The header cells of the chart consist of a
property value, followed by a representative code point number.", but no
"representative code point number" follows the property name on the actual
chart.  It also says " hovering the mouse over the code point number will
show the character name, General_Category, Line_Break, and Script property
values.", but the character name etc. are shown when hovering over property
values but code point numbers (perhaps because there are no code point
numbers).  Either the description of the charts in UAX #29 or the charts
themselves should be corrected to make them consistent.

Discussion: The Editorial Committee agreed that this description no longer accurately reflects the actual format of the charts. Chris Chapman has updated the paragraph in the 4/15/2021 draft of the proposed update for UAX #29 to properly reflect what is shown in the chart header and first column row, and what is shown in tooltips, so no action item needs to be recorded.


E3. Editorial Feedback on other open PRIs for documents

FYI: The Editorial Committee has no new feedback on other open PRIs for documents at this time.


E4. Editorial Feedback on PRI #428 for Unicode 14.0.0 Alpha Review

FYI: The following items are extracted from the feedback received for PRI #428. Items which have already been addressed (with dispositions noted in red in the feedback page for PRI #428) are not included. Items which cover technical and data issues in the purview of the Properties & Algorithms Group are not listed here; only items which seem appropriate for resolution by the Editorial Committee are listed.


Date/Time: Mon Feb 15 19:56:28 CST 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Suggestions on the alpha code chart of Diacritical Marks Extended

1. Whenever a header says "Used in..." It should read instead "Marks for..."

2. The header above 1AC1 should say (after the current header) "... Do not use 
pairs of these marks as replacement for 1ABB or 1ABD"

3. The two marks "combining double plus above and below" should be moved up, 
to be next to the single "plus sign above" and the Ormulum marks shifted 
down two spots.

4. The bullet note above the "number sign above" currently reads "used 
extensively in J.P. Harrington’s transcriptional notation" I suggest 
for it to read "Used by J.P. Harrington to indicate heavy or contrastive stress"

5. The "combining triple acute accent" should have a mutual cross reference 
to the "combining double acute accent"

Discussion: Item 3 is a technical change that is outside the remit of the Editorial Committee. See the related discussion about this code point move in L2/21-069 and in L2/21-073.

The Editorial Committee suggests the other items be remanded to the names list editor for appropriate changes.

Suggested associated action item:

AI Ken Whistler. Consider the feedback from Eduardo Marín Silva (Feb 15) on PRI #428 for appropriate changes to the names list for Unicode 14.0.


Date/Time: Sun Feb 14 10:01:09 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Defective glyph for U+1FAE2

The code chart glyph for proposed character U+1FAE2 FACE WITH OPEN EYES AND
HAND OVER MOUTH is inverted, showing a solidly filled face instead of an
outline drawing like the other faces.

Date/Time: Fri Feb 26 15:42:43 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Sharada Code Chart

In the character list on Page 3, SHARADA VOWEL SIGN VOCALIC LL and SHARADA 
VOWEL SIGN E are overlapping. This needs to be fixed.

Date/Time: Fri Feb 26 15:56:05 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Telugu Nukta Glyph in the Code Chart

As per L2/20-085, Telugu Nukta should have the combining circle below as 
its representative glyph to avoid confusion with the aspirate marker. 

(If the current shape will be retained)
The annotation "can also appear as a large dot" is moot. The glyph is already a dot. 

V
 

Discussion: All three of these glyph changes have already been noted by the code charts editor, who has made appropriate fixes for 14.0.


Date/Time: Mon Mar 1 16:47:35 CST 2021
Name: Erik Carvalhal Miller
Report Type: Public Review Issue
Opt Subject: PRI #428: Comment for U+02B9

The first comment for U+02B9 MODIFIER LETTER PRIME in block Spacing Modifier
Letters (unchanged in the 14.0 alpha) says, “primary stress, emphasis”; I
recommend either removing the word “primary” or else inserting the phrase
“secondary stress”, to better reflect the broad, varied use of the character
in marking stress, as the current wording is misleadingly specific.

Background & reference:  U+02B9ʼs use for primary stress in some
dictionaries is undisputed, but L2/20-286 shows excerpts from historical and
contemporary dictionaries in which phonetic spellings employ U+02B9 for
secondary stress as well.  (As reported in L2/21-016 §I.3o, the UTC rejected
L2/20-286ʼs proposal to separately encode a prime‐symbol variant that
represents primary stress in those excerpts, but the rejection does not
impinge on the secondary‐stress use in evidence.)

Discussion: The editors discussed this, and agreed that removing the word "primary" from the annotation could make it less confusing. The change has already been rolled into the NamesList.txt file for 14.0.


Date/Time: Wed Mar 31 15:54:10 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Final round of revision to the codechart anottations, but the second half correspond to the pictograms

The first half corresponds to annotations that I missed the first two
rounds, but the second half corresponds to the pictograms.

Arabic:
  06C5 ARABIC LETTER KIRGHIZ OE: On the second bullet note,instead of reading 
  "a barred form also occurs", it would be better if it read "a glyph variant 
  replaces the looped tail with a horizontal bar through the tail"

Arabic Extended-B: 
  088E ARABIC VERTICAL TAIL: The header above this character should read 
  "Abbreviation mark" instead of "Abbreviation letter" A better phrasing of 
  the bullet note below would be "mark used to indicate abbreviations in moveable 
  type texts from Iran" followed by another note saying: "considered a letter; 
  only attested in final form"

Glagolitic:
  2C2F GLAGOILITIC LETTER CAUDATE CHRIVI: The bullet note cites the characters 
  it can combine with, but the glyphs with the dotted circle are missing. 
  Furthermore, informative aliases should be added "= cherv, chrivi with tail"

Arabic Presentation Forms-A:
  FDCF ARABIC LIGATURE SALAAMUHU ALAYNAA: Another bullet note could be added 
  stating "used in Christian texts"

Kana Extended-B:
  The initial note states that the system in question is "obsolete", which 
  seems to imply that it was replaced by another system, and it also states that 
  it was used in Taiwan; which is true, but it was also used in a nearby region 
  of mainland China.

Ethiopic Supplement:

  Given the new information of the legacy Gurage orthography the header
  above 1380 that reads "Syllables for Sebatbeit" should read "Legacy
  syllables for Gurage orthographies" Followed by a note under this header
  saying "These characters were originally encoded to represent the
  Sebatbeit language, but their use extended beyond that language to an
  entire linguistic region called 'Gurage'; therefore the term 'Sebatbeit'
  inserted in the character names, should not be interpreted as exclusionary
  to other languages, but a mere historical artifact. The orthography for
  the Gurage languages has been updated to use new syllables and these are
  encoded in the 'Ethiopic Extended-B' block." It's unclear if the header
  above 2DC0 (in the Ethiopic Extended block) should also be modified
  accordingly, but the block descriptions in the Spec, should be updated
  accordingly.

Transport and Map Symbols:
  1F6DE WHEEL: The informative alias "= tire" could be added
  1F6DF LIFE BUOY: The informative alias "= life saver" could be added

Geometric Shapes Extended:
  1F7F0 BOLD EQUALS SIGN: The addition of this symbol in this block (as opposed 
  to Symbols and Pictographs Extended-A) is dubious.

Symbols and Pictographs Extended-A:
  1FA74 THONG SANDAL: These informative aliases "= flip flop, chancla" could be added
  1FA78 DROP OF BLOOD: Mutual cross references to "1F4A7 💧 droplet" and "1F322 🌢 black droplet" could be added
  1FA79 ADHESIVE BANDAGE: The informative alias "= band aid" could be added.
  1FA85 PINATA: A bullet note could be added stating "the name is usually spelled 
  with an 'Ñ'(PIÑATA) but Unicode names can only contain ASCII characters"
  1FAAA IDENTIFICATION CARD: There should be an informative alias stating "= ID", 
  as well as a bullet note stating "can be used to represent a driver's license or any other form of photo id"
  1FAAB LOW BATTERY: There should be a mutual cross reference to "1F50B 🔋 battery"
  1FAAC HAMSA: A bullet note could be added stating "can either point up or down".
  1FAE6 BITTING LIP: A mutual cross reference to "1F5E2 🗢 lips" could be added
  1FAF6 HEART HANDS: There is no need for the rays emanating from the "heart"; leaving 
  them may imply that their inclusion is mandatory, so I recommend removing them 
  from the representative glyph. I would also like to ask, whether or not this 
  character can support different skin tones for each hand, in the future; 
  similar to the HANDSHAKE.

Date/Time: Thu Apr 1 19:17:17 CDT 2021
Name: Eduardo Marín Silva
Report Type: Other Question, Problem, or Feedback
Opt Subject: Request to correct errata in my own piece of feedback of the Unicode 14.0 alpha

My last piece of feedback was accidentally called "Final round of revision to the 
codechart anottations, but the second half correspond to the pictograms" with the 
second half added by mistake, so it should instead read "Final round of revision 
to the codechart annotations" with the corrected spelling of 'annotations'
If it's possible, I also noticed that my piece of feedback for the ARABIC VERTICAL 
TAIL reads "considered a letter; only attested in final form", when it should read 
"considered a letter, not a presentation form, but only attested in final form"
Any other mistakes in my pieces of feedback are minor and so do not need correction.

Discussion: The Editorial Committee discussed all of this feedback, and suggests the following dispositions:

Suggested associated action items:

AI Ken Whistler. Consider the feedback from Eduardo Marín Silva (Apr 1) on PRI #428 for appropriate changes to the names list for Unicode 14.0. (See L2/21-070 Section E4 for details of dispositions.)

AI Jennifer Daniel. Consider the feedback from Eduardo Marín Silva (Apr 1) on PRI #428 on emoji-related aliases and glyph changes, and redirect as appropriate. (See L2/21-070 Section E4 for details.)


Date/Time: Sat Apr 3 11:31:51 CDT 2021
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: Error in Egyptian Hieroglyphs file


The Egyptian Hieroglyphs file (U13000.pdf) contains the misspelling
“Invertabrata”. The correct spelling (which was also used by Gardiner) is
“Invertebrata”.

Discussion: This minor typo has already been corrected in the 14.0 version of NamesList.txt.


Date/Time: Sun Apr 11 02:28:55 CDT 2021
Name: Patrik Sjöwall
Report Type: Public Review Issue
Opt Subject: Unicode 14.0 Alpha review


I found a few issues with some characters for Unicode 14.0 that seem to have
gone unnoticed:

0874 ARABIC LETTER ALEF WITH ATTACHED KASRA
0875 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA
0879 ARABIC LETTER ALEF WITH ATTACHED ROUNDDOT BELOW
087C ARABIC LETTER ALEF WITH RIGHT MIDDLE STROKE AND DOT ABOVE
087D ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND DOT ABOVE
0880 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND LEFT RING

These letters reqiure more shaping information. It is not clear how the
attached fatha or dot will behave in an obligatory LAM-ALEF ligature.


088E ARABIC VERTICAL TAIL

This character is missing in ArabicShaping-14.0.0.txt, but it always joins
with the preceding letter. It should be included in that file, either as
Right_Joining or be given a new joining type (since it does not change its
shape, only causes the character to its right to join), and with either a
joining group of its own or No_Joining_Group.


08FB ARABIC DOUBLE RIGHT ARROWHEAD ABOVE
08FC ARABIC DOUBLE RIGHT ARROWHEAD ABOVE WITH DOT

The comment "also used in Quranic text in African and otherorthographies to
represent dammatan" should come after 08FB, not 08FC. The "right arrowhead"
is an angular-shaped damma, and the "dammatan" is a double damma (not a
double damma with dot).


A7C0 LATIN CAPITAL LETTER OLD POLISH O
A7C1 LATIN SMALL LETTER OLD POLISH O

This letter should be named "O ROGATE", the name "commonly used among
specialists" according to the proposal. Then a comment below could say "used
for nasal vowel in Old Polish". The current name sounds like this was a
letter used instead of "O" in Old Polish, which is not the case.


A7D3 LATIN SMALL LETTER DOUBLE THORN
A7D5 LATIN SMALL LETTER DOUBLE WYNN

These two small letters are added to the standard without matching capitals.
That is incosistent with how other comparable letters are encoded. Letters
used in a casing orthography are almost always encoded as casing pairs, even
if they do not appear in the beginning of a word and the capital leter thus
only appears in ALL-CAPS TEXT. As far as I know at least the following
capitals were encoded without being needed outside all-caps:

    0184 LATIN CAPITAL LETTER TONE SIX
    01A6 LATIN LETTER YR
    01A7 LATIN CAPITAL LETTER TONE TWO
    01BC LATIN CAPITAL LETTER TONE FIVE
    0220 LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
    037F GREEK CAPITAL LETTER YOT
    042A CYRILLIC CAPITAL LETTER HARD SIGN
    042C CYRILLIC CAPITAL LETTER SOFT SIGN
    1E9E LATIN CAPITAL LETTER SHARP S
    2C1F GLAGOLITIC CAPITAL LETTER YERU
    2C20 GLAGOLITIC CAPITAL LETTER YERI

It is possible that one or two have been used word-initially in languages
that were not supported when they were added. On the other hand, it is also
quite likely that there are more encoded capitals that never occur in the
beginning of a word.

Apart from that (and issues already addressed by others) everything looks
fine so far.

Best regards!
/Patrik Sjöwall

Discussion: The Editorial Committee considered this feedback. Most of it is of a technical nature, outside the remit of the Editorial Committee. For the Arabic characters 0874..0875, etc., the observation about shaping should go to the Script Ad Hoc Group (SAH) for consideration as to whether more documentation should be added regarding behavior in lam-alef ligatures. For 088E ARABIC VERTICAL TAIL, the issue should also go to the SAH for review, to see if the character should be added to ArabicShaping.txt.

For the issue of 08FB and 08FC, the Editorial Committee concurs that the annotation is in the wrong location. The names list editor has already made the change in the location of the annotation in the latest draft of NamesList.txt.

For Old Polish o, and the double thorn and double wynn, these issues are outside the scope of the Editorial Committee, but we noted that these name changes and requests for capital letters were already considered by the SAH and were not recommended by that group.

No action items need to be recorded, as the SAH is already aware of this feedback.


Date/Time: Sun Apr 11 05:17:33 CDT 2021
Name: Wang Yifan
Report Type: Public Review Issue
Opt Subject: PRI #428: comments on U+1F7F0 and U+1F979


On U+1F7F0:
Might be good to have a cross-reference to U+3013 GETA MARK 
for pure graphic resemblance, and vice versa.

On U+1F9F9:
The current glyph of FACE HOLDING BACK TEARS does not sufficiently 
distinguish it from U+1F9FA FACE WITH PLEADING EYES. A quick 
suggestion that I think effective is to paint tears white 
(non-hatched) and use a dumbbell-shaped mouth.

In the light of the original proposal, this character is 
intended to include the Samsung emoji depicted in the 
page 1 of this document.
http://www.unicode.org/L2/L2020/20064-face-holding-back-tears.pdf

Here, the dumbbell-shaped mouth is a key feature characterizes the emoticon
being a stylized depiction of the lip-biting expression in the East Asian
graphical convention. It is different from both upward (pouting) and
downward (neutral-smiling) curled mouth. This type of expression is also
seen in most of the actual examples cited in the page 5 of the proposal,
thus should not be left out.

Meanwhile, there is U+1F9FA that usually implemented with similarly watery
eyes. (See https://emojipedia.org/pleading-face/)

Even though not reflected in the current code chart, such designs should be
interpreted as the inherent semantics in the original proposal (as FACE WITH
GLISTENING EYES;
https://www.unicode.org/L2/L2017/17244r-emoji-faces-v11.pdf) instead of mere
vendors' discretion, and should be respected as such.

The alpha glyph of U+1F9F9 has a rather intricate design of eyes that makes
it hard to tell tears apart from eyeballs in black-and-white printing. The
tears should be graphically more distinctively separated from its background
in order to avoid misinterpretation that it has exactly same kind of eyes
the existing glyphs of U+1F9FA have. (Optimally, U+1F9FA should be also
updated to have more upward-looking eyes and downward-sloping eyebrows in
the code chart.)

Last year, U+1F9FA was "the third most used emoji on Twitter" according to
Emojipedia, and awarded "Neologism of the Year 2020" in Japan. Special care
should be taken to avoid possible confusion by existing users.

https://blog.emojipedia.org/a-new-king-pleading-face/
https://ja.wikipedia.org/wiki/%E3%81%B4%E3%81%88%E3%82%93

Discussion: The Editorial Committee agreed that the cross-reference suggestion for 1F7F0 to GETA MARK was a good idea. That addition has already been made in the latest draft of NamesList.txt.

The input regarding 1F979 FACE HOLDING BACK TEARS (cited in the feedback as 1F9F9) should be reviewed by the Emoji Subcommittee.

Suggested associated action item:

AI Jennifer Daniel. Consider the feedback from Wang Yifan (Apr 11) on PRI #428 regarding U+1F979, and redirect as appropriate. (See L2/21-070 Section E4 for details.)


Date/Time: Mon Apr 12 18:08:02 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Currency Symbols

Like the EURO SIGN and other characters, the SOM SIGN U+20C0 should be shown
in a Times-like font. 

Date/Time: Mon Apr 12 18:09:37 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Punctuation

The barred square brackets from 2E56..2E58 should be drawn on the same basis
as other square brackets in the code charts. 

Date/Time: Mon Apr 12 18:12:06 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Glagolitic

The glyphs fr the two new characters must be improved. 

Date/Time: Mon Apr 12 18:21:57 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Symbols and Pictographs

Something is wrong with the glyphs for 1F979 and 1F97A. The face shown at
1F979 looks just like the glyph for 1F97A in the macOS and iOS Apple Color
Emoji UI font.

Thanks for keeping my TROLL glyph. 

Discussion: The Editorial Committee is of the opinion that the glyph for the SOM SIGN is appropriate as is. The code charts editor has already received an updated revision of the font for Glagolitic (from Sebastian Kempgen). The suggestions for glyph fixes for 2E56..2E58 should be remanded to the code charts editor for investigation. We have not been able to replicate the issue Michael reports for 1F979 and 1F97A.

Suggested associated action item:

AI Michel Suignard. Consider the feedback from Michael Everson (Apr 12) on PRI #428 regarding the glyphs for 2E56..2E58, and investigate whether the glyphs can be made more consistent with other square bracket glyphs. (See L2/21-070 Section E4 for details.)


F. Responses to Other Public Feedback

F1. Public Feedback Noted in L2/21-068

FYI: This review refers to items in L2/21-068 listed under "Feedback routed to Editorial Committee for evaluation". Note that many of the reports in L2/21-068 have already been dealt with. In cases where the disposition is already noted in L2/21-068 (in red), the reports are not repeated here for further discussion and disposition.


Date/Time: Tue Feb 23 12:03:14 CST 2021
Name: Jungshik Shin
Report Type: Error Report
Opt Subject: Hangul collation and Hangul tone marks

Note: Changes have been made in the draft text for version 14.0 in response to [the first part of] this report.

Hello, 

I'm writing to give my feedback on TUC 13 section 18.6 Hangul. 

On pages 746-747, I found the following regarding the collation of Hangul
syllables:

"Because the order of the syllables in the Hangul Syllables block reflects
the preferred ordering, sequences of Hangul syllables for modern Korean may
be collated with a simple binary comparison"

Although the above is certainly the case of South Korean collation order
since 1988 [1], it does not hold true for North Korean sorting rules.
Therefore, the locale data for ko-KP needs to be tailored for the Hangul
collation. 

In addition, the section 18.6 does not mention two Hangul tone marks, U+302E
and U+302F. To faithfully represent the old Korean text, Hangul tone marks
are required and should be mentioned along with Hangul Conjoining Jamos. 

It'd be great if the two points above could be reflected in TUS 14 or later.

Thank you for your consideration, 

Jungshik Shin 


[1] Before 1988, there were a couple of 'competing' collation orders even in
South Korea and different dictionaries used different sorting rules. It was
only in 1988 that the South Korean orthographic standard explicitly
specified how to sort Hangul. 

Discussion: The Editorial Committee noted that the first part of this feedback has already been addressed in the latest draft of the 14.0 core specification. Regarding the omission of the two Hangul tone marks from Section 18.6, the Editorial Committee suggests that the editor follow up with Jungshik to get specific suggestions for text additions to the core specification.

Suggested associated action item:

AI Julie Allen. Work with Jungshik Shin to prepare new text for the core specification Section 18.6, to explain the use of the two Hangul tone marks. For Unicode 14.0.
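The "simple binary comparison" quoted in this report works because precomposed modern syllables are laid out by the Hangul syllable arithmetic in Chapter 3 of the core specification: S = SBase + (LIndex * VCount + VIndex) * TCount + TIndex, with SBase = U+AC00, VCount = 21, and TCount = 28. The Python sketch below is only an illustration, assuming input limited to U+AC00..U+D7A3; it checks that code point order agrees with jamo-index order, and says nothing about the ko-KP ordering, which would still need a tailoring.

```python
SBASE = 0xAC00
LCOUNT, VCOUNT, TCOUNT = 19, 21, 28
SCOUNT = LCOUNT * VCOUNT * TCOUNT  # 11172 precomposed modern syllables


def jamo_indices(syllable: str) -> tuple[int, int, int]:
    """Return (LIndex, VIndex, TIndex) for one precomposed Hangul syllable."""
    s_index = ord(syllable) - SBASE
    if not 0 <= s_index < SCOUNT:
        raise ValueError(f"U+{ord(syllable):04X} is not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, VCOUNT * TCOUNT)
    v_index, t_index = divmod(rest, TCOUNT)
    return l_index, v_index, t_index


def binary_order_matches_jamo_order(words: list[str]) -> bool:
    """Python compares str values by code point, i.e. a binary comparison."""
    by_code_point = sorted(words)
    by_jamo = sorted(words, key=lambda w: [jamo_indices(ch) for ch in w])
    return by_code_point == by_jamo


words = ["한글", "까", "각", "가나"]
print(sorted(words))  # ['가나', '각', '까', '한글']
print(binary_order_matches_jamo_order(words))  # True
```

Because VIndex < 21 and TIndex < 28, the numeric offset is monotonic in the (L, V, T) triple, which is why the two orderings coincide.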


Date/Time: Tue Feb 23 19:56:30 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: U+034F COMBINING GRAPHEME JOINER is not always ignored for display

Section 5.21 says “U+034F COMBINING GRAPHEME JOINER is likewise always
ignored for display.” This is not true: it has no visible glyph of its own,
but it may have a visible effect on other glyphs. For example, see Figure
7-11 and UTR #53. As section 5.21 says earlier on the same page, “In such
cases, even though the format character or variation selector has no visible
glyph of its own, it would be inappropriate to say that it is ignored for
display, because the intent of its use is to change the display in some
visible way.”

Discussion: The Editorial Committee discussed this feedback, and agrees that the text could be improved, but we are not advising a rewrite for the 14.0 core specification at this time.


Date/Time: Fri Feb 26 03:19:19 CST 2021
Name: huang xin
Report Type: Error Report
Opt Subject: What is the exact definition of assigned character?

The term assigned character seems to have conflict means in the Unicode 
Standard Version 13.0.

Quoted from chapter 2.1:
    "In contrast, a character encoding standard provides a single set of 
     fundamental units of encoding, to which it uniquely assigns numerical 
     code points. These units, called assigned characters, are the smallest 
     interpretable units of stored text."

This suggests that the "units" are called "assigned characters", and "numerical 
code points" are assigned to "assigned characters".

Quoted from chapter 3.5 D49:
    "Private-use code points are considered to be assigned characters"

This suggests that assigned character is a kind of code point.

So there is conflict between the two quotes, if assigned character is some 
kind of code point, how can "numerical code point" be assigned to some kind of code point?

Discussion: The Editorial Committee discussed this feedback, and agrees that the text could be improved, but we are not advising a rewrite for the 14.0 core specification at this time. For this and the prior item, it would help the editors substantially to have concrete suggestions for how to improve the text. Otherwise, we can only record them as "problem noted"; so far no one has stepped forward to work on specific text improvements that would pass muster.
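One way to see the distinction this report is circling is to separate the type of a code point from whether it is assigned to an abstract character. The Python sketch below is an illustration only, not proposed text: it follows the code point types described in Chapter 2 (graphic, format, control, private-use, surrogate, noncharacter, reserved) and treats the first four as assigned characters, consistent with D49's statement about private-use code points. It relies on General_Category from the UCD version bundled with the interpreter, so whether a given code point comes out as "reserved" depends on that version.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # The 66 noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_type(cp: int) -> str:
    gc = unicodedata.category(chr(cp))
    if gc == "Cs":
        return "surrogate"
    if gc == "Co":
        return "private-use"
    if gc == "Cc":
        return "control"
    if gc == "Cf":
        return "format"
    if gc == "Cn":
        # Noncharacters and reserved code points both have gc=Cn.
        return "noncharacter" if is_noncharacter(cp) else "reserved"
    return "graphic"


def is_assigned_character(cp: int) -> bool:
    # Graphic, format, control, and private-use code points are assigned to
    # abstract characters; surrogates, noncharacters, and reserved ones are not.
    return code_point_type(cp) in {"graphic", "format", "control", "private-use"}


for cp in (0x0041, 0x200D, 0xE000, 0xD800, 0xFFFF, 0x0378):
    print(f"U+{cp:04X}", code_point_type(cp), is_assigned_character(cp))
```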


Date/Time: Sat Feb 27 21:03:22 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Chapter 17 intro miscounts Indonesian scripts

The introduction to chapter 17 in TUS 13.0 says "Indonesia has many local, 
traditional scripts, most of which are ultimately derived from Brahmi. 
Six of these scripts are documented in this chapter."

The actual number of Indonesian scripts documented in the chapter is seven; 
Makasar is one of them. Maybe get rid of the number, as several more 
scripts are to come?

It’s also not quite clear why Makasar gets its own paragraph; the 
paragraph suggests that it belongs between Rejang and Buginese.

Discussion: This comment has already been addressed by the editors, with appropriate changes made in the draft for the 14.0 core specification.


Date/Time: Fri Mar 12 19:45:54 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Bidi format characters do affect characters’ glyphs

Chapter 5 says “Bidirectional format characters do not affect the glyph
forms of displayed characters”, but that is not true. The main point of that
sentence (that bidi format characters have no glyphs) is still true, but it
needs a better explanation. For example, U+0028 LEFT PARENTHESIS has
different glyphs depending on the bidi level. In general, overriding a
character’s directionality may have an arbitrary effect on its glyph form.
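To make the parenthesis example concrete: under rule L4 of UAX #9, a character with Bidi_Mirrored=Yes whose resolved direction is R (an odd embedding level) is depicted with a mirrored glyph, so forcing U+0028 to an odd level, for instance with an override, visibly changes it. The Python sketch below is illustrative; its mirroring table is a hand-picked excerpt of Bidi_Mirroring_Glyph pairs, not the full BidiMirroring.txt data.

```python
import unicodedata

# Hypothetical excerpt of Bidi_Mirroring_Glyph pairs; a real renderer would
# load the complete BidiMirroring.txt data.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[",
                "{": "}", "}": "{", "<": ">", ">": "<"}


def displayed_char(ch: str, embedding_level: int) -> str:
    """Character whose glyph is shown for `ch` at a given resolved level."""
    # UAX #9 rule L4: mirror only if the resolved direction is R (odd level)
    # and the character has Bidi_Mirrored=Yes.
    if embedding_level % 2 == 1 and unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch


print(displayed_char("(", 0), displayed_char("(", 1))  # ( )
```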

Date/Time: Fri Mar 12 19:56:45 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Unexpected variation sequences do affect display

Chapter 5 says “In other contexts, a format character may have no visible
effect on display at all. [...] Another example is a variation selector
following a base character for which no standardized or registered variation
sequence exists. In that case, the variation selector has no effect on the
display of the text.” However, that is an oversimplification. The presence
of an unexpected variation selector may block another variation sequence,
may block canonical reordering, and may block AMTRA reordering, all of which
have effects on the display of the text.

Discussion: The Editorial Committee noted that the text in Chapter 5 could be improved, but we are not advising a rewrite for the 14.0 core specification at this time.
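The canonical reordering point in this report, like the visible effect of U+034F COMBINING GRAPHEME JOINER noted in the earlier report, follows from the fact that variation selectors and CGJ both have Canonical_Combining_Class 0, so they block reordering of the marks on either side. A small Python illustration using only the standard library:

```python
import unicodedata

ACUTE = "\u0301"        # COMBINING ACUTE ACCENT, ccc=230
GRAVE_BELOW = "\u0316"  # COMBINING GRAVE ACCENT BELOW, ccc=220
VS1 = "\uFE00"          # VARIATION SELECTOR-1, ccc=0
CGJ = "\u034F"          # COMBINING GRAPHEME JOINER, ccc=0


def code_points(s: str) -> list[str]:
    return [f"U+{ord(c):04X}" for c in s]


# Without an intervening ccc=0 character, NFD sorts the marks by ccc.
plain = unicodedata.normalize("NFD", "a" + ACUTE + GRAVE_BELOW)
# An unexpected variation selector (or CGJ) blocks that reordering.
with_vs = unicodedata.normalize("NFD", "a" + ACUTE + VS1 + GRAVE_BELOW)
with_cgj = unicodedata.normalize("NFD", "a" + ACUTE + CGJ + GRAVE_BELOW)

print(code_points(plain))    # ['U+0061', 'U+0316', 'U+0301']
print(code_points(with_vs))  # ['U+0061', 'U+0301', 'U+FE00', 'U+0316']
```

Since the mark order differs, a renderer that stacks marks in sequence order may display the two strings differently.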

Date/Time: Fri Mar 12 20:06:54 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Does <ZWJ, ZWJ> equal ZWJ?

UTS #51 defines various sequences with ZWJ, such as <1F415, 200D,
1F9BA>. How should they be rendered when there are multiple ZWJs, as in
<1F415, 200D, 200D, 1F9BA>? According to chapter 5 of the core
specification, “a sequence of two adjacent joiners, <..., ZWJ, ZWJ,
...>, is a case where the extra ZWJ should have no effect.” On the other
hand, I get the impression that extraneous ZWJs go against the spirit of UTS
#51. Is that sentence in the core specification meant to be taken literally?
What effects should other default ignorable code points have within emoji?

Discussion: The Editorial Committee noted that responding to this question would require technical input both from the owners of UTS #51 and more generally from the Emoji Subcommittee. No editorial changes are recommended at this time without such input.
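Read literally, the emoji ZWJ sequence grammar in UTS #51 (an element followed by one or more ZWJ-plus-element pairs) has no slot for two adjacent ZWJs, which is one way to frame the tension described in this report. The Python sketch below checks only that outer shape; it takes no position on how <..., ZWJ, ZWJ, ...> should be rendered, and it treats any non-empty run between ZWJs as an element, which is a simplifying assumption rather than the UTS #51 element definition.

```python
ZWJ = "\u200D"


def has_zwj_sequence_shape(seq: str) -> bool:
    """Rough shape check: element (ZWJ element)+, with no empty elements.

    Assumption: every non-empty run between ZWJs counts as an element; a real
    validator would check each element against the UTS #51 definitions.
    """
    elements = seq.split(ZWJ)
    return len(elements) >= 2 and all(elements)


print(has_zwj_sequence_shape("\U0001F415\u200D\U0001F9BA"))        # True
print(has_zwj_sequence_shape("\U0001F415\u200D\u200D\U0001F9BA"))  # False
```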

Date/Time: Fri Mar 12 20:37:10 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: When does ZWJ act like <ZWJ, ZWNJ, ZWJ>?

Chapter 23 says that “between Arabic characters a ZWJ acts just like the
sequence <ZWJ, ZWNJ, ZWJ>, preventing a ligature from forming instead
of requesting the use of a ligature that would not normally be used.” What
is an Arabic character, and which characters are relevant for the purpose of
“between”? Consider the sequence <meem, ZWJ, U+17B4 KHMER VOWEL INHERENT
AQ, jeem>. The ZWJ is between an Arabic character and a Khmer character.
Is it right to conclude that the ZWJ therefore does not act just like
<ZWJ, ZWNJ, ZWJ>, leaving it free to ligate the meem and jeem?

Discussion: The Editorial Committee noted that this comment reflects a technical concern, and would require input from the Properties & Algorithms group before any appropriate improvement to the text could be suggested.


Date/Time: Mon Mar 29 23:44:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Confusion between nonspacing marks and nonspacing marks

The Unicode Standard has a general category Mn “nonspacing mark”. The
Unicode Standard also has a definition D53: “Nonspacing mark: A combining
character with the General Category of Nonspacing Mark (Mn) or Enclosing
Mark (Me).”

This definition seems misguided for two reasons:

① Enclosing marks are almost always spacing, contradicting the statement
that supports D53: “It generally does not consume space along the visual
baseline in and of itself.” Adding an enclosure to a glyph requires space –
otherwise it results in a smudge. Of the 25 font families I found on my Mac
that contain U+20DD combining enclosing circle, only one monospaced font
uses an enclosing circle glyph with the same width as any other glyph,
predictably resulting in smudges. All 24 others use a glyph that’s large
enough to accommodate the glyphs of most base characters with some padding,
which means it’s substantially wider than most base glyphs. This is very
different from the exceptional and context-dependent widening described for
the real nonspacing mark U+0302 combining circumflex accent in “î”.

② Using the same term for two related but different concepts results in
confusion. This is most obvious in an example for a regular expression
character class in TUS appendix A Notational Conventions, page 941, which
describes [\p{gc=Nonspacing_Mark}] as “nonspacing marks” – clearly correct
based on the general category and clearly wrong based on definition D53. TUS
section 5.12 Strategies for Handling Nonspacing Marks, page 217, claims
“Properly speaking, a nonspacing mark is any combining character that does
not add space along the writing direction.” and again “Composite character
sequences can be rendered effectively by means of a fairly simple mechanism.
In simple character rendering, a nonspacing combining mark has a zero
advance width, and a composite character sequence will have the same width
as the base character.” Both statements are incorrect for enclosing marks in
most fonts. This leads to an inappropriate truncation strategy on page 219:
“In simple systems, it is easiest to truncate by width, starting from the
end and working backward by subtracting character widths as one goes.
Because a trailing nonspacing mark does not contribute to the measurement of
the string, the result will not separate nonspacing marks from their base
characters.” Page 222 discusses letterspacing: “This process needs to be
modified if zero-width nonspacing marks are present in the text. Otherwise,
if extra justifying space is added after the base character, it can have the
effect of visually separating the nonspacing mark from its base.” This issue
would affect non-zero-width nonspacing marks as well, which D53 creates. And
so on...

I suggest changing D53 to define “nonspacing mark” based only on general
category Mn, and discussing enclosing marks either together with nonspacing
marks or separately, as appropriate in each context.

Discussion: The Editorial Committee feels that this suggestion has merit, but we are not advising a rewrite for the 14.0 core specification at this time. The terminological treatment of enclosing marks (gc=Me) as nonspacing marks is of long standing in the standard (going back nearly 30 years), and a change in the core definitions of Chapter 3 for this would require a very specific and detailed proposal arguing the case and working through the implications for the text in Chapter 3, other parts of the core specification, and ultimately other specifications and pages on the website.
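The terminological gap is easy to see by classifying characters by General_Category. The Python sketch below only illustrates the terms as currently defined (D52 combining character, D53 nonspacing mark, and gc=Mn); it does not implement any proposed redefinition. Sample inputs are U+0302 (Mn), U+20DD (Me), and U+093F DEVANAGARI VOWEL SIGN I (Mc).

```python
import unicodedata


def is_gc_mn(ch: str) -> bool:
    """General_Category=Nonspacing_Mark (Mn) only, as in [\\p{gc=Nonspacing_Mark}]."""
    return unicodedata.category(ch) == "Mn"


def is_d53_nonspacing(ch: str) -> bool:
    """Nonspacing mark as defined in D53: gc=Mn or gc=Me."""
    return unicodedata.category(ch) in ("Mn", "Me")


def is_d52_combining(ch: str) -> bool:
    """Combining character as defined in D52: gc=M (Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


for ch in ("\u0302", "\u20DD", "\u093F"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch),
          is_gc_mn(ch), is_d53_nonspacing(ch), is_d52_combining(ch))
```

U+20DD is the case where the two senses of "nonspacing mark" disagree.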

Date/Time: Tue Mar 30 00:11:37 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incomplete discussion of combining marks

The Unicode Standard has two sections with guidelines on nonspacing marks:
5.12 Strategies for Handling Nonspacing Marks and 5.13 Rendering Nonspacing
Marks.

The second paragraph of the first of these sections says: “In this section
and the following section, the terms nonspacing mark and combining character
are used interchangeably.”

This sentence is confusing because the terms are not interchangeable at all:
Combining characters, according to definition D52, include nonspacing
(general category Mn), spacing (Mc), and enclosing (Me) marks. Even when
applying the dubious definition D53, nonspacing marks do not include spacing
marks.

Most of the issues described in the two sections affect spacing and
enclosing marks as well, so the sections are incomplete if they don’t cover
them. The solutions, however, often need to be modified for them.

Discussion: The Editorial Committee considers these suggestions to be reasonable, but we would need specific text changes for review. Note that any changes to this text might also depend on the treatment of basic definitions in Chapter 3.

Suggested associated action item:

AI Norbert Lindenberg. Provide a proposal for specific text changes to improve the discussion of nonspacing marks in sections 5.12 and 5.13 of the core specification.
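As an illustration of handling all combining characters alike, the Python sketch below truncates text without stranding any mark in the sense of D52 (gc=Mn, Mc, or Me) from its base, instead of relying on marks having zero width. It measures length in code points rather than display width, and ignores other cluster types such as conjoining jamo sequences; both are simplifying assumptions.

```python
import unicodedata


def truncate_keeping_marks(text: str, limit: int) -> str:
    """Truncate to at most `limit` code points without stranding combining marks.

    If the cut would fall just before a combining mark (gc=Mn, Mc, or Me),
    back up so that the mark's base character is dropped along with it.
    """
    if limit >= len(text):
        return text
    cut = max(limit, 0)
    while cut > 0 and unicodedata.category(text[cut]).startswith("M"):
        cut -= 1
    return text[:cut]


print(truncate_keeping_marks("ab\u20DD", 2))  # 'a' (U+20DD stays with 'b')
```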

Date/Time: Tue Mar 30 00:15:01 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statement about grapheme clusters

The last paragraph of TUS section 2.11 Combining Characters contains this
statement: “This core concept is known as a *grapheme cluster*, and it
consists of any combining character sequence that contains only *nonspacing*
combining marks or any sequence of characters that constitutes a Hangul
syllable (possibly followed by one or more nonspacing marks).”

This statement is incorrect. Both kinds of grapheme clusters defined in UAX
29, legacy grapheme clusters and extended grapheme clusters, can contain
*spacing* combining marks.

Discussion: The Editorial Committee agrees that the text should be improved to address this concern.

Suggested associated action item:

AI Ken Whistler. Provide a proposal for specific text changes to rework the discussion of grapheme clusters in Section 2.11 of the core specification, referring out to UAX #29 for definition by algorithm.
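In UAX #29, extended grapheme clusters keep spacing marks such as U+093F DEVANAGARI VOWEL SIGN I (gc=Mc) with their base through rules GB9 (Extend) and GB9a (SpacingMark). The rough Python sketch below approximates only that aspect by attaching every combining mark to the preceding character; it is not a conforming UAX #29 implementation and ignores Hangul syllables, ZWJ, Prepend, regional indicators, and other rules.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each character with the combining marks (gc=M*) that follow it.

    Deliberately simplified approximation of extended grapheme clusters.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(simple_clusters("\u0915\u093F\u0916"))  # ['कि', 'ख']
```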

Date/Time: Tue Mar 30 00:19:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statements about combining characters

The first paragraph of TUS section 2.11 Combining Characters has two
incorrect statements:

① “Characters intended to be positioned relative to an associated base
character are depicted in the character code charts above, below, or through
a dotted circle.”: In reality, combining characters can be depicted on any
side of a dotted circle, on multiple sides, crossing it, or enclosing it.

② “The Unicode Standard distinguishes two types of combining characters:
spacing and nonspacing.” The standard, at least in its definition of general
categories, distinguishes three types of combining characters: spacing,
nonspacing, and enclosing, although definition D53 then adds ambiguity.

Discussion: The Editorial Committee agrees that the text should be improved to address these incorrect statements.

Suggested associated action item:

AI Ken Whistler, Editorial Committee. Provide corrected text for these two statements in Section 2.11 of the core specification. For Unicode 14.0.

Date/Time: Fri Apr 2 19:05:22 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Unclear reference to “dashes” in TUS section 12.9 Malayalam

TUS section 12.9 Malayalam, page 512 says “... rendering engines should be
prepared to handle Malayalam letters (including vowel letters), digits (both
European and Malayalam), dashes, U+00A0 NO-BREAK SPACE and U+25CC DOTTED
CIRCLE as base characters for the Malayalam vowel signs, U+0D4D MALAYALAM
SIGN VIRAMA, U+0D02 MALAYALAM SIGN ANUSVARA, and U+0D03 MALAYALAM SIGN
VISARGA. They should also be prepared to handle multiple combining marks on
those bases.”

It’s not clear which “dashes” this refers to. The Unicode Standard, in table
6-3 and in PropList.txt, defines two overlapping sets of dashes that
together contain 30 dash characters. It is very unlikely that all of them
are relevant to Malayalam, and OpenType in particular is not good at
handling mixed-script clusters, such as a combination of U+1806 MONGOLIAN
TODO SOFT HYPHEN with U+0D02 MALAYALAM SIGN ANUSVARA.

Discussion: The Editorial Committee agrees that the text is unclear, and suggests that the simplest clarification would be to restrict the reference to those dashes that have the property value InSC=Consonant_Placeholder.

Suggested associated action item:

AI Ken Whistler, Editorial Committee. Provide corrected text for Section 12.9 Malayalam of the core specification, to clarify which dashes are referred to. For Unicode 14.0.

Date/Time: Fri Apr 2 18:21:40 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Dash definitions out of sync

The lists of dash characters in TUS table 6-3 and in PropList.txt are out of sync. 
Table 6-3 includes 007E TILDE, which is not listed as a Dash in PropList.txt.
In turn, PropList.txt lists 2E1A HYPHEN WITH DIAERESIS, 2E3A..2E3B 
TWO-EM DASH..THREE-EM DASH, 2E40 DOUBLE HYPHEN, 10EAD YEZIDI HYPHENATION MARK, 
which are absent from TUS table 6-3.

It’s not clear to me what qualifies 10EAD YEZIDI HYPHENATION MARK as a dash.

Discussion: The Editorial Committee agrees that the table and the data file are out of sync. We suggest that Table 6-3 be updated for the 14.0 core specification. The status of U+10EAD as a dash (or not) is not editorial, and would have to be taken up with the Properties & Algorithms group and/or the Script Ad Hoc Group.

Suggested associated action item:

AI Ken Whistler, Editorial Committee. Update Table 6-3 in the core specification, to make it consistent with the data file that defines dashes, PropList.txt. For Unicode 14.0.
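A consistency check of this kind is easy to automate. The Python sketch below parses the "code point or range ; property # comment" data-line format used by PropList.txt and compares the Dash set with a table's list. The sample lines and the table set are a hypothetical excerpt containing only entries mentioned in this report, not the full data.

```python
def parse_property_file(lines, prop: str) -> set[int]:
    """Collect code points listed for `prop` in a UCD file such as PropList.txt."""
    result: set[int] = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        field, value = (part.strip() for part in data.split(";", 1))
        if value != prop:
            continue
        if ".." in field:
            start, end = (int(x, 16) for x in field.split(".."))
        else:
            start = end = int(field, 16)
        result.update(range(start, end + 1))
    return result


def compare_dash_lists(table_cps: set[int], proplist_cps: set[int]):
    """Return (only in the table, only in PropList.txt), each sorted."""
    return sorted(table_cps - proplist_cps), sorted(proplist_cps - table_cps)


# Hypothetical excerpts for illustration only.
SAMPLE = [
    "# PropList.txt excerpt",
    "002D          ; Dash # Pd       HYPHEN-MINUS",
    "2E3A..2E3B    ; Dash # Pd   [2] TWO-EM DASH..THREE-EM DASH",
    "0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>",
]
TABLE_EXCERPT = {0x002D, 0x007E}

only_table, only_proplist = compare_dash_lists(TABLE_EXCERPT, parse_property_file(SAMPLE, "Dash"))
print([f"U+{cp:04X}" for cp in only_table])     # ['U+007E']
print([f"U+{cp:04X}" for cp in only_proplist])  # ['U+2E3A', 'U+2E3B']
```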


G. Miscellaneous Topics

G1. (None noted)