L2/21-127

Editorial Committee Report and Recommendations for UTC #168 Meeting

Source: Editorial Committee

Date: July 22, 2021

A. Unicode Release Topics

A1. Unicode 14.0 Schedule and Planning

FYI: The significant milestones for the Unicode 14.0 release are:

The planned release date is unchanged from that reported in the Editorial Committee Report and Recommendations for UTC #167 Meeting. The beta review is now complete, with the close date on July 13.

Once the UTC has made decisions based on the beta review feedback, the Editorial Committee plans to start coordination of the preparation for publication of Unicode 14.0.

In general, once all of the various feedback related to Unicode 14.0 has been dealt with, we see no obstacle to proceeding with the release as planned. To simplify the record-keeping, we have accumulated the overall recommendation and all associated action items in one place, so they are not scattered with the various content-related feedback suggestions made by the Editorial Committee and by other reviewing groups.

EC-UTC168-R1: The Editorial Committee recommends that
The UTC authorizes the publication of Unicode 14.0.0, with the target date: September 14, 2021.

Suggested associated action items:

AI Rick McGowan. Close PRI #433 (Unicode 14.0.0 beta review).

AI Rick McGowan. Close all open PRIs for UAXes: PRI #416 (UAX #14), PRI #417 (UAX #29), PRI #420 (UAX #44), PRI #421 (UAX #38), PRI #422 (UAX #9), PRI #424 (UAX #31), PRI #431 (UAX #42), PRI #432 (UAX #50).

AI Rick McGowan. Close PRI #423 (UTS #39).

AI Rick McGowan. Close PRI #425 (UTS #10).

AI Rick McGowan. Close PRI #429 (UTS #46).

AI Rick McGowan. Close PRI #430 (UTS #51).

AI Ken Whistler. Coordinate and execute the release of Unicode 14.0.0 with the target date: September 14, 2021.

Note: The details of the release planning are specified in "Release Project Plan: Unicode 14.0", a document maintained by the Editorial Committee.


A2. Beta Review for 14.0.0

FYI: Report on Beta Review: The 14.0.0 beta review was completed on schedule. See below under Section E1 for editorial feedback on PRI #433 for the beta review. Editorial feedback on open PRIs for various specifications is noted in the relevant sections below for each PRI.


A3. Unicode 14.0 Core Specification Editing

FYI: Editorial work on the text of the core specification for Unicode 14.0 is almost complete. Note that some of the content changes originally planned for Unicode 14.0 have had to be postponed and are now scheduled for inclusion in Unicode 15.0 instead. Such postponement of some planned content happens for each release, but we have had to be particularly vigilant about these decisions this year, in order to ensure that the core specification could be completed on time for the release. It has proven difficult to get sufficient cycles from the various volunteer editors to address some of the knottier text updates of the core specification, due to contending claims on their time.

Nevertheless, the core specification for Unicode 14.0 includes a number of long-delayed improvements, including significant attention to the Malayalam and Sinhala sections of the text, thanks in particular to the contributions of Liang Hai. We also believe that all new repertoire content for Unicode 14.0 has been appropriately addressed in the core specification.


B. Website Topics

B1. Website Status

FYI: The Unicode technical website has remained stable since our last report.


B2. Website Content Maintenance

FYI: The Editorial Committee continues to make minor updates to content on the Unicode technical website. However, we have a continuing concern that attention to the technical website content is very thin, and that we are woefully understaffed for maintenance of such a large and complicated site. The Unicode Consortium continues to grow, and its ongoing work is getting more complex. Website maintenance is not keeping up with all of the organizational changes and challenges this poses.

A good example of the problem is the FAQ section on the website. Many FAQs gradually age out or have other problems that would benefit from re-editing. Some of the FAQ pages should get entire makeovers, to bring them up to date and to have them address current problems rather than old problems. However, the Editorial Committee is always focused on higher-priority projects such as the next Unicode release, so seldom can do anything but address the most flagrant problems in FAQ entries.


C. Editorial Committee Process Issues

FYI: The Editorial Committee continues to meet approximately once a month via Zoom, with those monthly meetings now scheduled for 5 hours (with a lunch break), instead of the longer meetings we used to hold.

This report to the UTC includes feedback from the Editorial Committee meetings held on June 3, July 1, and July 22, 2021.

Public-facing information about the Editorial Committee and its work is maintained on the Unicode Editorial Committee page on the website. The Editorial Committee also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Committee or contribute to that work should contact the Chair, Julie Allen.


D. UTR Topics

FYI: The Editorial Committee has nothing to bring up separately about various UTRs at this time. Feedback on documents open for public review is covered below.


E. PRI Topics

E1. Editorial Feedback on PRI #433 for Unicode 14.0.0 Beta Review

FYI: The following items are extracted from the feedback received for PRI #433. Items which have already been addressed (with dispositions noted in red in the feedback page for PRI #433) are not included. Items which cover technical and data issues in the purview of the Properties & Algorithms Group or the CJK & Unihan Group are not listed here; only items which seem appropriate for resolution by the Editorial Committee are listed.


Date/Time: Fri Jun 18 01:50:26 CDT 2021
Name: Lim Hian-tong
Report Type: Public Review Issue
Opt Subject: Issues related to Kana Extended-B (Public Review Issue #433)

This is a feedback on Unicode 14.0.0 Beta. I refer to Public Review
Issue #433.

I am writing to request amendments of the code chart for Kana Extended-B, as
shown in the current beta draft of the Unicode Standard, Version 14.0.

The descriptions of U+1AFF0 and U+1AFF8 (“also used for tone six”) should be
removed for the following reasons.
...

[Details of feedback excised. See PRI #433 feedback for full listing.]

Discussion: The Editorial Committee agrees with the detailed justification provided by Lim Hian-tong about the removal of these two annotations. The annotations have already been removed from the names list under preparation for eventual publication with Unicode 14.0.


Date/Time: Mon Jun 28 16:34:26 CDT 2021
Name: Peter Constable [MSFT]
Report Type: Public Review Issue
Opt Subject: Emoji 14 beta

Unicode and UTC do a decent job during the beta for a new Unicode edition of
helping reviewers see what new characters are being added to the next
version. For example, one can readily browse through the following trail:

Open PRIs: https://www.unicode.org/review/ 
PRI 433, Unicode 14 beta: https://www.unicode.org/review/pri433/ 
Unicode 14 beta: https://www.unicode.org/versions/beta-14.0.0.html 
Unicode 14 summary: https://www.unicode.org/versions/Unicode14.0.0/ 
delta code charts: https://www.unicode.org/charts/PDF/Unicode-14.0/ 

For emoji additions (atomic characters or RGI sequences), it's much harder
to find similar delta information.
...

[Details of feedback excised. See PRI #433 feedback for full listing.]

Discussion: The Editorial Committee understands that this issue was discussed at some length by the Properties and Algorithms Group, with participants from the Emoji Subcommittee also in attendance. Although it is clear that there might eventually be editorial implications for some web pages regarding presentation of information about beta review of emoji for some future release, there are no immediate actions to take here, pending whatever decisions are taken by the Emoji Subcommittee about how emoji deltas are to be documented. Note also that much of the information about emoji is generated with programmatic tooling that produces emoji charts pages. The details of that are up to the program maintainers in the Emoji Subcommittee.

We have one recommendation that might help address the problem. Each time a new PRI is posted for UTS #51, Unicode Emoji, it would be helpful to ensure that the PRI page itself contained the relevant links to the corresponding delta charts page(s), to make it easier for a reviewer to find the relevant information that would assist in review.


Date/Time: Mon Jul 12 17:38:53 CDT 2021
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: UAX44, UTR23 and "string property"

The term "string property" is potentially ambiguous: it might mean a
property over the domain of strings, or a property with a co-domain of
strings, or both.
...

[Details of feedback excised. See PRI #433 feedback for full listing.]

Discussion: This feedback was filed with PRI #433, but is better discussed as part of the feedback on PRI #420 for UAX #44. See below.


Date/Time: Mon Jul 12 18:53:01 CDT 2021
Name: Debbie Anderson
Report Type: Public Review Issue
Opt Subject: Glyph error U+FD44


I found an error in Arabic Pres Forms-A: the glyphs for FD43 and FD44 are 
the same. FD44 is incorrect.
(See https://www.unicode.org/L2/L2019/19289r-arabic-honorifics.pdf)

Discussion: This duplication has been noted, and is being corrected by the chart editors.


Date/Time: Wed Jul 14 05:08:18 CDT 2021
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: NamesList.txt

Proposed additional comments to NamesList.txt (marked with "proposed new
comment" on each proposed addition):

263D  FIRST QUARTER MOON
  = alchemical symbol for silver
  x (first quarter moon symbol - 1F313)
  * a crescent, not the first quarter   proposed new comment

263E  LAST QUARTER MOON
  = alchemical symbol for silver
  x (power sleep symbol - 23FE)
  x (last quarter moon symbol - 1F317)
  x (crescent moon - 1F319)
  * a crescent, not the last quarter   proposed new comment


1F311 NEW MOON SYMBOL
  x (black circle - 25CF)
1F312 WAXING CRESCENT MOON SYMBOL
  * waning crescent moon in the southern hemisphere   proposed new comment
1F313 FIRST QUARTER MOON SYMBOL
  = half moon
  x (circle with left half black - 25D0)
  x (first quarter moon - 263D)
  * last quarter moon in the southern hemisphere   proposed new comment
1F314 WAXING GIBBOUS MOON SYMBOL
  = waxing moon
  * waning gibbous moon in the southern hemisphere   proposed new comment
1F315 FULL MOON SYMBOL
  x (white circle - 25CB)
1F316 WANING GIBBOUS MOON SYMBOL
  * waxing gibbous moon in the southern hemisphere   proposed new comment
1F317 LAST QUARTER MOON SYMBOL
  x (circle with right half black - 25D1)
  x (last quarter moon - 263E)
  * first quarter moon in the southern hemisphere   proposed new comment
1F318 WANING CRESCENT MOON SYMBOL
  * waxing crescent moon in the southern hemisphere   proposed new comment

Discussion: Similar suggestions have been made before about annotations regarding different conventions in the use of symbols for waning and waxing phases of the moon in the northern and southern hemispheres, and have been discussed by the Editorial Committee. Our consensus is that character-by-character annotations regarding northern and southern hemisphere differences do not actually help much in identification of the characters in question in the Unicode Standard. Such information is more than adequately presented in such sources as Lunar Phases. We suggest no action for the names list editor here.


E2. Editorial Feedback on PRI #417 (UAX #29)

FYI: There is no new feedback on PRI #417. Earlier feedback was dealt with in the Editorial Committee Report and Recommendations for UTC #167 Meeting.


E3. Editorial Feedback on PRI #420 (UAX #44)

FYI: There is no new feedback filed under PRI #420. The following feedback was extracted from the feedback for PRI #433 and is discussed here, since it primarily concerns editorial issues for UAX #44.


Date/Time: Mon Jul 12 17:38:53 CDT 2021
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: UAX44, UTR23 and "string property"

The term "string property" is potentially ambiguous: it might mean a
property over the domain of strings, or a property with a co-domain of
strings, or both. 

UAX #44 appears to use "string property" to mean a property with a co-domain
of strings. E.g., "String properties are typically mappings from a Unicode
code point to another Unicode code point or sequence of Unicode code
points..."

PU UTR #23 introduces the notion of properties of strings (strings as
domain), and avoids the term "string property", using instead "property
applied to strings" or "property of strings". In the case of properties
with co-domain of strings, it uses clear wording, "string-valued
properties". This is helpful and good.

PU UTR #23 also calls out the terminology issue that exists in UAX #44: 

"Note: Properties classed in [UCDDoc] as type "String" are string-valued
 properties." 

PU UAX #44, however, does not provide similar clarification and
disambiguation. It should, particularly given that Unicode standards
closely associated with The Unicode Standard will include properties of
strings, and one could argue that UCD itself has properties with a domain
of string (e.g., StandardizedVariants.txt as a mapping from an enumerated
set of strings to boolean True).

Discussion: The Editorial Committee discussed this feedback and agrees that some textual additions should be made in UAX #44, to adapt some of the related terminology from UTR #23 and to clarify string-valued properties versus properties of strings. This clarification would ideally be done in Version 14.0, if feasible, but no later than Version 15.0.

Suggested associated action item

AI Ken Whistler, Editorial Committee. Clarify terminology related to properties of strings in UAX #44, following UTR #23 where feasible. For Unicode 14.0. [Ref. Peter Constable Mon Jul 12 17:38:53 CDT 2021 cited in L2/21-127.]
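
Illustration: The distinction between a string-valued property and a property of strings can be sketched in a few lines of Python. This is only an informal sketch under stated assumptions, not text proposed for UAX #44. The function and variable names are hypothetical, Python's str.lower() is used as a convenient stand-in for a lowercase mapping, and SAMPLE_SEQUENCES is a one-entry sample set for illustration rather than data read from StandardizedVariants.txt.

```python
# Sketch only: all names below are hypothetical and chosen for illustration.

# (1) String-valued property: the DOMAIN is code points, the CO-DOMAIN is strings.
def lowercase_mapping(cp: int) -> str:
    """Map one code point to a string, using str.lower() as a stand-in
    for a lowercase mapping. The result may be longer than one code point."""
    return chr(cp).lower()


# (2) Property of strings: the DOMAIN is strings, the co-domain here is boolean.
# Assumption: a tiny sample set standing in for an enumerated set of
# sequences such as the one listed in StandardizedVariants.txt.
SAMPLE_SEQUENCES = frozenset({"\u2205\uFE00"})


def is_sample_variation_sequence(s: str) -> bool:
    """True if the whole string s is a member of the sample set."""
    return s in SAMPLE_SEQUENCES
```

In the first function a single code point goes in and a string comes out; in the second, the argument itself is a string, and asking the question of its individual code points would not be meaningful.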


E4. Editorial Feedback on PRI #431 (UAX #42)

Date/Time: Sat Jul 10 13:18:17 CDT 2021
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: PRI #431 (UAX #42) feedback

I checked all known versions of the Unihan database, and I cannot find the
kWubi property that is documented in Section 4.4.23, Unihan properties, of
UAX #42 as follows:

code-point-attributes &= attribute kWubi
     { text }?

This property is also not mentioned in UAX #38.

Unless someone can find a version of the Unihan database that includes this
particular property, I recommend that its entry be dropped from UAX #42.

Date/Time: Sun Jul 11 09:00:18 CDT 2021
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: Additional information for PRI #431 (UAX #42) feedback

The following information is relevant to my recommendation that the kWubi
property be dropped from Section 4.4.23 of UAX #42:
...

[Details of feedback excised. See PRI #431 feedback for full listing.]

Discussion: This feedback is more technical than editorial, since it concerns removal of a specific property from the Unihan properties tracked in UAX #42 (and the UCD in XML). However, the Editorial Committee concurs with Ken Lunde and the rationale he provides for the change. The editor of UAX #42 has already agreed to make the change in UAX #42 for Unicode 14.0, and the Editorial Committee will review that change before publication.

Suggested associated action item

AI Eric Muller, Editorial Committee. Remove kWubi from Section 4.4.23 of UAX #42 (and its entry in the corresponding schema), for Unicode 14.0.


E5. Editorial Feedback on PRI #415 (UTR #23)

FYI: No feedback has been received on the proposed update for UTR #23. The proposed changes seem to be unobjectionable. The Editorial Committee suggests that the UTC proceed to approval of this proposed update, so that the publication of UTR #23 can occur in the same time frame as the publication of Unicode 14.0. This will provide the appropriate context for the explanation of the use of string properties in the UCD (for some emoji-related properties) and by UTS #18.

EC-UTC168-R2: The Editorial Committee recommends that
The UTC authorizes the publication of UTR #23 (revision 13), based on the current proposed update text (revision 12).

Suggested associated action items:

AI Rick McGowan. Close PRI #415 (UTR #23).

AI Ken Whistler, Asmus Freytag, Editorial Committee. Prepare final text of UTR #23 for publication.

AI Rick McGowan. Post final text of UTR #23 for publication.


F. Responses to Other Public Feedback

F1. Public Feedback Noted in L2/21-125

FYI: This review refers to items in L2/21-125 listed under "Feedback routed to Editorial Committee for evaluation".


Date/Time: Mon Jun 14 16:23:25 CDT 2021
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: On the response of the editorial comitee on my suggested modifications

This is a response to document L2/21-106: https://www.unicode.org/L2/L2021/21106-u14-annotation-resp.pdf 

I would like to begin by expressing my gratitude and delight at the answer
of the editorial committee. I hope this can serve as an opportunity for
greater engagement between me and the body in the future.

...

[Details of feedback excised. See L2/21-125 for full listing.]

Discussion: This feedback is on L2/21-106, and represents further comments on the names list annotational suggestions in that document. The Editorial Committee suggests that this feedback be taken into account by the names list editors, when working with L2/21-106 to make improvements to the annotations for U+0000..U+00FF, either for Unicode 14.0 (as feasible) or for a future release.


Date/Time: Wed Jun 16 00:20:22 CDT 2021
Name: Neal Raulerson
Report Type: Error Report
Opt Subject: Correction in Standard p.126 D93b a.

Instead of:
"a. the initial subsequence of a well-formed code unit sequence..."

I think it is supposed to be:
"a. the initial subsequence of an ill-formed code unit sequence..."

It makes more sense that way. Please let me know, thanks!

Discussion: The Editorial Committee, in consultation with members of the Properties and Algorithms Group, has determined that the text in question is correct as stated. No change is advised.


Date/Time: Sun Jun 27 12:38:08 CDT 2021
Name: Alexei Chimendez
Report Type: Error Report
Opt Subject: Use of CANCEL TAG in emoji flags

UTS #51 allows for the interchange of various flags through "emoji tag
sequences", specified as: an emoji character or sequence, followed by one
or more component characters from the block Tags, and terminated with the
character CANCEL TAG.

In the Unicode Standard, sec. 23.9 reads:

> There are two uses of cancel tag. To cancel a tag value of a particular
 type, prefix the cancel tag character with the tag identification
 character of the appropriate type. [...] To cancel any tag values of any
 type that may be in effect, use cancel tag without a prefixed tag
 identification character.

Continuing, it specifies:

> Inserting a bare cancel tag in places where only the language tag needs
 to be canceled could lead to unanticipated side effects if this text were
 to be inserted in the future into a text that supports more than one tag
 type.

However, the use of CANCEL TAG in flags is, in effect, a "bare cancel tag",
because it is not preceded by a tag identification character (it is only
preceded by tag component characters). The presence of an emoji flag in a
text may thus inadvertently cause the canceling of all applicable tags.

While the Standard currently only specifies one kind of tag (the language
tag, which is "strongly discouraged"), the use of CANCEL TAG in emoji flags
may cause issues if other kinds of tags are introduced in the future, or
for applications or protocols that make use of "private use" tags to signal
in-band information.

The simplest solution is to change the wording in sec. 23.9 to read:

> To cancel any tag values of any type that may be in effect, use cancel
 tag without a prefixed tag identification character or other tag
 character.

With this change, the CANCEL TAG character in the sequence

> U+1F3F4 U+E0066 U+E006F U+E006F U+E007F

has no effect and is ignored, while in the sequence

> U+1F3F4 U+66 U+6F U+6F U+E007F

the CANCEL TAG character will cancel all tags. This change prevents the
inadvertent canceling behavior of emoji tag sequences as described above.

Discussion: The Editorial Committee considers this feedback to have technical implications. It is not merely editorial, but rather has a bearing on the actual interpretation of a CANCEL TAG in the formal syntax of tag sequences, in cases where more than one kind of tag might interact. This issue should be remanded to the Properties and Algorithms Group for consideration.

Suggested associated action item

AI Markus Scherer. Add discussion of CANCEL TAG (from L2/21-125) to the agenda of the Properties and Algorithms Group.
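
Illustration: To help frame the question for the Properties and Algorithms Group, the following Python sketch reproduces the two sequences cited in the feedback and classifies each CANCEL TAG according to the wording the submitter proposes. It is an informal sketch of that proposed reading only, not a statement of what the standard or UTS #51 requires. The function names are hypothetical, and the rule in cancel_tags_are_bare() is an assumption taken from the feedback's suggested text.

```python
from typing import List

CANCEL_TAG = 0xE007F
LANGUAGE_TAG = 0xE0001  # tag identification character for language tags


def is_tag_component(cp: int) -> bool:
    """True for the tag characters U+E0020..U+E007E."""
    return 0xE0020 <= cp <= 0xE007E


def make_emoji_tag_sequence(base: str, tag_text: str) -> str:
    """Build base + tag characters for tag_text + CANCEL TAG.
    tag_text must be printable ASCII (U+0020..U+007E)."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag_text):
        raise ValueError("tag_text must be printable ASCII")
    tags = "".join(chr(0xE0000 + ord(c)) for c in tag_text)
    return base + tags + chr(CANCEL_TAG)


def cancel_tags_are_bare(text: str) -> List[bool]:
    """For each CANCEL TAG in text, report whether it is 'bare' under the
    submitter's proposed wording: not immediately preceded by a tag
    identification character or any other tag character."""
    cps = [ord(c) for c in text]
    result = []
    for i, cp in enumerate(cps):
        if cp != CANCEL_TAG:
            continue
        prev = cps[i - 1] if i > 0 else None
        preceded_by_tag = prev is not None and (
            prev == LANGUAGE_TAG or is_tag_component(prev)
        )
        result.append(not preceded_by_tag)
    return result
```

Under this reading, the first sequence in the feedback (U+1F3F4 followed by tag characters and CANCEL TAG) contains no bare CANCEL TAG, while the second (U+1F3F4 followed by ASCII letters and CANCEL TAG) does.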


Date/Time: Fri Jul 2 18:12:11 CDT 2021
Name: Mark Roberts
Report Type: Problems / Feedback about website
Opt Subject: Em and En Dash and Space

You you please consider adding a Q&A on this page:
https://www.unicode.org/faq/punctuation_symbols.html 
 
Question:  Do the widths of the en dash and en space need to half 
the widths of the em dash and em space?
Answer: (I believe the answer is yes--historically it has been.)

Although this PDF
https://www.unicode.org/charts/PDF/U2000.pdf 
implies that the en space is half an em space, it makes no mention of the 
relationship of an en dash to an em dash.  Furthermore, if an en dash is 
supposed to be half an em dash, the glyphs in that same PDF show that the 
en dash to be drawn slightly greater than half an em dash.

I really hope you will address this issue.  It comes up frequently with 
font designers.

Thank you.

Discussion: The Editorial Committee considers it ill-advised for the Unicode Consortium to attempt to prescribe details of font design to type designers, particularly for such well-known characters as en dash and em dash. However, it might make sense to add an FAQ on this topic, if for no other reason than to make it clear that the standard does not actually prescribe exact widths for dash glyphs in various fonts.

Appropriate suggested text for such an FAQ:

Q: Is an EN Dash always half as wide as an EM Dash?

A: It is often claimed that this relationship is true in traditional typography. While it is true that the abstract characters encoded as EN Dash and EM Dash are intended to map to the characters used as modern equivalents of these traditional sorts, the Unicode Standard does not prescribe the actual realization of these in a given font. Deviations from an idealized depiction are left to the discretion of the font designer. Users will use the code point (U+2013) when they mean to use an EN Dash in their text, and a font in which an EN Dash renders wider than an EM Dash would be considered buggy or defective, because that would violate the users' expectations as to the identity of the character to be represented.

Suggested associated action item

AI Asmus Freytag, Ken Whistler, Editorial Committee. Draft an FAQ on dashes, to clarify that exact glyph widths, particularly for en–dash and em—dash, are not prescribed by the standard. (Ref. L2/21-127 for suggested text.)


Date/Time: Tue Jul 6 23:53:00 CDT 2021
Name: J Andrew Lipscomb
Report Type: Public Review Issue
Opt Subject: 14.0.0β issues

These are all in the text accompanying the code charts for Basic Latin and the Latin-1 Supplement.
1. (.) Canadian syllabics full stop is 166E, not 16EE.
2. (:) Tricolon is 205D, not 295D.
3. (C) Degree Celsius is 2103, not 2013.
4. Sections on \, °, x, X, q, and ß have stray text.

Discussion: This report consists of feedback on L2/21-106, which itself is a set of suggestions for useful annotations for the names list in the range U+0000..U+00FF. (Cf. the discussion of the feedback from Eduardo Marín Silva above). It is not actually feedback on the beta review, per se.

The issues cited here are all typos in L2/21-106. These are known issues in the preparation of L2/21-106, which was prepared manually, and without the programmatic control of the tooling that is actually used to prepare the names list for the standard. As input from L2/21-106 is considered for future revisions of the names list, any such typos will be noted and corrected. Note that no revision of L2/21-106 itself is planned—it stands in the document register merely as a set of suggestions for annotations.
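
Illustration: Code point typos of the kind reported here can be caught mechanically by comparing each cited code point against its character name. The following Python sketch is one way to do so, using the standard unicodedata module. It is not the tooling used to prepare the names list; the helper name and the CLAIMED table are hypothetical, and the results depend on the version of the Unicode Character Database bundled with the Python interpreter in use.

```python
import unicodedata

# Code point / name pairs as corrected in the feedback above.
CLAIMED = {
    0x166E: "CANADIAN SYLLABICS FULL STOP",
    0x205D: "TRICOLON",
    0x2103: "DEGREE CELSIUS",
}


def mismatches(claimed):
    """Return (code point, claimed name, actual name) for every entry whose
    claimed name differs from the name reported by unicodedata."""
    bad = []
    for cp, name in claimed.items():
        actual = unicodedata.name(chr(cp), "<unnamed>")
        if actual != name:
            bad.append((f"U+{cp:04X}", name, actual))
    return bad
```

Running mismatches() on a table that pairs DEGREE CELSIUS with 2013 instead of 2103 reports the entry, since U+2013 is EN DASH.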


G. Miscellaneous Topics

G1. (None noted)