Editorial Committee Report

L2/22-020

Editorial Committee Report and Recommendations for UTC #170 Meeting

Source: Editorial Committee

Date: January 24, 2022

A. Unicode Release Topics

A1. Unicode 15.0 Schedule and Planning

FYI: The significant milestones for the Unicode 15.0 release are:

Alpha start: February 8, 2022

Alpha close: April 5, 2022

Beta start: May 31, 2022

Beta close: July 12, 2022

Release: September 13, 2022

These dates are preliminary, pending further review and consideration of synchronization with UTC meetings and with CLDR and ICU release dates.

Discussion: At this point, we should go ahead and start the alpha review for Unicode 15.0. Note that during UTC #170, the Editorial Committee anticipates that the UTC will be approving additional characters slated for publication as part of the repertoire of 15.0. Because early data files have already been posted for the UTC reflecting the approvals prior to UTC #170, those data files will need to be systematically updated again after this meeting, to ensure that the correct repertoire is reflected in the code charts for alpha review. The alpha review is not intended for review of the details of property assignments -- that is what the beta review is for -- but at a minimum, the repertoire must be correct for meaningful review.

EC-UTC170-R1: The Editorial Committee recommends that: The UTC authorizes starting the alpha review for Unicode 15.0.

AI Ken Whistler. Prepare an updated NamesList.txt for Unicode 15.0, synched with the Unicode 15.0 repertoire, as finalized during UTC #170.

AI Michel Suignard, Rick McGowan. Prepare a set of Unicode 15.0 alpha review code charts for posting.

AI Ken Whistler, EdCom. Prepare a background document for a PRI on the Unicode 15.0 alpha review.

AI Rick McGowan. Post the PRI for the Unicode 15.0 alpha review, to close April 5, 2022.

A2. Unicode 15.0.0 Work

FYI: Per discussion at UTC #169, many items that had previously been scheduled for 15.0 were de-targeted, to simplify release planning. Any of these de-targeted action items could be added back, provided the owning party completes them in time and can coordinate their addition to the relevant component of the release. But these items will no longer be actively tracked as part of the overall content planned for the release.

A3. Unicode 15.0 Core Specification Editing

FYI: The 15.0 Core Specification editing has started. The Editorial Committee is working on the backlog of various small content additions previously committed for the text, as well as new sections devoted to the new scripts and other significant new repertoire additions for Unicode 15.0.

B. Website Topics

B1. Website Status

FYI: The Unicode technical website has remained stable since our last report.

B2. Website Content Maintenance

FYI: There is nothing new to report on at this time.

C. Editorial Committee Process Issues

FYI: The Editorial Committee continues to meet approximately once a month via Zoom, with those monthly meetings now scheduled for 5 hours (with a lunch break), instead of the longer meetings we used to hold.

This report to the UTC includes feedback from the Editorial Committee meetings held on October 28 and December 2, 2021, and on January 6, 2022.

Public-facing information about the Editorial Committee and its work is maintained on the Unicode Editorial Committee page on the website. The Editorial Committee also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Committee or contribute to that work should contact the Chair, Julie Allen.

D. UTR Topics

FYI: The Editorial Committee has nothing to bring up separately about various UTRs at this time.

E. PRI Topics

FYI: The Editorial Committee has no new feedback on the remaining open PRIs at this time. (PRI #427, #435, #436, #437, #438, #439, #440, #441)

E1. PRI #427, UTS #18

The following late editorial feedback on UTS #18 was received:

Date/Time: Sat Jan 22 04:46:41 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTS #18
UTS #18 contains the following mistakes:

“expressions.The” (instead of “expressions. The”),
“Database[UAX44]” (instead of “Database [UAX44]”),
“three character” (instead of “three characters”),
“"False" In” (instead of “"False". In”),
“Name values, must” (instead of “Name values must”),
“the the” (instead of “the”),
“see see” (instead of “see”),
“does offers” (instead of “does offer”),
“Equivalents .” (instead of “Equivalents.”),
“(see NameAliases.txt).In” (instead of “(see NameAliases.txt). In”),
“Properties .” (instead of “Properties.”).

An out-of-place closing parenthesis is found here: “[Perl].)” (instead of “[Perl].”).

Finally, a closing parenthesis is missing after “COMBINING DIAERESIS”.
Discussion: These typos should be corrected.

Suggested associated action item:

AI Mark Davis, Edcom. Correct a list of typos in UTS #18, as noted by Ivan Panchenko in feedback on PRI #427.

F. Responses to Other Public Feedback

F1. Public Feedback Noted in L2/22-018

FYI: This review refers to items in L2/22-018 listed under "Feedback routed to Editorial Committee for evaluation".

Date/Time: Sat Sep 18 10:48:29 CDT 2021
Contact: noneed (at) example.com
Name: Jackie
Report Type: Error Report
Opt Subject:

Note: Fake return address was supplied, so cannot contact submitter.
Hi again,

The code charts ( https://www.unicode.org/charts/ ) each should include a standard 
key to the symbols used (e.g., →, ~, ※, etc.). Nothing I see on the code chart PDFs 
defines these symbols or even links to a definition of them.

I looked around and found ( https://www.unicode.org/charts/About.html#Conventions ), 
but I usually access the code charts from pages that contain no link to that page, 
and some are saved locally.

Thank you!
Discussion: The Editorial Committee discussed this feedback and decided that the simplest way to accommodate this request would be to add a link to the About Charts page from the copyright cover sheet that is added to each chart before it is posted. This would be a fairly easy change to make.

Suggested associated action item:

AI Rick McGowan, Edcom. Adjust the text of the copyright cover sheet for code charts for Unicode 15.0, to add a link to About Charts in an appropriate location on the sheet.

Date/Time: Sat Sep 18 15:44:27 CDT 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Mistakes in definition D56
Definition D56 in chapter 3 says “Combining character sequences involving a
variation selector (which is both default_ignorable and a combining mark),
consist of only the base character followed by a single variation
selector”, but that is not true. U+1031 MYANMAR VOWEL SIGN E is not a base
character, but it does have a defined variation sequence. Also, you could
have a sequence like <U+0030, U+FE0F, U+20E3>, which does not consist
of *only* the base character followed by a single variation selector: it
consists of the base, the variation selector, and another mark.
Discussion: The Editorial Committee reviewed this feedback, and agreed that clarifications are required.

Suggested associated action item:

AI Ken Whistler, Edcom. Update the text following D56 in the Core Specification, to clarify some edge cases involving combining character sequences. For Unicode 15.0.
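
The two edge cases raised in this feedback can be checked against character properties. The following is a minimal sketch, assuming Python's standard unicodedata module (whose data reflects whatever Unicode version the interpreter ships with); the helper name describe is hypothetical and used only for illustration.

```python
import unicodedata


def describe(sequence):
    """Return (code point, General_Category, combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))
        for ch in sequence
    ]


# U+1031 MYANMAR VOWEL SIGN E: a combining mark, yet it has a defined
# variation sequence according to the feedback.
vowel_sign_e = "\u1031"

# Sequence from the feedback: base digit, variation selector, enclosing mark.
keycap_zero = "\u0030\uFE0F\u20E3"

for label, seq in (("U+1031", vowel_sign_e), ("keycap zero", keycap_zero)):
    print(label, describe(seq))
```

The first line of output shows a mark category for U+1031 rather than a base character, and the second shows a mark following the variation selector, which are the two cases the current wording after D56 does not cover.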

Date/Time: Mon Sep 27 18:48:22 CDT 2021
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: On the Tiddu mark and Virama+Repha of Tulu-Tigalari
This is a response to the following document:
https://www.unicode.org/L2/L2021/21210-tulu-tigalari.pdf 

In page 41, section 8.2 it explains the function of the mark and even
compares it to a "caret".  Currently the dotted circle in the
representative glyph, suggests this is a combining sign; but it is my
opinion that this should be treated similarly to the caret; a zero advance
graphical indicator. This is because the sign is meant to be an
after-the-fact addition to the text, which means it should not affect the
original spacing of the text at all; this includes vowel signs that apply
below the base. If the current model is used, the rendering of the script
would become more complicated that it already is. This change would also
make it easier to display it in more situations, like after whitespaces or
non-letters. The general category of it would be 'Po' and the CCC would be
0.This change of properties would also disambiguate it from other
characters like, 208A ₊ SUBSCRIPT PLUS SIGN and 031F ◌̟ COMBINING PLUS SIGN
BELOW

I would also like to suggest to encode one more character, to reproduce the
behavior on page 34, where the Virama and the Repha can fuse, despite them
not being adjacent in the sequence. Instead, I propose encoding another
character called: TULU-TIGALARI VIRAMA WITH REPHA. This would reduce the
complexity necessary to input this character. It can have the same
properties as the Virama and be placed at 113DE, so no characters need to
be shifted from their current positions.
Discussion: This is feedback on a Tulu-Tigalari proposal. There is no immediate action here for the Editorial Committee. There may be content in this feedback of interest to the Script Ad Hoc.

Date/Time: Fri Oct 1 14:05:49 CDT 2021
Name: David McCreedy
Report Type: Error Report
Opt Subject: The Unicode Standard, Version 14.0.0
FYI: Section 15.15 of The Unicode Standard still lists the old Ahom block range 
end (Ahom: U+11700–U+1173F) instead of the 14.0 updated range end (U+1174F) at 
https://www.unicode.org/versions/Unicode14.0.0/ch15.pdf#G95570.  Refer to the 
"11700..1174F; Ahom" line in http://www.unicode.org/Public/UNIDATA/Blocks.txt 
for confirmation.  Thanks.
Discussion: The Editor of the Core Specification is aware of this range defect. It is already corrected, and no action needs to be recorded.
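
For reference, the size of the range defect can be worked out from the Blocks.txt line quoted in the feedback. This is only an illustrative sketch; the function name parse_block_line is hypothetical, and the old range end is taken from the feedback text rather than from any data file.

```python
def parse_block_line(line):
    """Parse a Blocks.txt-style line such as '11700..1174F; Ahom'."""
    range_part, name = line.split(";")
    start_hex, end_hex = range_part.strip().split("..")
    return int(start_hex, 16), int(end_hex, 16), name.strip()


# Line quoted in the feedback above.
start, end, name = parse_block_line("11700..1174F; Ahom")

# Old block end as still printed in Section 15.15, per the feedback.
old_end = 0x1173F

print(name, f"U+{start:04X}..U+{end:04X}", end - start + 1, "code points")
print("code points beyond the old range end:", end - old_end)
```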

Date/Time: Fri Oct 1 16:13:29 CDT 2021
Name: Peter Constable
Report Type: Error Report
Opt Subject: Kayah Li code chart / NamesList.txt

Note: This has already been taken into account in the Unicode 15.0 nameslist draft.
In the Kayah Li names list, the following vowel letters are listed under the 
subhead "Consonants":

A922 ꤢ KAYAH LI LETTER A
A923 ꤣ KAYAH LI LETTER OE
A924 ꤤ KAYAH LI LETTER I
A925 ꤥ KAYAH LI LETTER OO

In NamesList.txt, the @Vowels subhead follows A925, but should be moved up to follow A921.
Discussion: No new action needs to be recorded for this.

Date/Time: Wed Oct 6 14:29:53 CDT 2021
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Pending errata notices
This is a remainder that certain glyph corrections, lack an errata notice; despite 
being recommended by the Script Ad-Hoc. Only the first document precedes UTC #169. 
My intention is avoid the accidental omission of these by having them documented 
togueter for reference.

  Canadian Syllabics: https://www.unicode.org/L2/L2021/21141-ucas-revisions.pdf  
    (limited to the 3 yellow highlighted characters)
  Old Turkic: https://www.unicode.org/L2/L2021/21153-n5163-old-turkic-glyph.pdf 
  Khitan Small Script: https://www.unicode.org/L2/L2021/21182-khitan-mods.pdf 
  Sundanese: https://www.unicode.org/L2/L2021/21221-three-sundanese-chars.pdf 
Discussion: The Editorial Committee reviewed this feedback. As of November 15, 2021, the action items regarding Old Turkic, Khitan Small Script, and UCAS glyph errata had been completed. The response to L2/21-182 did not require a glyph erratum notice. No new action needs to be recorded for this feedback.

Date/Time: Sat Nov 6 14:45:05 CDT 2021
Name: Jens Maurer
Report Type: Error Report
Opt Subject: NamesList.txt
https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt 

says, in particular,

# Note that no formal name alias for the ISO 6429 "BELL" is
# provided for U+0007, because of the existing name collision
# with U+1F514 BELL.

0007;ALERT;control
0007;BEL;abbreviation

Yet, https://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt 

says

0007  <control>
  = BELL

which (according to section 24.1 of the Unicode standard) introduces 
the normative alias BELL. However, that not desired according to the 
comment in NameAliases.txt.
Discussion: The Editorial Committee reviewed this feedback and agreed that the alias in the Unicode names list is somewhat misleading. To follow current practice (and NameAliases.txt), it would probably be better to list = BEL and = alert as aliases, rather than = BELL.

Suggested associated action item:

AI Ken Whistler, Edcom. Adjust the aliases in the Unicode names list for U+0007 for Unicode 15.0, to better match NameAliases.txt.
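
The mismatch can be illustrated by reading the NameAliases.txt lines quoted in this and the following feedback item. This is a rough sketch that assumes the semicolon-separated field layout visible in those quoted lines (code point; alias; type); the function name parse_name_aliases and the embedded SAMPLE text are for illustration only and do not stand in for the full data file.

```python
from collections import defaultdict

# Lines copied from the feedback items (NameAliases.txt, Unicode 14.0).
SAMPLE = """\
# Note that no formal name alias for the ISO 6429 "BELL" is
# provided for U+0007, because of the existing name collision
# with U+1F514 BELL.
0007;ALERT;control
0007;BEL;abbreviation
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
"""


def parse_name_aliases(text):
    """Map each code point to its list of (alias, type) pairs."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, alias_type = line.split(";")
        aliases[int(code, 16)].append((alias, alias_type))
    return dict(aliases)


aliases = parse_name_aliases(SAMPLE)
names_for_0007 = [alias for alias, _ in aliases[0x0007]]
print("U+0007 aliases:", names_for_0007)
print("BELL listed for U+0007:", "BELL" in names_for_0007)
```

In this sample, BELL does not appear among the aliases for U+0007, which is the discrepancy with the names list entry that the action item addresses.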

Date/Time: Sat Nov 6 14:50:58 CDT 2021
Name: Jens Maurer
Report Type: Error Report
Opt Subject: NamesList.txt
https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt 

says

000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control

meaning that all three aliases are intended to be normative aliases 
per section 4.8 of the Unicode standard.

However, https://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt 
says

000A  <control>
  = LINE FEED (LF)
  = new line (NL)
  = end of line (EOL)

meaning that "new line" and "end of line" are not presented as a 
normative alias in CodeCharts.pdf (because they are not uppercase).

(The same situation appears for other control characters that have 
more than one alias.)
Discussion: The Editorial Committee reviewed this feedback. It was noted that the listing of aliases for C0/C1 control codes is somewhat distinct from that for other characters. In Unicode 3.0, there was only a single alias listed for each control code, and it was taken from ISO 6429 and displayed in all caps. It was only significantly later that multiple control code aliases and abbreviations were added to a new data file, NameAliases.txt. All of the aliases in NameAliases.txt are normative, of course, but they are not all carried directly into the Unicode names list for the code charts. Other than for normative aliases of type "correction", there is significant editorial leeway regarding what gets shown in the code charts.

However, we agreed that there is a bit of a disconnect between the conventions used in the code charts for control code aliases and what Section 24.1 claims about the conventions for normative and informative aliases. This should be clarified.

Suggested associated action item:

AI Ken Whistler, Edcom. Clarify the conventions used in display of normative and informative aliases for control codes, in Section 24.1 of the Core Specification, for Unicode 15.0.

Date/Time: Mon Nov 8 11:00:48 CST 2021
Name: Peter Constable
Report Type: Error Report
Opt Subject: UAX #44
In 5.2, the description for Extended_Pictographic says,

"Note: This property is used in the regex definitions for the Default Grapheme 
Cluster Boundary Specification in UAX #29, Unicode Text Segmentation [UAX29], 
as well as for the definition ED-4 in UTS #51, Unicode Emoji [UTS51]."

It fails to mention use in LB30b that was added to UAX #14 in Unicode 14.
Discussion: The feedback is correct. The entry for Extended_Pictographic in UAX #44 should add a mention of that use.

Suggested associated action item:

AI Ken Whistler, Edcom. Add mention of the use of Extended_Pictographic in LB30b of UAX #14 to the UAX #44 Table 9 entry for Extended_Pictographic. For Unicode 15.0.

Date/Time: Tue Nov 23 16:53:54 CST 2021
Name: Jonathan Yavner
Report Type: Error Report
Opt Subject: UAX #14
"If U+2061 CAUTION SIGN had been used, which also looks like an 
exclamation point inside a triangle, ..."

But U+2061 is actually "FUNCTION APPLICATION", which has no appearance.

The text should read "U+2621 CAUTION SIGN".

This error was introduced in version 19 (dated 2006-08-22) and 
has lain there in plain sight ever since.
Discussion: This typo has been corrected in a draft of UAX #14, to be posted soon as a proposed update. The actual character intended is U+26A0 WARNING SIGN. No new action needs to be recorded.
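
The three code points involved here are easy to confuse, so a quick lookup may help. This is a small sketch assuming Python's standard unicodedata module; the helper name lookup is hypothetical.

```python
import unicodedata


def lookup(code_point):
    """Return (U+XXXX label, character name, General_Category)."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), unicodedata.category(ch))


# Code point in the UAX #14 text, the one suggested in the feedback,
# and the one the Editorial Committee identifies as intended.
for cp in (0x2061, 0x2621, 0x26A0):
    print(*lookup(cp))
```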

Date/Time: Wed Nov 24 14:16:50 CST 2021
Name: Petr Viktorin
Report Type: Error Report (UTR #39)
Opt Subject:
Section 4, Confusable Detection in UTR#39   refers to  Section 2.9.1, 
Backward Compatibility in Unicode Technical Report #36
The correct section number for "Backward Compatibility" is 2.10.1

See:
 https://www.unicode.org/reports/tr39/#Confusable_Detection 
 https://www.unicode.org/reports/tr36/#Backwards_Compatibility 

Similar errors appear in 5.2 Restriction-Level Detection, 6 Development 
Process, 6 Development Process, and 3.1 General Security Profile for Identifiers of UTR#39
Discussion: These section numbering errors should be corrected.

Suggested associated action item:

AI Mark Davis, Edcom. Correct the section numbering errors in UTS #39, as noted by Petr Viktorin, for Unicode 15.0.

Date/Time: Sun Dec 5 00:48:41 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Core specification
The introduction of chapter 16 of the Unicode Standard, “Southeast Asia” states 
“The scripts of Southeast Asia are written from left to right.”

This statement is not correct for all scripts of Southeast Asia; 
Hanifi Rohingya is written from right to left.
Discussion: This infelicity in the text has been noted by the Editor of the Core Specification, and has already been corrected. No new action needs to be recorded.

Date/Time: Sun Jan 2 06:27:41 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UAX #14 and UAX#44
In the Unicode Standard Annex #14, it is said about hyphenation that
in “German and Swedish, a consonant is sometimes doubled”. I suggest
changing “German” to “pre-reform German orthography” because nowadays no
consonant is struck out compared to the hyphenated form,
e.g., “Schifffahrt” is written with three fs even when unhyphenated
(pre-reform: “Schiffahrt”, hyphenated “Schiff- / fahrt”).

Also, UAX #14 contains the doublings “the the” and “by by”.

UAX #44 contains the mistakes “stabiity”
(instead of “stability”), “inadvertant”
(instead of “inadvertent”), “definining”
(instead of “defining”), “discunifications”
(instead of “disunifications”), “compatiblity” (instead of “compatibility”)
and “"TU-" (kIRG_TSource0 prefix, or 'VU-" (kIRG_VSource0 pefix”
(instead of “"TU-" (kIRG_TSource0) prefix, or "VU-"
(kIRG_VSource0) prefix”).
Discussion: These various typos have been noted by the editors of UAX #14 and UAX #44. The proposed update for UAX #44 has already been corrected and posted. The typos in UAX #14 have been corrected in a draft to be posted soon for a proposed update. No new actions need to be recorded.

Date/Time: Thu Jan 6 12:26:50 CST 2022
Name: John Hudson
Report Type: Error Report
Opt Subject:
Page 488 in the Bengali section of chapter 12 (South and Central Asia-I) of
TUS discusses Jihvamuliya and Upadhmaniya in ligatures with following
consonant letters, hopefully making it clear to shaping engine implementers
that these character sequences should be treated as clusters for shaping
purposes. A similar discussion with examples is missing from the Devanagari
section of the same chapter.

The Devanagari and Bengali handling of Jihvamuliya and Upadhmaniya are
graphically distinct but functionally identical, and this should be
reflected in parallel discussions, perhaps with added explicit statements
that these sequences should be processed as clusters.
Discussion: The Editorial Committee briefly discussed this feedback, but in the absence of specific textual suggestions, this would require some investigation and drafting. It probably should not be added to the Unicode 15.0 tasks.

Suggested associated action item:

AI Liang Hai. Investigate the Devanagari and Bengali handling of Jihvamuliya and Upadhmaniya and make suggestions for possible text additions to the Core Specification.

Date/Time: Fri Jan 7 16:22:34 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UAX #42
UAX #42 contains the following mistakes: “the the” (instead of “the”), 
“intented” (instead of “intended”), “inheritence” (instead of “inheritance”), 
“accross” (instead of “across”), “attribues” (instead of “attributes”), 
“representedy” (instead of “represented”).
Discussion: These typos should be corrected.

Suggested associated action item:

AI Ken Whistler, Edcom. Correct a list of typos in UAX #42, as noted by Ivan Panchenko. For Unicode 15.0.

Date/Time: Sun Jan 16 10:47:31 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTS #51
UTS #51 contains the following mistakes: "a emoji" (instead of "an emoji"), 
"existing existing" (instead of "existing"), "color which is" (instead of 
"color is"), "should taken" (instead of "should be taken"), "is all a perfectly 
legitimate" (instead of "is all perfectly legitimate"), "user‘s" (instead of 
"user’s", note the apostrophe), "any any" (instead of "any"), "“us’" (instead 
of "“us”"), "”demon“" (instead of "“demon”", note the quotation marks).

In some occurrences of "[CLDR]", the closing bracket is part of the link text.
Discussion: These typos should be corrected.

Suggested associated action item:

AI Ned Holbrook, Edcom. Correct a list of typos in UTS #51, as noted by Ivan Panchenko. For Unicode 15.0.

G. Miscellaneous Topics

G1. (None noted)