L2/22-125

Editorial Committee Report and Recommendations for UTC #172 Meeting

Source: Editorial Committee

Date: July 25, 2022

A. Unicode Release Topics

A1. Unicode 15.0 Schedule and Planning

FYI: The significant milestones for the Unicode 15.0 release are:

All dates for alpha and beta review were met. The scheduled release date has not changed.


A2. Unicode 15.0.0 Work

FYI: The 15.0 beta review period is now complete. Pending the results of UTC #172, the Unicode 15.0 release will proceed on schedule.

Note that there are ongoing discussions involving participation from various TC, SC, and group chairs about how best to address project management and release management for the Unicode Standard, both for the remainder of the 15.0 cycle, and for future releases after that. The Editorial Committee has redefined its scope to focus on editorial work on the Core Specification and other Unicode technical specifications, and editing of the technical content on the website. The Editorial Committee will no longer be the lead on the overall project management and release management of the Unicode Standard. Instead, that responsibility reverts to the UTC itself, with details of organizational structure and delegation of tasks under discussion by the Release Management Group (relmgmt).

In the meantime, during this transitional phase, this Editorial Committee report can still serve as the location for some basic bookkeeping regarding UTC approvals and actions to move Unicode 15.0 forward. In particular, it is now time to close the beta review cycle and approve content for final release of Unicode 15.0.

AI Rick McGowan. Close the beta review PRI #453.

AI Rick McGowan. Close all open PRIs for 15.0 proposed updates of UAXes and UTSes: PRI #437 (UAX #38), PRI #438 (UAX #44), PRI #439 (UAX #50), PRI #440 (UTS #10), PRI #441 (UAX #29), PRI #444 (UAX #34), PRI #445 (UAX #45), PRI #446 (UAX #14), PRI #447 (UAX #24), PRI #448 (UAX #41), PRI #449 (UAX #9), PRI #450 (UAX #31), PRI #451 (UTS #39), PRI #452 (UAX #15), PRI #454 (UTS #51), PRI #455 (UTS #46), PRI #457 (UAX #42).

EC-UTC172-R1: The Editorial Committee recommends that:
The UTC authorizes the release of Unicode 15.0, with a target date of September 13, 2022.

AI Ken Whistler, EDC. Prepare an updated NamesList.txt for Unicode 15.0, synched with the final Unicode 15.0 repertoire, as finalized during UTC #172.

AI Michel Suignard, Rick McGowan, EDC. Prepare a set of Unicode 15.0 final candidate code charts for posting.

AI Ken Whistler, PAG. Prepare final candidate data files for the UCD, and the data directories for UTS #10, UTS #39, and UTS #46, for the Unicode 15.0 release.

AI Ned Holbrook, ESC. Prepare final candidate data files for the data directory of UTS #51, and a complete set of emoji final charts for Emoji 15.0.

AI Ken Whistler, UTC. Complete all tasks associated with the 15.0 release.


A3. Unicode 15.0 Core Specification Editing

FYI: Editing of the 15.0 core specification is nearing completion. While there are a few outstanding issues that the Editorial Committee is dealing with, we anticipate that the 15.0 core specification will be delivered on time for the release.


A4. Core Specification Future Development

FYI: A task group in the Editorial Committee has continued working intensely on what they have taken to calling the "TUS Future Project", aimed at finding a way forward to maintaining and publishing the core specification in HTML. Currently, the core specification is maintained in FrameMaker, an editing platform that is nearing end of life and which has limited Unicode support. Switching to HTML as both source and publication vehicle would help us with editing and source maintenance, and would also make the core specification viewable with browsers that have much more current and robust Unicode support built in.

There are a significant number of obstacles that need to be overcome to make this conversion actually feasible, but the task group has made a lot of progress in the past few months. In particular, we now have tooling that can extract the entire content of the core specification from MIF (Maker Interchange Format) files and convert it all to HTML, with most of the formatting controlled by CSS styles. Nearly all of the figures are recovered as embedded SVG files, converted from the EPS files used by FrameMaker.

We are proceeding to analysis of the hack font usage in the text. As of today that analysis is complete through Chapter 17. Liang Hai has written further scripting tools that take the HTML output and mark it up with all the hack font usage in an interactive viewing format. This will make it possible to systematically review and proof all font use in the text, in a format that will let us decide between old content use that can now be safely converted to Unicode characters for viewing, versus special rendering displays or cutting edge Unicode additions for which we can fall back to embedded SVG images extracted from the hack fonts we have been using.

We anticipate that by this fall, shortly after the release of Unicode 15.0, the task group should be far enough along in its conversion project that we can start planning whether the new HTML-based maintenance and publication would be feasible for the subsequent Unicode release, or whether it would need more development before it could replace the current scheme.


B. Website Topics

B1. Website Status

FYI: The Unicode technical website has remained stable since our last report.


B2. Website Content Maintenance

FYI: The Editorial Committee continues to provide minor maintenance of pages on the Unicode technical website. For a major initiative to update the FAQ pages, see B3 below.


B3. FAQ Updating

FYI: The Editorial Committee is currently engaging in very active work to update the Unicode FAQ pages. This work started out experimentally as a separate meeting we were calling the "FAQ of the Month", but after several meetings, we have upped the pace, and are now calling them the "FAQ of the Fortnight" meetings.

This effort is currently spearheaded by Asmus Freytag and Ben Yang. To date, we have converted the entire suite of FAQ pages from HTML 4 to HTML 5, and have done a complete CSS makeover for them to improve their look and their mobile friendliness. A number of the pages have had multiple content fixes and updates, and currently we are looking at more extensive content fixes for complex and rather out of date FAQ pages on topics such as IDN and the Web.

We anticipate that this work will continue for some time, so watch the FAQ pages for various improvements. This would also be a good time to communicate with the Editorial Committee if you have concerns about particular FAQ entries or wish to add FAQ content, as there is a lot of momentum in this area right now. The Editorial Committee also anticipates that the lessons we learn about updating HTML and CSS on FAQ pages on the site may be applied in the future to the broader task of updating some of the other current pages on the Unicode technical site.


C. Editorial Committee Process Issues

FYI: The Editorial Committee continues to meet approximately once a month via Zoom, with those monthly meetings now scheduled for 5 hours (with a lunch break).

This report to the UTC includes feedback from the Editorial Committee meetings held on April 28, May 26, June 23, and July 21, 2022.

Public-facing information about the Editorial Committee and its work is maintained on the Unicode Editorial Committee page on the website. The Editorial Committee also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Committee or contribute to that work should contact the Chair, Julie Allen.


D. UTR Topics

FYI: The Editorial Committee has nothing to bring up separately about various UTRs at this time.


E. PRI Topics

E1. Public Feedback on PRIs for UAXes and UTSes

FYI: The Editorial Committee scanned through feedback submitted on the various open PRIs for proposed updates of UAXes and UTSes for Unicode 15.0, and did not note any feedback specifically addressing editorial issues which needed separate response directly from the Editorial Committee. Most of the feedback dealt with technical content issues addressed by the Properties and Algorithms Group.


E2. Public Feedback on PRI #453 (15.0 Beta Review) [Items noted for Editorial Committee attention]

Date/Time: Tue May 31 21:05:09 CDT 2022
Name: David Corbett
Report Type: Error Report [EDC]
Opt Subject: NamesList.txt

The note for U+2E4E PUNCTUS ELEVATUS MARK should use `*` instead of `@+`.

Discussion: What David Corbett spotted here is a "notice" after a single character. That notice is correct, but was missing an initial asterisk which is used in formatting notices that apply to single characters instead of to blocks or headers for sections of the names list. This has already been corrected in the latest draft of the names list, so no action is required to be recorded.

AI Rick McGowan, UTC. Respond to David Corbett regarding his various feedback on PRI #453 [Tue May 31 21:05:09 CDT 2022, Wed Jun 1 15:57:33 CDT 2022, Wed Jun 1 17:04:19 CDT 2022, Wed Jun 1 17:43:26 CDT 2022, Wed Jun 1 18:43:52 CDT 2022] and noted in L2/22-123 [Fri Apr 22 20:39:14 CDT 2022].


Date/Time: Wed Jun 1 15:57:33 CDT 2022
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: 453 [EDC]

Note: This has been fixed in the upcoming draft.

D52 says “RIGHT-LEFT MARK”. It should say “RIGHT-TO-LEFT MARK”.

Discussion: As noted in the feedback, this error has already been corrected in the latest draft of the core specification. No action needs to be recorded.


Date/Time: Wed Jun 1 17:04:19 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: 453 [EDC]

Note: This has been fixed in the upcoming draft.

Chapter 4 says “For those scripts that have case (Latin, Greek, Coptic,
Cyrillic, Glagolitic, Armenian, archaic Georgian, Deseret, and Warang
Citi)” which sounds like it is supposed to be an exhaustive list. However,
it omits Adlam, Cherokee, Medefaidrin, Old Hungarian, Osage, and Vithkuqi.
It might not be appropriate to mention Cherokee, given the rest of the
sentence. It might be better not to list any scripts, or to only list a
few, making it obvious that it is not an exhaustive list.

Discussion: This text (in Section 4.2) has been adjusted in the latest draft of the core specification text. No action needs to be recorded.


Date/Time: Wed Jun 1 17:43:26 CDT 2022
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: 453 [EDC]

The note in the code chart for U+11F40 KAWI VOWEL SIGN EU uses U+01DD LATIN
SMALL LETTER TURNED E to represent a schwa. It should use U+0259 LATIN
SMALL LETTER SCHWA.

Discussion: This has been corrected in the latest draft of the names list. No action needs to be recorded.
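For readers who want to confirm that the two code points mentioned in the feedback are distinct characters with distinct names, the following minimal sketch uses Python's standard unicodedata module. The helper name describe is an assumption made for illustration only; it is not part of any names list tooling.

```python
import unicodedata

def describe(cp: int) -> str:
    """Return a 'U+XXXX NAME' label for a code point (illustrative helper)."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

# The code point used in the beta chart note, and the one requested instead.
TURNED_E = 0x01DD
SCHWA = 0x0259

print(describe(TURNED_E))
print(describe(SCHWA))
```

The two glyphs look alike in most fonts, which is why this kind of substitution is easy to miss in a visual review of the charts.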


Date/Time: Wed Jun 1 18:43:52 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: 453 [EDC]

The Alchemical Symbols block includes some sets of disunified characters
with similar glyphs with the same meaning, such as U+1F716 ALCHEMICAL
SYMBOL FOR VITRIOL and U+1F717 ALCHEMICAL SYMBOL FOR VITRIOL-2. That
implies that an alchemical symbol’s specific glyph is significant to its
Unicode encoding and that a change in the glyph, even a subtle one, might
change which code point encodes it.

The Unicode 15.0 code chart for Alchemical Symbols uses a new font. All of
the characters have new glyphs. Some of the new glyphs are quite different.
For example, U+1F747 ALCHEMICAL SYMBOL FOR SPIRIT used to have “SP” on the
right side but now instead has a dot on the bottom. That seems significant,
and by the precedent of U+1F716 vs. U+1F717 might even warrant a
disunification.

The new font does not seem to have been discussed anywhere public. I don’t
think it is appropriate to change the font so much, implicitly expanding
some characters’ ranges of valid glyph variation, without discussing the
changes and clarifying Unicode’s alchemical encoding model.

Discussion: The Editorial Committee discussed this feedback and examined the issues noted for the new font used in the beta review charts for the Alchemical Symbols block. We concur that there appear to be issues with some of the glyph changes, and recommend that the font change be reverted for Unicode 15.0, until such time as the developer of the new font can work out any issues with other stakeholders for the use of Unicode alchemical symbols. We think that this font change has not seen enough public review to be sure that none of the glyph changes introduce new issues regarding the identity of some of the characters. We also recommend that the Script Ad Hoc take another look at the proposed new font and work with its developer to address various issues in it.

AI Michel Suignard, EDC. Revert the font change for the Alchemical Symbols block to the font used for Unicode 14.0.

AI Ken Whistler, EDC. Update the delta charts index page for Unicode 15.0, to remove indication of font changes for the Alchemical Symbols.


Date/Time: Mon Jun 6 14:52:29 CDT 2022
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: PRI #453 Note for U+052B in NamesList.txt [EDC]

A `@+` should be prepended to the note for U+052B CYRILLIC SMALL LETTER DZZHE.


052B  CYRILLIC SMALL LETTER DZZHE
  * also used for Ossetian until 1924

Should read:

052B  CYRILLIC SMALL LETTER DZZHE
@+  * also used for Ossetian until 1924

Discussion: The use of a notice prefix ("@+") on a comment line is the signal that Unibook takes as suppressing the incorrect interpretation of a 4-digit year as a code point. So this observation by Marc is correct, and the change is necessary to prevent incorrect display of the comment in the code charts. The change has already been applied to the names list, so no action needs to be recorded.
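The ambiguity described above can be sketched as follows. This is not Unibook's actual logic, which is not specified in this report; the regular expression and the function name are assumptions chosen purely to illustrate how a naive scan for code point references would pick up a 4-digit year.

```python
import re
from typing import List

# Hypothetical pattern: any standalone run of 4 to 6 uppercase hex digits is
# treated as a possible code point reference. Real tooling may differ.
HEX_RUN = re.compile(r"\b[0-9A-F]{4,6}\b")

def candidate_code_points(comment: str) -> List[str]:
    """Return substrings of a comment that look like code point references."""
    return HEX_RUN.findall(comment)

comment = "also used for Ossetian until 1924"
print(candidate_code_points(comment))  # the year is reported as a candidate
```

Under that assumption, the year in the comment is indistinguishable from a code point reference, which is the kind of misreading the "@+" prefix is meant to suppress.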


Date/Time: Thu Jun 9 16:32:55 CDT 2022
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 453 [EDC]

Note: This glyph error was fixed in beta charts on June 10, 2022.

The T glyph for U+4EF9 is completely wrong (⿰示坐 instead of expected ⿰亻丰).

Discussion: As noted, this error has already been corrected in the code charts. No action needs to be recorded.


Date/Time: Thu Jun 9 17:34:06 CDT 2022
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 453 [EDC]

With regard to the "Glyph and Variation Sequence Changes" table at 
https://www.unicode.org/charts/PDF/Unicode-15.0/

1. Code points 585F, 5F50, 6BC0, 7BC9, 833E for CJK Unified Ideographs are in the wrong column.

2. U+27B48 is a typo for U+27BF8 in the CJK Unified Ideographs Extension B row.

3. The significant glyph changes for the following characters are not flagged:
U+93AB (V)
U+2A84E (V)
U+2AF4F (V)
U+2B15C (V)
U+30759 (U)

Discussion: Re item #1, that appeared to be a transient error in the HTML on the page, which was fixed several weeks ago. Re item #2, we concur that there was a typo on the page. That typo has been fixed now, so no action needs to be recorded for that. Re item #3, we concur that highlighting is missing in the delta code charts for those 5 characters.

AI Michel Suignard, EDC. Add U+93AB, U+2A84E, U+2AF4F, U+2B15C, and U+30759 to the blue highlight file to mark those as significant glyph changes in the CJK code charts, when generating the final candidate code charts for Unicode 15.0.


Date/Time: Sat Jun 18 09:48:36 CDT 2022
Name: Wang Yifan
Report Type: Public Review Issue
Opt Subject: 453 [EDC]

There are two places where the note says "also used in Sindhi", but what
the "also" qualifies is not clearly inferable from the surrounding
contexts.

-----
@   General punctuation
(… 13 lines …)
204F  REVERSED SEMICOLON
  * also used in Sindhi
-----
@   Reversed punctuation
2E41  REVERSED COMMA
  * also used in Sindhi
-----

The former appears to have been encoded as a mathematical symbol
(L2/00-119), and the latter as a part of Old Hungarian (L2/09-292), so
these pieces of information should be added for clarity.

Discussion: The Editorial Committee reviewed this feedback and agrees that the annotations could be clarified. It is unclear, however, that U+204F is a mathematical symbol. The repertoire that was encoded in response to L2/00-119 included a small collection of editorial marks and punctuation gathered from technical fonts, in addition to mathematical symbols. We see no reason to declare that U+204F is a mathematical symbol, just because it was added alongside a large collection of what were clearly mathematical symbols. As part of the clarification, we should note that the use in Sindhi is for when Sindhi is written in the Arabic script.

AI Ken Whistler, EDC. Clarify the annotations in the names list for 204F and 2E41, for Unicode 15.0.


Date/Time: Thu Jun 23 18:11:11 CDT 2022
Contact: jimeildotkomm@gmail.com
Name: Aditya Bayu Perdana
Report Type: Public Review Issue
Opt Subject: 453 [EDC]

Note: This is under investigation; an issue with highlighting and font metrics in the delta chart.

Many Kawi glyphs in the name list are inexplicably clipped, namely:

KAWI LETTER II, KAWI LETTER U, KAWI LETTER UU, KAWI LETTER VOCALIC R, KAWI 
LETTER VOCALIC RR, KAWI LETTER VOCALIC L, KAWI LETTER VOCALIC LL, KAWI LETTER AI,
KAWI DANDA, KAWI DOUBLE DANDA, KAWI PUNCTUATION SECTION MARKER, KAWI DIGIT THREE.

Please consider changing the text settings so that Kawi glyphs are shown in full 
like the glyphs in the character table.

Discussion: This kind of clipping problem in the names list is the result of large descenders that clash with the way Unibook handles line spacing in the names list part of the code charts. The problem has been fixed by reducing the size of the relevant Kawi glyphs in the names list, and will show up as fixed in the next code chart generation. No action needs to be recorded.


F. Responses to Other Public Feedback

F1. Public Feedback Noted in L2/22-123

FYI: This review refers to items in L2/22-123 listed under "Feedback routed to Editorial Committee for evaluation".


Date/Time: Fri Apr 22 11:11:19 CDT 2022
Name: Tim Pederick
Report Type: Error Report
Opt Subject: UnicodeData.txt

U+33D7 SQUARE PH has a compatibility decomposition mapping of <U+0050
LATIN CAPITAL LETTER P, U+0048 LATIN CAPITAL LETTER H>.

This character would appear to be intended to represent the pH measurement
in chemistry, and as such the mapping should have had different letter
case: <U+0070 LATIN SMALL LETTER P, U+0048 LATIN CAPITAL LETTER H>.

The Strong Normalization Stability policy says that this cannot be changed,
and perhaps it is sufficiently trivial to be beneath notice, but perhaps it
could be documented?

Discussion: The Editorial Committee notes that this particular case is an interesting one that has been noticed before. It stems from very early versions of the code charts for these squared abbreviations from Japanese and Chinese standards. The sources for those were sometimes inconsistent in casing of the letters, and were also quite often of very poor quality and hard to interpret. For U+33D7, in particular, the early charts showed "PH", before the glyph was adjusted in later versions. The anomaly in the compatibility decomposition for U+33D7 wasn't caught early enough to make it under the deadline for freezing all decompositions for normalization stability (back in the Unicode 3.0 timeframe in 1999).

An annotation has been added to the names list providing some explanation. No action needs to be recorded.
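The decomposition discussed in this item can be inspected with Python's standard unicodedata module, as in the minimal sketch below. The output reflects whatever version of the Unicode Character Database ships with the Python build in use; because decompositions are frozen under the normalization stability policy, the mapping for U+33D7 is expected to be the same in any such version.

```python
import unicodedata

SQUARE_PH = "\u33D7"

# Character name as recorded in the UCD.
print(unicodedata.name(SQUARE_PH))

# Raw decomposition field: a <square> compatibility tag followed by the
# code points of the two capital letters.
print(unicodedata.decomposition(SQUARE_PH))

# Compatibility normalization therefore yields two uppercase letters,
# not the "pH" spelling used in chemistry.
print(unicodedata.normalize("NFKC", SQUARE_PH))
```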


Date/Time: Fri Apr 22 20:39:14 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Diaeresis on capital Armenian letters

Chapter 7 says “In Armenian dialect materials, U+0308 COMBINING DIAERESIS,
appears over uppercase U+0531 ayb and lowercase U+0561 ayb, and lowercase
U+0585 oh and U+0578 vo.” Because all caps is used in Armenian, it appears
over uppercase U+0555 oh and U+0548 vo too.

http://www.nayiri.com/imagedDictionaryBrowser.jsp?dictionaryId=101&dt=HY_HY&pageNumber=577 

has an example with U+0548 in the second headword of the third column and
an example with U+0555 in the fourth headword of the third column; the
diacritic looks like U+030F but it’s probably just U+0308. Chapter 7 should
say that U+0308 is used with all six of these bases.

Also, the comma after “DIAERESIS” should be removed.

Discussion: This problem has already been addressed in the current draft of the core specification. No action needs to be recorded.


G. Miscellaneous Topics

G1. (None noted)