Editorial Committee Report

L2/20-194

Editorial Committee Report and Recommendations for UTC #164 Meeting

Source: Editorial Committee

Date: July 21, 2020

A. Unicode Release Topics

A1. Unicode 14.0 Schedule and Planning

FYI: The significant milestones for the Unicode 14.0 release are:

Beta start: June 4, 2021

Beta close: July 20, 2021

Release: September 14, 2021

These dates are unchanged from those reported in the Editorial Committee Report and Recommendations for UTC #163 Meeting.

The Editorial Committee is also recommending that the UTC aim to establish the planned repertoire for Unicode 14.0 by its January, 2021 meeting. That will give a sufficient window for public review and feedback before code points and character names are all locked down for the start of beta review in June, 2021. See topic A3 below.

A2. Unicode 13.0 POD

FYI: The print-on-demand (POD) version of Unicode 13.0 was published on May 8. The POD is available at a reasonable price from lulu.com, and links to Volume 1 and Volume 2 have been added to the Unicode 13.0 landing page. There are separate links for each volume, instead of a single link to a Unicode spotlight page, because of a site redesign at lulu.com.

Richard Ishida has a copy and reports a rave review: "Looks really good! Nice white paper!"

A3. "Alpha" Pipeline Charts and Early Review for 14.0

FYI: The Editorial Committee is planning to post "Alpha" charts matching the repertoire decided on for the Unicode 14.0 release. These will be accessible from the Pipeline page, to help provide context for the repertoire under review. It may also make sense to prepare a PRI to collect feedback, as the UTC has done in the past for early chart review that was associated with the development of 10646 amendment ballots in SC2. See recent examples: PRI #384 and PRI #366.

The Editorial Committee also had a discussion of the interaction between UTC and WG2, per UTC AI 162-A82. The general assessment was that there are aspects of this interaction that fall within the scope of the Editorial Committee, including updates to website pages concerning Unicode releases, and coordination issues regarding the release process itself. But there are other aspects of the interaction that seem to fall more into the scope of the Unicode officers. We recommend that the officers take up consideration of implications for the organization-level relationship between the Unicode Consortium and SC2, given changes to meeting schedules, ways of operating, and publication schedules.

In the short term, the Editorial Committee thinks it is important to make sure that the UTC communicate its schedule for publication of Unicode 14.0 to SC2 (and WG2), and that it particularly communicate its plans for early review of repertoire. This will give more opportunity for reviewers outside the context of our usual beta reviewers to provide feedback early enough to have an impact on potential decisions about repertoire, character identity, names, and code points. Separately, the Editorial Committee is planning for an update of the Proposed New Scripts page on the website, to better reflect the current process for early script review prior to publication of Unicode 14.0.

EC-UTC164-R1: The Editorial Committee recommends assigning an action item to the UTC liaison to SC2:

AI, UTC liaison. Add information about the review and publication schedule for Unicode 14.0 to the liaison report to SC2.

A4. French Charts

FYI: Michel has produced a set of French code charts for Unicode 13.0, based on some very extensive translation work by Patrick Andries and others over the years, to prepare a complete Unicode 13.0 NamesList.txt translated into French. These charts consist of a complete, single archival Unicode 13.0 code chart in French, and a set of block-by-block charts. (In our Editorial Committee parlance, the latter are known as the "unversioned" charts, as they are not archived for a particular version, but instead reflect the kind of block-by-block charts we post for the latest version.)

The single, archival chart is accessible in the Version 13.0 charts directory, and has also been linked from the Unicode 13.0 landing page.

https://www.unicode.org/Public/13.0.0/charts/fr/

The unversioned, block-by-block charts are also in place on the server, but have not been linked up yet. They require a translation of the code charts index page, still in progress. An example can be seen here:

https://www.unicode.org/charts/fr/PDF/U0080.pdf

Once the translated index page is in place, the Editorial Committee plans to announce the availability of these Version 13.0 code charts publicly. We will also be developing some contextualizing information about the French code charts to add to our existing help page for the code charts, to explain the informative status of French translations of Unicode character names.

It is currently unclear whether this set of French code charts will be a one-off for Unicode 13.0, or whether a regular update can eventually be integrated into the release process for future versions of the Unicode Standard.

B. Website Topics

B1. Website Status

FYI: Recovery from the catastrophic VM failure that took down the main Unicode technical website in April is now nearly complete. Both the public web server for the technical website www.unicode.org/main.html and the corporate web server have been migrated to new, more capable VM hardware running the latest Linux. A robust backup scheme is now in place, to minimize the potential impact of any future hardware failures.

Mail service has been completely restored for some time now, but is all running on corp.unicode.org, instead of www.unicode.org. One lingering bit of fallout from the crash is that the old, home-brew system for archiving email of the public email list, unicode@unicode.org, in a fully public location, was not restored. Instead, email archiving is still done via the standard mailman interface, as has been done continuously since 2014. Archives of the unicode@unicode.org list since April are all available, but only to current list subscribers logged in via the mailman interface. All of the older, historic email archives have been fully recovered and are publicly accessible. The Editorial Committee is working on updating the documentation regarding the email archives, to minimize public confusion about their availability.

One major bit of functionality that has not yet been fully restored is the set of jsp's that used to run various Unicode tools and examples on the site. There have been numerous complaints about the unavailability of those jsp's, as the APIs have been documented for years and are widely referenced from a number of sites. The current status is that the source code for all of the jsp's has been migrated to one of our public GitHub repositories:

github.com/unicode-org/unicodetools

The jsp's are buildable there, and are "shovel ready to plug into the Google Cloud Platform." The infrastructure committee is working on this, to finish the steps needed to deploy. Once that deployment occurs, it should again be possible to call the jsp's from a web page.

Recovery of two major Subversion repositories used by the Editorial Committee and for Unicode releases that crashed and burned during the Great VM Crash of 2020 is still underway. Most of the "unicodetools" repository has been migrated to GitHub (see above). The "draft" repository has been split into several new repositories, in part because of the enormous size of the image collection in it associated with emoji. The portion of the old Subversion repository devoted to many of the Unicode technical report specifications, managed by the Editorial Committee, has now been migrated to a unicode-reports repository in GitHub.

The Editorial Committee has documented its use of unicode-reports and has a general how-to page for GitHub to help the editors working with the new repository. (These pages were developed by Peter Constable.) The pages are not public, but are available to all the editors for their work on the Editorial Committee.

C. Process Issues

C1. Editorial Committee Input on Process to Update Reports

For Discussion by UTC: The Editorial Committee has discussed the issue of whether it makes sense to proceed with fixing any obvious typos or similar errors reported in a UTS or UAX, without going through the full cycle of issuing a proposed update of the document for public review prior to the publication of the next version of the specification. Note that we already use this streamlined process to update a UTS or UAX for a new Unicode version when they have no content updates pending, other than the pro forma updates for version and revision numbers and copyright date. It would streamline our process a bit more if reports of obvious, small editorial defects could be handled in a similar manner when there is no substantive technical implication for the update. Should we proceed this way for future versions?

D. UTR Topics

D1. UAX #38, Unicode Han Database (Unihan)

FYI: The Editorial Committee reviewed in detail several additions to the text of the proposed update of UAX #38, with significant updates of the descriptions of the kPhonetic and kMeyerWempe fields in Unihan. The live text of UAX #38 available for public review for PRI #421 has been refreshed with this new text.

EC-UTC164-R2: The Editorial Committee recommends that Rick be given an action item to extend the close date for PRI #421, for consideration at the next UTC meeting.

AI, Rick. Extend the close date for PRI #421 to September NN, 2020.

E. PRI Topics

E1. Editorial Feedback on open PRIs for documents

FYI: The Editorial Committee checked for any editorial feedback received for the following open PRIs: PRI #420 (UAX #45), PRI #419 (UAX #44), PRI #417 (UAX #29), PRI #416 (UAX #14), PRI #415 (UTR #23). There hasn't been any feedback to date, and the Editorial Committee is not currently offering any other editorial feedback of its own.

EC-UTC164-R3: The Editorial Committee recommends that Rick be given an action item to extend the close date of each of these PRIs, for consideration at the next UTC meeting.

AI, Rick. Extend the close dates for PRI #420, PRI #419, PRI #417, PRI #416, and PRI #415 to September NN, 2020.

F. Responses to Public Feedback

FYI: The Editorial Committee has reviewed the general public feedback routed for its consideration in the UTC #164 Comments on Public Review Issues document: L2/20-174. The exact text of all that editorial feedback can be referred to in L2/20-174. The short summaries below simply reference the authors and dates of the feedback, giving any relevant conclusions from the discussion. The suggested action items are queued up below the discussion section.

Discussion:

David Corbett (May 5): This reports a lack of clarity about the representation of code point labels in Section 4.8 of the core specification. The Editorial Committee's evaluation is that the report makes sense and the text should be updated.

Yoshidumi (May 29): This reports a typo in UTR #25, which should be forwarded to the editor of UTR #25 for correction.

Norbert Lindenberg (June 25): This report requests clarification of the terms "nukta", "bindu", and "svara" in the core specification and the Unicode Glossary. The Editorial Committee concurs that some clarification is in order. However, there is some question as to whether trying to nail down definitions for these terms would raise technical issues, and the Editorial Committee cautions that the properties and algorithms group should be involved in reviewing any update that might impact the interpretation of categories used in IndicSyllabicCategory.txt.

Note that during this discussion, and in response to some other email feedback, the Editorial Committee felt that a clarification of text regarding the use of U+0300 and U+0301 in Devanagari is also in order.

Dirkjan Ochtman (July 8): This report notes an obvious typo in an example in UTS #46. The Editorial Committee concurs and thinks it should be sent to the editors of UTS #46 for correction.

Stanislav Goldstein (July 8): This report notes inconsistencies in the discussion of allocation areas in Sections 2.8 and 2.9 of the core specification. The Editorial Committee concurs, and suggests that the inconsistencies (resulting from additions in Unicode 13.0) should be corrected. Note that this will also involve updating figures in the text.

Paul Hardy (July 10): Ben Yang and Ken Whistler followed up with Paul Hardy and John Cowan on the details of this error report, and the mapping table for ISO-IR-68 (APL) on the web site has already been corrected.

Suggested action items:

EC-UTC164-R4: The Editorial Committee recommends recording the following action items.

AI, Ken Whistler, Ed Committee. Clarify the text in Section 4.8 regarding the representation of code point labels for Unicode 14.0. Ref. David Corbett, May 5, in L2/20-174.

AI, Rick. Forward the feedback of Yoshidumi, May 9, in L2/20-174, to the attention of the editors of UTR #25.

AI, Ken Whistler, Ed Committee. Investigate options for improvement and consistency of text re nukta, bindu, and svara in the core specification, the Unicode Glossary, and the header of IndicSyllabicCategory.txt. Ref. Norbert Lindenberg, June 25, in L2/20-174.

AI, Liang Hai, Ed Committee. Investigate whether clarification of the text in the core specification regarding the use of U+0300 and U+0301 in Devanagari is in order. For Unicode 14.0.

AI, Mark Davis, Ed Committee. Correct the example in the third row in Section 4.5 of UTS #46 to read "bloß.de", for Unicode 14.0. Ref. Dirkjan Ochtman, July 8, L2/20-174.

AI, Ken Whistler, Ed Committee. Update the discussion and figures for allocation areas in Sections 2.8 and 2.9 of the core specification to reflect recent additions to the standard. For Unicode 14.0. Ref. Stanislav Goldstein, July 8, L2/20-174.

G. Miscellaneous Topics

G1. (None noted)