L2/18-220

Title: Unicode 12.1 Planning Considerations

Source: Ken Whistler

Date: July 16, 2018


Background

The UTC is now working on the Unicode 12.0 release cycle. The tentative
release date that the Editorial Committee and I are working towards for
that release is March 5, 2019.

The date of that release cannot realistically be moved, given the complex
dependencies now in place for the corresponding CLDR and ICU releases,
and for the vendor product cycles that depend, in turn, on those. That
poses a problem for the anticipated announcement of the new Japanese era
name. The date of the abdication and the start of the subsequent Japanese
reign era is now fixed, but the actual name of the era apparently will
not be announced until shortly after February 24, 2019. That timeframe
is far too short to adjust the data files and charts for the addition
of a new character, no matter how urgent it is for implementation.

The problem, in this case, is that even though we know the code point
for this new character, U+32FF, which the UTC set aside back in January,
we cannot know the actual content of that code point until the era
name itself is announced. The characters encoded for these calendrical
symbols in Unicode have compatibility decompositions, and those
decompositions depend on the actual name chosen for the era. Because a
decomposition affects Unicode normalization and is immutable once
assigned, the UTC cannot afford to make any mistakes here, nor can it
just *guess* and release the code point early.
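
To make that dependency concrete, the era name characters that are
already encoded can be inspected with Python's standard unicodedata
module. This is only an illustrative sketch: it looks at U+337B..U+337E,
the four existing era symbols, and shows that each compatibility
decomposition spells out the era name itself, which is why nothing can
be assigned to U+32FF until the new name is known.

```python
import unicodedata

# The four era name characters already encoded (U+337B..U+337E).
for cp in range(0x337B, 0x337F):
    ch = chr(cp)
    print(
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),        # tag plus code points of the name
        unicodedata.normalize("NFKC", ch),    # the era name as ordinary kanji
    )
```

Under NFKC or NFKD each symbol is replaced by the two ideographs of its
era name, so the mapping is part of the normalized form of any text
containing the symbol.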

All of this points directly to the necessity of issuing a Unicode 12.1
release closely on the heels of Unicode 12.0, incorporating the addition
of the new Japanese era name character, which all vendors will be under
great pressure to support immediately in 2019 software releases.

The problem, however, is that Unicode releases have become very large
and resource-intensive, and the staff needed to accomplish them simply
will not have the cycles available to run a full, business-as-normal
Unicode release just to make this one character available quickly.
Instead, the UTC is inevitably going to have to think outside the box
a bit about this, to figure out how to recast a Unicode minor release
into a framework that will be light and quick enough to meet the
requirements.

Planning for Unicode 12.1

The Editorial Committee has had some preliminary discussion about all
the staffing and resource issues, in the context of preparing for what
seems like the inevitable 12.1 release. I'll list some of the planning
considerations here, so the UTC can start the discussion about how
best to deal with the issues involved.

1. A Unicode 12.1 release will have to be sharply limited in scope.

   a. There cannot be a revision of the core specification.

   b. There cannot be a cycling of all 14 annexes or of the
      4 UTSes also synched to Unicode versions.

   c. There cannot be a full cycle of chart updates.

   d. In other words, it simply is not feasible to treat a 12.1
      minor release in the same way we did for the Unicode 6.2
      or Unicode 6.3 minor releases, cycling all annexes, all
      charts, etc., for a small number of character additions.

2. The changes for the UCD 12.1 should be strongly constrained.

   a. For a quick turnaround, there is no chance for a full beta
      review cycle for the UCD.

   b. Dependencies between property changes slow down all UCD
      processing and testing, and increase risk.

   c. Ideally, a Unicode 12.1 UCD should then reflect just
      the addition of U+32FF, plus the implications of its
      properties (including its compatibility decomposition) --
      and *nothing* else.
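
A constraint of this kind lends itself to a mechanical check. The
following is a minimal sketch, not an existing UCD tool: the function
names are hypothetical, and the two inputs are abbreviated, illustrative
excerpts in UnicodeData.txt style rather than real data files, with a
dummy record standing in for U+32FF. It reports which code points
differ between two versions, so a 12.1 draft can be confirmed to touch
only U+32FF.

```python
def parse_unicode_data(text):
    """Map code point -> full record line for UnicodeData.txt-style text."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cp = int(line.split(";", 1)[0], 16)
        records[cp] = line
    return records


def changed_code_points(old_text, new_text):
    """Return the sorted code points whose records differ or are new/removed."""
    old = parse_unicode_data(old_text)
    new = parse_unicode_data(new_text)
    return sorted(cp for cp in old.keys() | new.keys()
                  if old.get(cp) != new.get(cp))


# Illustrative excerpts only; the U+32FF record uses dummy values.
OLD = """\
32FE;CIRCLED KATAKANA WO;So;0;L;<circle> 30F2;;;;N;;;;;
337B;SQUARE ERA NAME HEISEI;So;0;L;<square> 5E73 6210;;;;N;SQUARED TWO IDEOGRAPHS ERA NAME HEISEI;;;;
"""
NEW = """\
32FE;CIRCLED KATAKANA WO;So;0;L;<circle> 30F2;;;;N;;;;;
32FF;SQUARE ERA NAME XXXX;So;0;L;<square> XXXX XXXX;;;;N;;;;;
337B;SQUARE ERA NAME HEISEI;So;0;L;<square> 5E73 6210;;;;N;SQUARED TWO IDEOGRAPHS ERA NAME HEISEI;;;;
"""

diff = changed_code_points(OLD, NEW)
print([f"U+{cp:04X}" for cp in diff])
```

The same comparison would have to be repeated for each derived data
file, since a single new character also shows up in files such as the
normalization and property listings.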

3. The public documentation for Unicode 12.1 must be distinct.

   a. We have a long-established pattern for Unicode release pages,
      but just applying that template would tend to mislead people
      about the stripped-down nature of Unicode 12.1.

   b. A 12.1 release page should look more like a corrigendum --
      a short statement that shows the new character, its code
      point, name, and glyph -- although formally it should not
      be an actual corrigendum.

   c. Version documentation will also need to be tweaked a bit, to
      allow for a minor release consisting of just a simple
      character addition (and UCD update), while inheriting
      all of the specifications for the last major release.

4. Chart support for Unicode 12.1 must be limited.

   a. It should be feasible to do a one-off chart update for
      the U+32xx block. That would make the current chart
      listings accurate.

   b. But it will likely not be feasible to run a full
      Unicode 12.1 archival chart cycle, in part because of
      all the supporting steps and documentation required for
      that.

5. Preparation for a rapid Unicode 12.1 turnaround will be required.

   a. The Editorial Committee will need to have the draft
      documentation in place, *in advance* of the actual
      Unicode 12.0 release. That will minimize the time needed
      to get to release following the announcement of the new
      era name.

   b. A UCD 12.1 "alpha" should also be pre-positioned, using
      dummy values for U+32FF, so any issues can be sorted out
      ahead of time. Then the final version data files can be
      quickly generated with the actual decomposition and name
      put in place.

   c. A communication strategy also needs to be in place, with
      announcement text, tweets, etc., all planned well in
      advance of the actual 12.1 data release.
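
The pre-positioning idea in 5b can be sketched as follows. This is a
hedged illustration only: the helper names are hypothetical, the "XXXX"
dummy convention is an assumption, and the property values (So, combining
class 0, bidi class L, a <square> decomposition) are simply modeled on
the existing era character U+337B, not a statement of what the UTC will
assign to U+32FF.

```python
def era_record(code_point, name, decomp=None):
    """Build a UnicodeData.txt-style line for a square era name symbol.

    decomp is a list of code points, or None to emit dummy values.
    """
    if decomp is None:
        mapping = "XXXX XXXX"          # dummy value, to be replaced later
    else:
        mapping = " ".join(f"{cp:04X}" for cp in decomp)
    fields = [
        f"{code_point:04X}", name, "So", "0", "L", f"<square> {mapping}",
        "", "", "", "N", "", "", "", "", "",
    ]
    return ";".join(fields)


def is_placeholder(record):
    """True while a record still carries dummy values."""
    return "XXXX" in record


# Alpha record with dummy name and decomposition.
alpha = era_record(0x32FF, "SQUARE ERA NAME XXXX")
print(alpha)

# The same helper fed with known values for an existing era character.
heisei = era_record(0x337B, "SQUARE ERA NAME HEISEI", [0x5E73, 0x6210])
print(heisei, is_placeholder(heisei))
```

A release script could refuse to publish while is_placeholder() is still
true for any record, which guards against dummy values leaking into the
final data files.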

6. UTS #10 and UCA consideration

   a. The repertoire for DUCET is also aligned with each Unicode
      release. The UTC will need to decide whether UCA 12.1 can be
      skipped, or whether a lightweight release process for
      UCA also needs to be designed.

7. UTS #46 and IDNA consideration

   a. The same issue applies for UTS #46, which contains data
      tables synched to each Unicode release. Can that data update
      be dispensed with for Unicode 12.1, or will a lightweight
      update of the IdnaMappingTable.txt and test file need to
      be created for 12.1?

8. UTS #39 consideration

   a. The same issue applies for UTS #39, which contains data
      tables synched to each Unicode release. Can we do a lightweight
      update of IdentifierStatus.txt and IdentifierType.txt, or
      should we dispense with a 12.1 update of those files?

9. UTS #51 emoji consideration

   a. Although the emoji data files are now also version-synched
      to the Unicode release, much of the data would not be
      impacted by a single era name character addition. However,
      the UTC will need to decide what to do about the property
      values in emoji-data.txt which apply to all characters. And
      that decision will depend, in turn, on how the emoji properties
      have been handled for Unicode 12.0.

This is a lot to chew on, but I think the UTC will be far better
off going into this with a specific plan in hand, rather than just
reacting in crisis mode once the new era name is announced next
year right around the time of the Unicode 12.0 release.

Ideally, we should get clear direction from the UTC about planning
for a 12.1 release, no later than the end of the September UTC
meeting this year, which will also be the meeting that officially
authorizes the Unicode 12.0 beta review cycle.