L2/18-220

Title: Unicode 12.1 Planning Considerations
Source: Ken Whistler
Date: July 16, 2018

Background

The UTC is now working on the Unicode 12.0 release cycle. The tentative release date that I and the Editorial Committee are working towards for that release is March 5, 2019.

The date of that release, which cannot really be moved, given the complex dependencies now in place for the corresponding CLDR and ICU releases, and for the vendor product cycles that depend, in turn, on those, poses a problem for the anticipated announcement of the new Japanese era name. The date of the abdication and start of the subsequent Japanese reign era is now fixed, but the actual name of the era will not be announced, apparently, until sometime shortly after February 24, 2019. That timeframe is way too short to adjust the data files and charts for the addition of a new character, no matter how urgent it is for implementation.

The problem, in this case, is that even though we know the code point for this new character, U+32FF, which the UTC set aside back in January, we cannot know the actual content of that code point until the era name itself is announced. The characters encoded for these calendrical symbols in Unicode have compatibility decompositions, and those decompositions depend on the actual name chosen for the era. Because the decomposition, once assigned, is immutable, involving Unicode normalization, the UTC cannot afford to make any mistakes here, nor can it just *guess* and release the code point early.

All of this is pointing directly to the necessity of issuing a Unicode 12.1 release sharply on the heels of Unicode 12.0, incorporating the addition of the new Japanese era name character, which all vendors will be under great pressure to immediately support in 2019 software releases.

The problem, however, is that Unicode releases have become very large and resource intensive, and the staff needed to accomplish them simply will not have the cycles available to run a full, business-as-normal, Unicode release, just to get this one character available quickly. Instead, the UTC is inevitably going to have to think outside the box a bit about this, to figure out how to recast a Unicode minor release into a framework that will be light and quick enough to meet the requirements.

Planning for Unicode 12.1

The Editorial Committee has had some preliminary discussion about all the staffing and resource issues, in the context of preparing for what seems like the inevitable 12.1 release. I'll list some of the planning considerations here, so the UTC can start the discussion about how best to deal with the issues involved.

1. A Unicode 12.1 will have to be sharply limited in scope.
   a. There cannot be a revision of the core specification.
   b. There cannot be a cycling of all 14 annexes or of the 4 UTSes also synched to Unicode versions.
   c. There cannot be a full cycle of chart updates.
   d. In other words, it simply is not feasible to treat a 12.1 minor release in the same way we did for the Unicode 6.2 or Unicode 6.3 minor releases, cycling all annexes, all charts, etc., for a small number of character additions.

2. The changes for the UCD 12.1 should be strongly constrained.
   a. For a quick turnaround, there is no chance for a full beta review cycle for the UCD.
   b. Dependencies between property changes slow all UCD processing and testing down, and increase risks.
   c. Ideally, a Unicode 12.1 UCD should then reflect just the addition of U+32FF, plus the implications of its properties (including its compatibility decomposition) -- and *nothing* else.

3. The public documentation for Unicode 12.1 must be distinct.
   a. We have a long-established pattern for Unicode release pages, but just applying that template would tend to mislead people about the stripped down nature of Unicode 12.1.
   b. A 12.1 release page should look more like a corrigendum -- a short statement that shows the new character, its code point, name, and glyph -- although formally it should not be an actual corrigendum.
   c. Version documentation will also need to be tweaked a bit, to allow for a minor release consisting of just a simple character addition (and UCD update), while inheriting all of the specifications for the last major release.

4. Chart support for Unicode 12.1 must be limited.
   a. It should be feasible to do a one-off chart update for the U+32xx block. That would make the current chart listings accurate.
   b. But it will likely not be feasible to run a full Unicode 12.1 archival chart cycle, in part because of all the supporting steps and documentation required for that.

5. Preparation for a rapid Unicode 12.1 turnaround will be required.
   a. The Editorial Committee will need to have the draft documentation in place, *in advance* of the actual Unicode 12.0 release. That will minimize the time needed to get to release following the announcement of the new era name.
   b. A UCD 12.1 "alpha" should also be pre-positioned, using dummy values for U+32FF, so any issues can be sorted out ahead of time. Then the final version data files can be quickly generated with the actual decomposition and name put in place.
   c. A communication strategy also needs to be in place, with announcement text, tweets, etc., all planned well in advance of the actual 12.1 data release.

6. UTS #10 and UCA consideration
   a. The repertoire for DUCET is also aligned in each release. The UTC will need to decide whether UCA 12.1 can be skipped, or whether a lightweight release process for UCA also needs to be designed.

7. UTS #46 and IDNA consideration
   a. The same issue applies for UTS #46, which contains data tables synched to each Unicode release. Can that data update be dispensed with for Unicode 12.1, or will a lightweight update of the IdnaMappingTable.txt and test file need to be created for 12.1?

8. UTS #39 consideration
   a. The same issue applies for UTS #39, which contains data tables synched to each Unicode release. Can we do a lightweight update of IdentifierStatus.txt and IdentifierType.txt, or should we dispense with a 12.1 update of those files?

9. UTS #51 emoji consideration
   a. Although the emoji data files are now also version-synched to the Unicode release, much of the data would not be impacted by a single era name character addition. However, the UTC will need to decide what to do about the property values in emoji-data.txt which apply to all characters. And that decision will depend, in turn, on how the emoji properties have been handled for Unicode 12.0.

This is a lot to chew on, but I think the UTC will be far better off going into this with a specific plan in hand, rather than just reacting in crisis mode once the new era name is announced next year right around the time of the Unicode 12.0 release. Ideally, we should get clear direction from the UTC about planning for a 12.1 release, no later than the end of the September UTC meeting this year, which will also be the meeting that officially authorizes the Unicode 12.0 beta review cycle.
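
Illustrative note: the dependency described in the Background, and the pre-positioned "alpha" data suggested in item 5b, can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not a description of the actual UCD tooling. It uses the standard `unicodedata` module to show the decompositions of the four era name characters already encoded (U+337B..U+337E). The helper `ucd_line`, the placeholder name, and the dummy decomposition are hypothetical, and the property values it fills in for U+32FF (So, 0, L, N) are an assumption modeled on the existing era name characters.

```python
import unicodedata

# Era name symbols already encoded in the standard.
ERA_CHARS = {
    "HEISEI": "\u337B",
    "SHOWA": "\u337C",
    "TAISHO": "\u337D",
    "MEIJI": "\u337E",
}


def compat_decomposition(ch: str) -> str:
    """Return the raw decomposition field for ch, as recorded in the UCD."""
    return unicodedata.decomposition(ch)


def ucd_line(code_point: int, name: str, decomposition: str) -> str:
    """Format a UnicodeData.txt-style record (15 semicolon-separated fields).

    Hypothetical helper: only the name and decomposition vary; the other
    property values are assumed to match the existing era name symbols.
    """
    fields = [
        f"{code_point:04X}",  # code point
        name,                 # character name
        "So",                 # General_Category (assumed)
        "0",                  # Canonical_Combining_Class
        "L",                  # Bidi_Class (assumed)
        decomposition,        # compatibility decomposition
        "", "", "",           # numeric fields: not applicable
        "N",                  # Bidi_Mirrored
        "", "",               # Unicode_1_Name, ISO_Comment
        "", "", "",           # case mappings: none
    ]
    return ";".join(fields)


if __name__ == "__main__":
    # The decomposition of each existing symbol spells out the era name,
    # which is why U+32FF cannot be finalized before the announcement.
    for era, ch in ERA_CHARS.items():
        print(era, f"U+{ord(ch):04X}", compat_decomposition(ch))

    # "Alpha" record with dummy values (hypothetical placeholders).
    print(ucd_line(0x32FF, "SQUARE ERA NAME PLACEHOLDER", "<square> 0058 0058"))
```

Once the era name is announced, only the two placeholder arguments to `ucd_line` would change in this sketch; everything else in the record could be reviewed ahead of time.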