
Unicode® 12.0.0 (DRAFT)

2019 March 5 (Announcement)

This page summarizes the important changes for the Unicode Standard, Version 12.0.0. This version supersedes all previous versions of the Unicode Standard.

The Unicode Character Database, Code Charts, and Annexes for Version 12.0 will be released on March 5, 2019. The core specification (the PDF chapters) of Version 12.0 is still pending publication due to the extensive editorial work required for the new content additions. Until final publication, the links to individual chapters of the core specification will not be activated. An announcement will be made when the core specification for Version 12.0 is available. In the meantime, implementers can continue to reference the relevant sections of the most recent version of the core specification.
A. Summary
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration

A. Summary

Unicode 12.0 adds 554 characters, for a total of 137,928 characters. These additions include 4 new scripts, for a total of 150 scripts, as well as 61 new emoji characters.

The new scripts and characters in Version 12.0 add support for lesser-used languages and unique written requirements worldwide. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:

  • Elymaic, ...
  • Nandinagari, ...
  • Nyiakeng Puachue Hmong, ...
  • Wancho, ...
  • TBD, ...

Popular symbol additions:

  • 61 emoji characters, including ... For complete statistics regarding all emoji as of Unicode 12.0, see Emoji Counts. For more information about emoji additions for Unicode 12.0, including new emoji ZWJ sequences and emoji modifier sequences, see Emoji Recently Added, v12.0.
  • TBD, ...

Additional support for lesser-used languages and scholarly work was extended worldwide, including:   

  • TBD

Version 12.0 improved TBD...

Synchronization

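Several other important Unicode specifications have been updated for Version 12.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 12.0:

  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing
  • UTS #51, Unicode Emoji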

Some of the changes in Version 12.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

This version of the Unicode Standard is also synchronized with ISO/IEC 10646:2017, fifth edition, plus Amendments 1 and 2 to the fifth edition, plus the following additions from the CD for the sixth edition:

  • 61 emoji characters

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Technical Overview

Version 12.0 of the Unicode Standard consists of:

  • The core specification
  • The code charts (delta and archival) for this version
  • The Unicode Standard Annexes
  • The Unicode Character Database (UCD)

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Core Specification

The core specification is available as a single PDF for viewing (NN.N MB). Links to individual chapters and appendices of the core specification are also available in the navigation bar on the left of this page.

Code Charts

Several sets of code charts are available. They serve different purposes:

  • The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.

For Unicode 12.0.0 in particular, two additional sets of code chart pages are provided:

  • A set of delta code charts showing the new blocks and any blocks in which characters were added for Unicode 12.0.0. The new characters are visually highlighted in the charts.
  • A set of archival code charts that represents the entire set of characters, names and representative glyphs at the time of publication of Unicode 12.0.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Unicode Standard Annexes

Links to the individual Unicode Standard Annexes are available in the navigation bar on the left of this page. The list of significant changes in the content of the Unicode Standard Annexes for Version 12.0 can be found in Section G below.

Unicode Character Database

Data files for Version 12.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Zipped versions of the UCD for bulk download are available, as well.

Version References

Version 12.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 12.0.0, (Mountain View, CA: The Unicode Consortium, 2019. ISBN 978-1-936213-22-1)
http://www.unicode.org/versions/Unicode12.0.0/

The terms “Version 12.0” or “Unicode 12.0” are abbreviations for the full version reference, Version 12.0.0.
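
Implementations that compare version strings may want to treat the abbreviated and full forms as equal. The following is a minimal sketch of one way to do that, assuming a missing update field means 0; the function name `parse_unicode_version` is hypothetical and not part of any Unicode specification or library.

```python
from typing import Tuple


def parse_unicode_version(text: str) -> Tuple[int, int, int]:
    """Parse '12.0' or '12.0.0' into a (major, minor, update) tuple.

    Assumption: a missing update field is treated as 0, so that
    'Version 12.0' and 'Version 12.0.0' compare equal.
    """
    parts = [int(field) for field in text.strip().split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected major.minor[.update], got {text!r}")
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


# The abbreviated and full forms normalize to the same tuple.
print(parse_unicode_version("12.0") == parse_unicode_version("12.0.0"))
```

Tuples compare field by field, so the same helper can also order versions (for example, to check whether a data file is older than Version 12.0.0).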

The citation and permalink for the latest published version of the Unicode Standard are:

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 12.0 is found on the page Components for 12.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 12.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 12.0, see the list of current Updates and Errata.

C. Stability Policy Update

There were no significant changes to the Stability Policy of the core specification between Unicode 11.0 and Unicode 12.0.

D. Textual Changes and Character Additions

Four new scripts were added with accompanying new block descriptions:

Script                    Number of Characters
Elymaic                   23
Nandinagari               65
Nyiakeng Puachue Hmong    71
Wancho                    59

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

554 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the delta code charts.
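
As a quick arithmetic check, the per-script counts listed in this section can be compared against the overall total. The sketch below uses only figures stated on this page; the variable names are illustrative.

```python
# Character counts for the four new scripts, as listed in this section.
new_script_counts = {
    "Elymaic": 23,
    "Nandinagari": 65,
    "Nyiakeng Puachue Hmong": 71,
    "Wancho": 59,
}
TOTAL_ADDED = 554  # total characters added in Version 12.0

in_new_scripts = sum(new_script_counts.values())
outside_new_scripts = TOTAL_ADDED - in_new_scripts

print(in_new_scripts)       # 218
print(outside_new_scripts)  # 336
```

The remaining 336 characters are therefore spread across other new blocks and existing blocks; the delta code charts show where.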

E. Conformance Changes

There are no significant new conformance requirements in Unicode 12.0.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 12.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.
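
When migrating, one way to see which code points are new is to tally entries by their Age value. The sketch below assumes input in the usual UCD property-file layout (`code point or range ; value # comment`), as used by files such as DerivedAge.txt. The choice of that file, the function name `count_by_age`, and the sample lines are assumptions for illustration; the sample is hand-written and should be checked against the real data file.

```python
from typing import Dict, Iterable


def count_by_age(lines: Iterable[str]) -> Dict[str, int]:
    """Count code points per property value in UCD-style lines."""
    counts: Dict[str, int] = {}
    for raw in lines:
        # Everything after '#' is a comment; skip blank or comment-only lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        code_field, value = (part.strip() for part in line.split(";", 1))
        if ".." in code_field:
            first, last = (int(cp, 16) for cp in code_field.split(".."))
        else:
            first = last = int(code_field, 16)
        counts[value] = counts.get(value, 0) + (last - first + 1)
    return counts


# Hand-written sample in the same layout (not copied from the real file).
SAMPLE = """\
# illustrative data only
0041..005A    ; 1.1 # [26]
10FE0..10FF6  ; 12.0 # [23]
1FA70         ; 12.0
"""

print(count_by_age(SAMPLE.splitlines()))
```

Run against the full data file, the count for the value `12.0` would be expected to match the number of characters added in this version.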

G. Changes in the Unicode Standard Annexes

In Version 12.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

  • UAX #9, Unicode Bidirectional Algorithm: Text was added in BD2 to guarantee that max_depth can be treated as a constant (with value 125).
  • UAX #11, East Asian Width: No significant changes in this version.
  • UAX #14, Unicode Line Breaking Algorithm: The behavior of NNBSP was clarified for Mongolian.
  • UAX #15, Unicode Normalization Forms: No significant changes in this version.
  • UAX #24, Unicode Script Property: No significant changes in this version.
  • UAX #29, Unicode Text Segmentation: The derivation of Lower and Upper for Sentence_Break was updated for Georgian, to account for the difference in how casing in Georgian interacts with sentence boundaries.
  • UAX #31, Unicode Identifier and Pattern Syntax: The context specified for A2 was tightened up, by requiring $Letter at the end of the sequence.
  • UAX #34, Unicode Named Character Sequences: The occurrence of initial hyphen-minus in Unicode character names was clarified.
  • UAX #38, Unicode Han Database (Unihan): The syntax and/or descriptions for several Unihan data fields were significantly updated: kIRG_GSource, kIRG_JSource, kIRG_KSource and kIRG_TSource. The discussion of kDefaultSortKey was removed, and instead a description of the actual sorting algorithm used to generate the radical-stroke charts was added.
  • UAX #41, Common References for Unicode Standard Annexes: Updated all references for Unicode 12.0.
  • UAX #42, Unicode Character Database in XML: Added new code point attributes, values, and patterns.
  • UAX #44, Unicode Character Database: No significant changes in this version.
  • UAX #45, U-Source Ideographs: Documentation was added regarding the addition of a new comments field to the data file, USourceData.txt. Numerous new entries have also been added to that data file.
  • UAX #50, Unicode Vertical Text Layout: No significant changes in this version.
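
The UAX #9 entry above notes that max_depth can be treated as a constant with value 125. The sketch below shows how an implementation might rely on that when computing the next embedding level. It follows one reading of the explicit embedding rules (the next level is the least odd level for right-to-left, or the least even level for left-to-right, greater than the current one) and omits the overflow counters of the full algorithm. The names `MAX_DEPTH` and `next_embedding_level` are hypothetical.

```python
from typing import Optional

MAX_DEPTH = 125  # constant value stated for max_depth in UAX #9 (BD2)


def next_embedding_level(current: int, rtl: bool) -> Optional[int]:
    """Return the next embedding level, or None if it would exceed MAX_DEPTH.

    Simplified sketch: a real implementation also tracks overflow state.
    """
    if rtl:
        candidate = (current + 1) | 1   # least odd level greater than current
    else:
        candidate = (current + 2) & ~1  # least even level greater than current
    return candidate if candidate <= MAX_DEPTH else None


print(next_embedding_level(0, rtl=True))     # 1
print(next_embedding_level(1, rtl=False))    # 2
print(next_embedding_level(124, rtl=False))  # None
```

Because the limit is a fixed constant, a level stack can be allocated once with a known maximum size instead of being sized at run time.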

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

  • UTS #10, Unicode Collation Algorithm: No significant changes in this version.
  • UTS #39, Unicode Security Mechanisms: The discussion of simplified versus traditional CJK characters as part of the enhancements for spoof detection was removed, because any effective approach for that would need to be more sophisticated.
  • UTS #46, Unicode IDNA Compatibility Processing: No significant changes in this version.
  • UTS #51, Unicode Emoji: Several definitions were updated, and a new definition for "RGI Set" was added. A new section about marking gender in emoji input has been added, as well as numerous clarifications about multi-person groupings, emoji and text presentation selectors, and the significance of the word "FACE" in emoji names. The mechanisms for support of skin tone distinctions when using multi-person emoji are now more fully described.
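
For readers unfamiliar with how skin tone and multi-element emoji are encoded, the sketch below builds an emoji modifier sequence and an emoji ZWJ sequence from their code points. The specific emoji chosen (waving hand, woman, laptop) and the helper names are illustrative assumptions, not examples taken from this page; the sketch shows only how the sequences are composed, not which sequences are recommended for interchange.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def modifier_sequence(base: str, modifier: str) -> str:
    """Append a skin tone modifier (U+1F3FB..U+1F3FF) to an emoji base."""
    if not 0x1F3FB <= ord(modifier) <= 0x1F3FF:
        raise ValueError("not an emoji modifier code point")
    return base + modifier


def zwj_sequence(*elements: str) -> str:
    """Join emoji elements with ZERO WIDTH JOINER."""
    return ZWJ.join(elements)


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


waving = modifier_sequence("\U0001F44B", "\U0001F3FD")   # waving hand + medium skin tone
technologist = zwj_sequence("\U0001F469", "\U0001F4BB")  # woman + ZWJ + laptop

print(code_points(waving))        # U+1F44B U+1F3FD
print(code_points(technologist))  # U+1F469 U+200D U+1F4BB
```

Whether a given sequence is displayed as a single glyph depends on font and platform support; unsupported sequences typically fall back to showing the individual elements.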

M. Implications for Migration

A significant number of changes in Unicode 12.0 may affect implementations that are upgrading to Version 12.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Script-related Changes

TBD

Casing Issues

TBD

Shaping Issues

TBD

Segmentation-related Changes

TBD

CJK/Unihan Changes

TBD

Standardized Variation Sequences

TBD

New Data Files Added to the UCD

TBD

Code Charts

TBD