Unicode® 16.0.0 (DRAFT)

2024 September 10 (Announcement)

STATUS: This is a preliminary draft page for an upcoming release. Some details may be missing or incorrect, and some links may be wrong or broken. During the alpha review period, errors are expected and feedback is not necessary. During the beta review period, feedback on errors will be helpful and appreciated.

This page summarizes the important changes for the Unicode Standard, Version 16.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary

Unicode 16.0 adds 5187 characters, for a total of 155,000 characters.

There are several significant themes for this release of the Unicode Standard.

  • There has been a very substantial addition to the repertoire of Egyptian hieroglyphs. Many of these additions are to cover the Ptolemaic period.
  • TBD

Synchronization

Several other important Unicode specifications have been updated for Version 16.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 16.0:

Specification Scope Data Files
UTS #10, Unicode Collation Algorithm Sorting Unicode text UCA data
UTS #39, Unicode Security Mechanisms Reducing Unicode spoofing Security data
UTS #46, Unicode IDNA Compatibility Processing Compatible processing of non-ASCII URLs IDNA data, IDNA 2008 derived data
UTS #51, Unicode Emoji Emoji and their behavior Emoji data

Some of the changes in Version 16.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

See the following resource links for general information about Unicode versions and other information about the Unicode Standard and other publications of the Unicode Consortium.

B. Technical Overview

Version 16.0 of the Unicode Standard consists of:

  • The core specification
  • The code charts (delta and archival) for this version
  • The Unicode Standard Annexes
  • The Unicode Character Database (UCD)

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Core Specification

STATUS: During the alpha review period, placeholder pages for the core specification are deployed without content. This is merely to exercise deployment and navigation. New content for the core specification will be available during the beta review period. In the meantime, reviewers are welcome to provide comments on the currently published version of the core specification (Version 15.0, 15 MB), which can also be browsed on a per-chapter basis from the Unicode 15.1 landing page.

The core specification for Version 16.0 is available for browsing online as per-chapter web pages. An archival core specification will also be available as a single pdf (14 MB).

Code Charts

Several sets of code charts are available. They serve different purposes:

Chart Type Description
Latest Code Charts These charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.
Delta Code Charts These charts show the new blocks and any blocks in which characters were added specifically for Unicode 16.0.0. The new characters and any major updates to the representative glyphs are visually highlighted in these charts.
Archival Code Charts These charts contain the entire set of characters, names and representative glyphs at the time of publication of Unicode 16.0.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Han Radical-Stroke Indices

There are a number of radical-stroke indices available to assist in the lookup of Han characters in the code charts.

Index Type Description
Interactive An interactive CJK character lookup page that supports lookup either by code point or by radical and stroke values.
IICore (3.8 MB) A static pdf radical-stroke index limited to just the repertoire of the IICore set. (This RS index page is seldom updated.)
Unihan Core 2020 (8.2 MB) A static pdf radical-stroke index limited to just the Unihan Core 2020 set. (This RS index page is seldom updated.)
Full (43 MB) A static pdf radical-stroke index that covers the entire Unicode CJK ideograph repertoire for Unicode 16.0.

The full radical-stroke index is a stable part of this release of the Unicode Standard. It will never be updated.

Unicode Standard Annexes

STATUS: During the alpha review and beta review periods, links to individual UAXes (or UTSes) point to the proposed update for that document, if any. If no proposed update has been posted for the document, links point to the last published version of the document, for reference.

Links to the individual Unicode Standard Annexes for this version are available in Section I, List of Components below. The summary list of significant changes in the content of each Unicode Standard Annex for Version 16.0 can be found in Section G, Changes in the Unicode Standard Annexes below.

Unicode Character Database

STATUS: During the alpha review period, some of the data files may not be posted. Later, during beta review, a complete set of consistent data files will be posted, including data files associated with various Unicode Technical Standards.

Data files for Version 16.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Detailed documentation about the data files can be found in UAX #44, Unicode Character Database. Zipped versions of the UCD for bulk download are available, as well.

Version References

Version 16.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 16.0.0, (South San Francisco: The Unicode Consortium, 2024. ISBN 978-1-936213-34-4)
https://www.unicode.org/versions/Unicode16.0.0/

The terms “Version 16.0” or “Unicode 16.0” are abbreviations for the full version reference, Version 16.0.0.

The citation and permalink for the latest published version of the Unicode Standard are:

The Unicode Consortium. The Unicode Standard.
https://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 16.0 is found below in Section I, List of Components. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 16.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 16.0, see the list of current Updates and Errata.

C. Stability Policy Update

TBD

D. Textual Changes and Character Additions

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

5187 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the delta code charts.

New Blocks

There are ten newly defined blocks in Version 16.0:

Range Block Name
105C0..105FF Todhri
10D40..10D8F Garay
11380..113FF Tulu-Tigalari
116D0..116FF Myanmar Extended-C
11BC0..11BFF Sunuwar
13460..1355F Egyptian Hieroglyphs Extended-A
16100..1613F Gurung Khema
16D40..16D7F Kirat Rai
1CC00..1CEBF Symbols for Legacy Computing Supplement
1E5D0..1E5FF Ol Onal
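
As a rough illustration of how an implementation might consume these ranges, the following Python sketch looks up whether a code point falls in one of the new blocks. The ranges and names are copied from the table above and written in a "start..end; name" layout similar to that of Blocks.txt. The names NEW_BLOCKS_16_0, parse_blocks, and new_block_of are hypothetical helpers introduced only for this example; a real implementation would normally load the complete Blocks.txt for the release rather than a hand-copied excerpt.

```python
import bisect

# Ranges copied from the table above, written in a Blocks.txt-like layout.
# This is an excerpt covering only the new blocks, not the full block list.
NEW_BLOCKS_16_0 = """\
105C0..105FF; Todhri
10D40..10D8F; Garay
11380..113FF; Tulu-Tigalari
116D0..116FF; Myanmar Extended-C
11BC0..11BFF; Sunuwar
13460..1355F; Egyptian Hieroglyphs Extended-A
16100..1613F; Gurung Khema
16D40..16D7F; Kirat Rai
1CC00..1CEBF; Symbols for Legacy Computing Supplement
1E5D0..1E5FF; Ol Onal
"""


def parse_blocks(text):
    """Return a list of (start, end, name) tuples sorted by start."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, name = (field.strip() for field in line.split(";", 1))
        start, end = (int(part, 16) for part in code_range.split(".."))
        blocks.append((start, end, name))
    return sorted(blocks)


def new_block_of(code_point, blocks):
    """Return the name of the block containing code_point, or None."""
    starts = [start for start, _, _ in blocks]
    i = bisect.bisect_right(starts, code_point) - 1
    if i >= 0 and blocks[i][0] <= code_point <= blocks[i][1]:
        return blocks[i][2]
    return None


BLOCKS = parse_blocks(NEW_BLOCKS_16_0)
print(new_block_of(0x16D40, BLOCKS))  # Kirat Rai
print(new_block_of(0x0041, BLOCKS))   # None (not in a new block)
```

Because the list is sorted by start value, the binary search finds the only candidate block in logarithmic time, and the final bounds check rejects code points that fall in the gaps between blocks.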

E. Conformance Changes

There are no new conformance requirements for the core specification in Unicode 16.0.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 16.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 16.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex Changes
UAX #9
Unicode Bidirectional Algorithm
No significant changes in this version.
UAX #11
East Asian Width
No significant changes in this version.
UAX #14
Unicode Line Breaking Algorithm
The description of line break class AS was updated to mention that all digits of scripts that use the Brahmic style of line breaking are assigned this class. The documentation of plane 1 ranges defaulting to lb=ID was updated. Most of Section 5.2 was moved to the core specification. The text of LB9 was clarified. The regular expressions in LB28a were clarified.
UAX #15
Unicode Normalization Forms
No significant changes in this version.
UAX #24
Unicode Script Property
No significant changes in this version.
UAX #29
Unicode Text Segmentation
The definition of GCB=V was updated to include Kirat Rai vowels. The description of rules GB6 - GB8 was updated.
UAX #31
Unicode Identifiers and Syntax
A clarification was added that NFD must be applied before toNFKC_Casefold in order to correctly meet requirements UAX31-R4 and UAX31-R5 with NFKC and full case folding. A reference to definition D147 of the Unicode Standard was added.
UAX #34
Unicode Named Character Sequences
No significant changes in this version.
UAX #38
Unicode Han Database (Unihan)
The sorting algorithm examples have been updated. The relationship between the Equivalent_Unified_Ideograph property and the Unihan database was clarified. A reference to the new RSIndex.txt data file was added. A new delimiter was added (semicolon). The delimiter of the kAccountingNumeric property was updated. The delimiter and syntax of the kDefinition property were changed. A description has been added for the new kFanqie property. The kFrequency property was removed. The syntax and description of the kPhonetic property were updated. The description of the kPrimaryNumeric property was updated.
UAX #41
Common References for Unicode Standard Annexes
All references were updated for Unicode 16.0.
UAX #42
Unicode Character Database in XML
New code point attributes, values, and patterns were added for Unicode 16.0.
UAX #44
Unicode Character Database
The documentation was updated to describe the changes to the UCD for Version 16.0. Documentation was added for the new property Modifier_Combining_Mark. A clarification was added regarding the derivation of Numeric_Value from various Unihan properties.
UAX #45
U-Source Ideographs
No significant changes in this version.
UAX #50
Unicode Vertical Text Layout
Section 3.2.4 and Table 2 were added to explain the tailoring of fullwidth quotation marks.
UAX #53
Unicode Arabic Mark Rendering
This specification was changed from a UTR to a UAX for Unicode 16.0. The image for Example 3 was corrected. An implementation note was added after the description of the algorithm. A section was added for U+10EFC ARABIC COMBINING ALEF OVERLAY.
UAX #57
Unicode Egyptian Hieroglyph Database (Unikemet)
This UAX is new for Unicode 16.0, and describes the Unikemet.txt data file for the UCD.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard Changes
UTS #10
Unicode Collation Algorithm
Table 18 in Appendix B was extended to include a CTT Name column. Text was added to Appendix B to enable ISO/IEC 14651 to refer to the CTT tables published (starting with Unicode 16.0) on the Unicode website. A note was added to Section 10.1.3, Implicit Weights, explaining how the CTT for ISO/IEC 14651 uses the implicit weight calculated in Table 16.
UTS #39
Unicode Security Mechanisms
The definitions of skeleton and confusable were updated.
UTS #46
Unicode IDNA Compatibility Processing
No significant changes in this version.
UTS #51
Unicode Emoji
All references were updated for Unicode 16.0.

I. List of Components

This section lists the components of Version 16.0.0 of the Unicode Standard. The version numbering and the role of each component are explained in Versions of The Unicode Standard.

Core Specification
UnicodeStandard-16.0.pdf (size: 14 MB)
Code Charts and Radical-Stroke Index
Code Charts (size: 110 MB)
Radical-Stroke Index (size: 44 MB)
Radical-Stroke Index data
Unicode Standard Annexes
UAX #9: Unicode Bidirectional Algorithm
UAX #11: East Asian Width
UAX #14: Unicode Line Breaking Algorithm
UAX #15: Unicode Normalization Forms
UAX #24: Unicode Script Property
UAX #29: Unicode Text Segmentation
UAX #31: Unicode Identifiers and Syntax
UAX #34: Unicode Named Character Sequences
UAX #38: Unicode Han Database (Unihan)
UAX #41: Common References for Unicode Standard Annexes
UAX #42: Unicode Character Database in XML
UAX #44: Unicode Character Database
UAX #45: U-Source Ideographs
UAX #50: Unicode Vertical Text Layout
UAX #53: Unicode Arabic Mark Rendering
UAX #57: Unicode Egyptian Hieroglyph Database (Unikemet)
Unicode Character Database
https://www.unicode.org/Public/16.0.0/
Documentation
Index.txt
NamesList.html
ReadMe.txt
Core Data
ArabicShaping.txt
BidiBrackets.txt
BidiMirroring.txt
Blocks.txt
CJKRadicals.txt
CompositionExclusions.txt
DoNotEmit.txt
EastAsianWidth.txt
EmojiSources.txt
EquivalentUnifiedIdeograph.txt
HangulSyllableType.txt
IndicPositionalCategory.txt
IndicSyllabicCategory.txt
Jamo.txt
LineBreak.txt
NameAliases.txt
NamedSequences.txt
NamedSequencesProv.txt
NamesList.txt
NormalizationCorrections.txt
NushuSources.txt
PropertyAliases.txt
PropertyValueAliases.txt
PropList.txt
Scripts.txt
ScriptExtensions.txt
SpecialCasing.txt
StandardizedVariants.txt
TangutSources.txt
UnicodeData.txt
Unikemet.txt
VerticalOrientation.txt
Unihan Database (Unihan.zip)
Unihan_DictionaryIndices.txt
Unihan_DictionaryLikeData.txt
Unihan_IRGSources.txt
Unihan_NumericValues.txt
Unihan_OtherMappings.txt
Unihan_RadicalStrokeCounts.txt
Unihan_Readings.txt
Unihan_Variants.txt
Data for UAX #45
USourceData.txt
USourceGlyphs.pdf
USourceRSChart.pdf
Derived Data
CaseFolding.txt
DerivedAge.txt
DerivedCoreProperties.txt
DerivedNormalizationProps.txt
Extracted Data
DerivedBidiClass.txt
DerivedBinaryProperties.txt
DerivedCombiningClass.txt
DerivedDecompositionType.txt
DerivedEastAsianWidth.txt
DerivedGeneralCategory.txt
DerivedJoiningGroup.txt
DerivedJoiningType.txt
DerivedLineBreak.txt
DerivedName.txt
DerivedNumericType.txt
DerivedNumericValues.txt
Conformance Test Data
BidiCharacterTest.txt
BidiTest.txt
NormalizationTest.txt
Auxiliary Data for UAX #14 and UAX #29
GraphemeBreakProperty.txt
GraphemeBreakTest.txt
LineBreakTest.txt
SentenceBreakProperty.txt
SentenceBreakTest.txt
WordBreakProperty.txt
WordBreakTest.txt
Documentation for Auxiliary Data
GraphemeBreakTest.html
LineBreakTest.html
SentenceBreakTest.html
WordBreakTest.html
Emoji Data
emoji-data.txt
emoji-variation-sequences.txt

M. Implications for Migration

There are a significant number of changes in Unicode 16.0 which may impact implementations upgrading to Version 16.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Core Specification Changes

The core specification has been completely revamped for Unicode 16.0.0. The text has all been converted to HTML, and has been deployed on a self-contained subsite. For alpha review, only the prototype subsite has been posted, to demonstrate the new structure and navigation between chapters. For beta review, all of the current content will also be rolled out and be available for review and comment. The text is no longer published in per-chapter pdf files, but prior bookmarked links into those chapter files will resolve correctly in the new per-chapter HTML files. An archival pdf version of the entire core specification will be produced for the release, and will look and behave very similarly to the corresponding archival pdf files for prior releases.

Script-related Changes

  • TBD

General Character Property Issues

  • Starting with U+11F5A KAWI SIGN NUKTA in Unicode 16.0, newly encoded nukta characters use Canonical_Combining_Class (ccc) 0 or positional ccc values such as 220 or 230. Nukta characters encoded in earlier versions typically, but not always, use ccc=7. Software that needs to identify nuktas in Brahmic scripts should check for Indic_Syllabic_Category=Nukta.
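
The recommendation above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: SAMPLE_INSC is a small hand-written excerpt in the "code point or range ; value" layout used by UCD property files, and the helper names load_insc, is_nukta, and has_legacy_nukta_ccc are hypothetical. Production code would load the full IndicSyllabicCategory.txt for the target version, or use a library that exposes Indic_Syllabic_Category directly.

```python
import unicodedata

# Illustrative excerpt only; real data should come from the
# IndicSyllabicCategory.txt file of the Unicode version being targeted.
SAMPLE_INSC = """\
0915..0939 ; Consonant
093C       ; Nukta
09BC       ; Nukta
11F5A      ; Nukta
"""


def load_insc(text):
    """Map each listed code point to its Indic_Syllabic_Category value."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, value = (field.strip() for field in line.split(";", 1))
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            table[code_point] = value
    return table


def is_nukta(ch, insc):
    """Preferred test: Indic_Syllabic_Category=Nukta."""
    return insc.get(ord(ch)) == "Nukta"


def has_legacy_nukta_ccc(ch):
    """Fragile heuristic: ccc=7, which newly encoded nuktas may not use."""
    return unicodedata.combining(ch) == 7


INSC = load_insc(SAMPLE_INSC)
print(is_nukta("\u093C", INSC))          # True: DEVANAGARI SIGN NUKTA
print(is_nukta(chr(0x11F5A), INSC))      # True: KAWI SIGN NUKTA
print(has_legacy_nukta_ccc("\u093C"))    # True: this older nukta has ccc=7
```

The property-based check does not depend on the Canonical_Combining_Class of the character, so it keeps working for nuktas encoded with ccc=0 or with positional values, where the ccc=7 heuristic would miss them.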

Segmentation

  • TBD

Numeric Property Issues

  • TBD

CJK/Unihan Changes

  • TBD

See UAX #38, Unicode Han Database (Unihan) for further details on these changes, especially Section 4.2, Listing by Date of Addition to the Unicode Standard, and Section 4.3, Listing by Location within Unihan.zip. UAX #38 also has updated regex values for numerous Unihan properties.

UTS #46 (IDNA) Changes

  • TBD

Changes to Code Charts

  • TBD

Collation-related Changes

  • TBD

Emoji Changes

TBD. For details, see the Unicode 16.0 emoji charts and Emoji Recently Added, v16.0.

 

