
Unicode 7.0.0 DRAFT

This page summarizes the important changes in preparation for the Unicode Standard, Version 7.0.0. This version will supersede all previous versions of the Unicode Standard.

Unicode 7.0.0 is in preparation. Some links on this page may be disabled, and others may not yet be active.
A. Summary
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards

A. Summary

Unicode 7.0 adds a total of 2,834 characters, encompassing 23 new scripts and 32 new blocks, as well as character additions to 36 existing blocks in both Plane 0 and Plane 1 of the Unicode codespace. Notable additions include the following:

  • Significant reorganization of the chapters and layout of the core specification, and a new form factor tailored for easy viewing on e-readers and other mobile devices
  • The ruble sign, the newly adopted currency symbol for the Russian ruble
  • Pictographic symbols (including many emoji), geometric symbols, arrows, and ornaments originating from the Wingdings and Webdings sets
  • Twenty-three new lesser-used and historic scripts extending support for written languages of North America, China, India, other Asian countries, and Africa
  • Updated collation and IDNA compatibility processing
  • Alignment of the core specification with updates to the Unicode Bidirectional Algorithm
  • Further clarification of the case pair stability policy, and a new stability policy for Numeric_Type

Synchronization

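Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 7.0 (see Section H):

  • UTS #10, Unicode Collation Algorithm
  • UTS #46, Unicode IDNA Compatibility Processing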

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus Amendments 1 and 2. Additionally, it includes the accelerated publication of U+20BD RUBLE SIGN.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Version Information

Version 7.0 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 7.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 7.0.0, (Mountain View, CA: The Unicode Consortium, 2014. ISBN 978-1-936213-09-2)
http://www.unicode.org/versions/Unicode7.0.0/

The terms “Version 7.0” or “Unicode 7.0” are abbreviations for the full version reference, Version 7.0.0.
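
As a rough illustration of how the short and full version references relate, the following Python sketch normalizes both forms to the same three-part value and compares it with the Unicode version bundled in the running interpreter. The helper name parse_version is hypothetical; unicodedata.unidata_version and unicodedata.name are part of the Python standard library. Which Unicode version a given Python build ships is an assumption to verify locally.

```python
import unicodedata


def parse_version(text):
    """Turn '7.0' or '7.0.0' into a 3-tuple, padding missing fields with 0.

    Hypothetical helper for illustration; not part of any Unicode tooling.
    """
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# Unicode version of the character database bundled with this interpreter.
runtime = parse_version(unicodedata.unidata_version)

# "7.0" and "7.0.0" name the same version, so either spelling works here.
supports_7_0 = runtime >= parse_version("7.0")

if supports_7_0:
    # U+20BD was added in Version 7.0, so its name should be available.
    print(unicodedata.name("\u20bd"))
else:
    print("Bundled Unicode data predates Version 7.0:", unicodedata.unidata_version)
```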

The citation and permalink for the latest published version of the Unicode Standard are:

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 7.0 is found on the page Components for 7.0.0. That page also provides the recommended reference format for Unicode Standard Annexes.

The navigation bar on the left of this page links to the core specification, both as a single file and as individual chapters and appendices. It also links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 7.0 of the Unicode Character Database.

Code Charts

Several sets of code charts are available. They serve different purposes:

  • The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.

For Unicode 7.0.0 in particular, two additional sets of code chart pages are provided:

  • A set of delta code charts showing the blocks in which characters were added for Unicode 7.0.0. The new characters are visually highlighted in the relevant charts. These delta code charts also include blocks which contain significant glyph changes to fix errata.
  • A set of archival code charts that represent the entire set of characters, names and representative glyphs at the time of publication of Unicode 7.0.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 7.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 7.0, see the list of current Updates and Errata.

C. Stability Policy Update

  • The case pair stability policy has been augmented with further clarification.
  • A property value stability policy has been added for Numeric_Type=Digit.

D. Textual Changes and Character Additions

[text TBD]

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

327 characters have been added to the BMP, while 2,507 characters have been added in the supplementary planes. Most character additions are in new blocks, but there are also character additions to a number of existing blocks.
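
The split between the BMP (Plane 0) and the supplementary planes follows directly from the code point value. The following minimal Python sketch shows the plane calculation and checks that the two counts above sum to the 2,834 characters stated in the Summary. The function names plane and is_bmp are illustrative, not taken from any Unicode data file or library.

```python
def plane(cp):
    """Return the plane number (0..16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    # Each plane holds 0x10000 code points, so the plane is the value above bit 16.
    return cp >> 16


def is_bmp(cp):
    """True for code points in the Basic Multilingual Plane (Plane 0)."""
    return plane(cp) == 0


# Counts quoted in the Character Assignment Overview.
bmp_added = 327
supplementary_added = 2507
total_added = bmp_added + supplementary_added  # 2,834, matching the Summary

print(is_bmp(0x20BD))   # U+20BD RUBLE SIGN is in the BMP
print(plane(0x1F650))   # start of Ornamental Dingbats, a Plane 1 block
```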

New Blocks

The newly defined blocks in Version 7.0 are:

Range           Block Name
1AB0..1AFF      Combining Diacritical Marks Extended
A9E0..A9FF      Myanmar Extended-B
AB30..AB6F      Latin Extended-E
102E0..102FF    Coptic Epact Numbers
10350..1037F    Old Permic
10500..1052F    Elbasan
10530..1056F    Caucasian Albanian
10600..1077F    Linear A
10860..1087F    Palmyrene
10880..108AF    Nabataean
10A80..10A9F    Old North Arabian
10AC0..10AFF    Manichaean
10B80..10BAF    Psalter Pahlavi
11150..1117F    Mahajani
111E0..111FF    Sinhala Archaic Numbers
11200..1124F    Khojki
112B0..112FF    Khudawadi
11300..1137F    Grantha
11480..114DF    Tirhuta
11580..115FF    Siddham
11600..1165F    Modi
118A0..118FF    Warang Citi
11AC0..11AFF    Pau Cin Hau
16A40..16A6F    Mro
16AD0..16AFF    Bassa Vah
16B00..16B8F    Pahawh Hmong
1BC00..1BC9F    Duployan
1BCA0..1BCAF    Shorthand Format Controls
1E800..1E8DF    Mende Kikakui
1F650..1F67F    Ornamental Dingbats
1F780..1F7FF    Geometric Shapes Extended
1F800..1F8FF    Supplemental Arrows-C

E. Conformance Changes

  • Minor changes were made to reflect updates to the Bidirectional Algorithm in Version 6.3 of the Unicode Standard.
  • Corrigendum #9 was applied to D14 (Noncharacter).
  • The changes from Version 6.3 of the Unicode Standard were incorporated in D136 (Case-ignorable) in the updated core specification.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 7.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations.

Other updates include changes to the derivations of the Alphabetic and Case_Ignorable properties, and a number of updates to the Script and Script_Extensions property assignments. Also, the conventions for defining default property values for ranges of code points using “@missing” directives were regularized.
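
For readers unfamiliar with “@missing” directives, the following Python sketch parses one such comment line into a code point range and its default value fields. The sample line follows the form used in Blocks.txt; the function name parse_missing and the regular expression are illustrative assumptions, and UAX #44 remains the authoritative description of the syntax.

```python
import re

# Matches lines such as "# @missing: 0000..10FFFF; No_Block".
_MISSING = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(.+?)\s*$"
)


def parse_missing(line):
    """Return (first, last, fields) for an @missing line, else None.

    Hypothetical helper: 'fields' holds the semicolon-separated values that
    follow the range, for example a default property value.
    """
    m = _MISSING.match(line)
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16)
    fields = [f.strip() for f in m.group(3).split(";")]
    return first, last, fields


print(parse_missing("# @missing: 0000..10FFFF; No_Block"))
print(parse_missing("0000..007F; Basic Latin"))  # ordinary data line, not a directive
```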

G. Changes in the Unicode Standard Annexes

In Version 7.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex Changes

  • UAX #9, Unicode Bidirectional Algorithm: TBD
  • UAX #11, East Asian Width: No significant changes in this version.
  • UAX #14, Unicode Line Breaking Algorithm: No significant changes in this version.
  • UAX #15, Unicode Normalization Forms: Corrected note for Table 3, Notational Conventions.
  • UAX #24, Unicode Script Property: No significant changes in this version.
  • UAX #29, Unicode Text Segmentation: Added U+AA7D MYANMAR SIGN TAI LAING TONE-5 to the exception list for SpacingMark in Table 2, Grapheme_Cluster_Break Property Values. That exception list excludes specific characters from being assigned the Grapheme_Cluster_Break property value SpacingMark by default.
  • UAX #31, Unicode Identifier and Pattern Syntax: Added many new scripts to Table 4, Candidate Characters for Exclusion from Identifiers. The text on natural-language identifiers was changed to make a stronger recommendation for including the exception characters, and to include the Catalan MIDDLE DOT.
  • UAX #34, Unicode Named Character Sequences: Added definitions for Unicode namespace and the Unicode namespace for character names. Major rewrite of Section 4, Names.
  • UAX #38, Unicode Han Database (Unihan): The syntax for the kIICore field has been changed. The kCompatibilityVariant and kRSUnicode fields have been moved to Unihan_IRGSources.txt.
  • UAX #41, Common References for Unicode Standard Annexes: No significant changes in this version.
  • UAX #42, Unicode Character Database in XML: Added the value 7.0 for the age attribute, and new values for the attributes blk, jg, sc, and InSC.
  • UAX #44, Unicode Character Database: Updated the derivation of the Alphabetic property and of the Case_Ignorable property. Simplified the discussion of @missing in Section 4.2.10, @missing Conventions, to reflect the revised conventions in the UCD data files, which eliminated special edge cases. Corrected statement about aliases for provisional properties in Section 5.8, Property and Property Value Aliases.
  • UAX #45, U-Source Ideographs: Clarified meaning of status field.
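
To illustrate the new age value mentioned for UAX #42, the following Python sketch selects the characters whose age attribute is 7.0 from a small XML fragment. The fragment is hand-written for illustration and is not copied from the published XML files; the namespace URI and the attribute names cp, na, and sc are given from memory of UAX #42 and should be checked against that annex. The function name chars_with_age is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the UCD in XML; verify against UAX #42.
UCD_NS = "http://www.unicode.org/ns/2003/ucd/1.0"

# Hand-written sample, not an excerpt from the real data files.
SAMPLE = """\
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char cp="20BA" age="6.2" na="TURKISH LIRA SIGN" sc="Zyyy"/>
    <char cp="20BD" age="7.0" na="RUBLE SIGN" sc="Zyyy"/>
  </repertoire>
</ucd>
"""


def chars_with_age(xml_text, age):
    """Return (code point, name) pairs for char elements with the given age."""
    root = ET.fromstring(xml_text)
    result = []
    for el in root.iter("{%s}char" % UCD_NS):
        cp = el.get("cp")
        # Skip elements that describe a range rather than a single code point.
        if cp is None:
            continue
        if el.get("age") == age:
            result.append((int(cp, 16), el.get("na")))
    return result


print(chars_with_age(SAMPLE, "7.0"))
```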

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard Changes

  • UTS #10, Unicode Collation Algorithm: Changed the text to discuss collation weights more generically, with fewer references to the 16-bit weights used in the DUCET. Section 6.3.2, Large Values for Secondary or Tertiary Weights, was merged into Section 6.2, Large Weight Values.
  • UTS #46, Unicode IDNA Compatibility Processing: Updated statistics for 7.0.0 in Table 4, IDNA Comparisons. Section 4 has been modified to clarify the input and results for each major step in the algorithm.