BETA Unicode® 17.0.0
The next version of the Unicode Standard will be Version 17.0.0, planned for release on
September 9, 2025. This version updates several annexes and adds significant new repertoire.
A total of 4,847 new characters are encoded.
A beta version of the 17.0.0 Unicode Character Database files is available for public review.
We strongly encourage implementers to review the summary description,
download the beta 17.0.0 Unicode Character Database files,
and test their programs with the new data, well before the end of the beta period. It is especially important
to review the Notable Issues for Beta Reviewers.
We encourage users to check the code charts carefully
to verify correctness of the new characters added to Unicode 17.0.0 and to ensure
that there are no regressions
in glyph shapes for previously encoded characters.
Unicode Standard Annexes (proposed updates)
If a link is not active for an annex, no proposed update is available for review. This
situation may occur when no significant change is planned for that annex for a particular
release.
- UAX #9, Unicode Bidirectional Algorithm
- UAX #11, East Asian Width
- UAX #14, Unicode Line Breaking Algorithm
- UAX #15, Unicode Normalization Forms
- UAX #24, Unicode Script Property
- UAX #29, Unicode Text Segmentation
- UAX #31, Unicode Identifiers and Syntax
- UAX #34, Unicode Named Character Sequences
- UAX #38, Unicode Han Database (Unihan)
- UAX #41, Common References for Unicode Standard Annexes
- UAX #42, Unicode Character Database in XML
- UAX #44, Unicode Character Database
- UAX #45, U-Source Ideographs
- UAX #50, Unicode Vertical Text Layout
- UAX #53, Unicode Arabic Mark Rendering
- UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)
Related Unicode Technical Standards (proposed updates)
In addition to the Unicode Standard proper, four other Unicode Technical
Standards have significant text and data file updates that are
correlated with the new additions for Unicode 17.0.0. Review of that text
and data is also encouraged during the beta review period.
Review and Feedback
For guidance on how to focus your review, see the section
Notable Issues for Beta Reviewers.
Any feedback should be
reported using the contact form.
Comments on the Unicode Standard Version 17.0.0
or the Unicode Character Database data files should refer to the beta review
Public Review Issue #526.
Comments on specific Version 17.0.0 UAXes and UTSes should refer to the respective
Public Review Issue Numbers
for each document, where available.
The comment period ends
July 1, 2025.
All substantive technical comments must have been received by that date for
consideration at the July UTC meeting. Editorial comments (typos,
etc.) may still be submitted after that date for consideration in the final
editorial work.
Note: All beta files may be updated, replaced, or
superseded by other files at any time. The beta files will be
discarded once Unicode 17.0.0 is final. It is inappropriate to cite
these files as other than a work in progress. No
products or implementations should be released based on the beta
UCD data files—use only the final, approved Version 17.0.0 data
files, expected on September 9, 2025.
The Unicode Consortium provides early access to updated versions of the data files
and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of
Version 17.0.0.
The assignment of characters for Unicode 17.0.0 is
now stable. There will be no further
additions or modifications of code points and no further changes to character names.
Please do not submit feedback requesting changes to code points
or character names for Unicode 17.0.0, as such feedback is not actionable.
One of the main purposes of the beta review period is to verify and
correct the preliminary character property assignments in the Unicode Character
Database. Reviewers should check for property changes to existing Unicode 16.0.0
characters, as well as the property values for the new Unicode 17.0.0 character
additions. The Auxiliary
HTML charts include the new characters highlighted in yellow, with names
appearing when hovering over a cell. These charts
may be useful for reviewing information such as the default collation order,
Script property assignments, and so forth during beta review.
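One way to check for property changes to existing characters is to diff two versions of the same data file. The following is a minimal sketch, not an official tool. It assumes the common UCD layout of `code point or range ; value # comment`, ignores code points that are not explicitly listed (including defaults given in `@missing` comment lines), and uses made-up sample lines with the placeholder values `Foo`, `Bar`, and `Baz` rather than real data.

```python
from typing import Dict, Iterable, Tuple


def parse_property_lines(lines: Iterable[str]) -> Dict[int, str]:
    """Map each explicitly listed code point to its property value."""
    values: Dict[int, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            values[cp] = fields[1]
    return values


def changed_values(
    old: Dict[int, str], new: Dict[int, str]
) -> Dict[int, Tuple[str, str]]:
    """Return {code point: (old value, new value)} for code points
    listed in both versions whose value differs."""
    return {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }


# Hypothetical sample lines, for illustration only.
OLD_LINES = ["0041..0043 ; Foo # three letters", "0044 ; Bar"]
NEW_LINES = ["0041..0042 ; Foo", "0043..0044 ; Baz", "0045 ; Foo"]

changes = changed_values(
    parse_property_lines(OLD_LINES), parse_property_lines(NEW_LINES)
)
for cp in sorted(changes):
    print(f"U+{cp:04X}: {changes[cp][0]} -> {changes[cp][1]}")
```

Code points that appear only in the newer file (U+0045 in the sample) are deliberately left out of the result, since they are new additions rather than changes to existing characters.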
The beta review period is a good opportunity to add support for the new
Unicode 17.0.0 characters in internal versions of software, so that software can
be tested to verify that the new characters and property assignments do not cause
problems when upgraded to Version 17.0.0 of Unicode.
Notable Issues for Beta Reviewers
Changes to Unicode Standard Annexes
Some of the Unicode Standard Annexes have modifications for
Unicode 17.0.0, often in coordination with changes to character properties.
See the Modifications section of each Annex for details of the relevant changes.
Core Specification Update
The beta review draft core specification is available as per-chapter web pages.
Reviewers should carefully check for inadvertent changes in the text, in particular in glyph examples. However, certain styling choices are not final, for example, whether tables have grid lines or not, or contain empty cells. Please do not comment on table styling, but do comment if you spot any significant errors in table content.
The text still contains a number of editor's notes, indicating both general information for
reviewers and spots in the text that are not yet complete for Unicode 17.0. Please use those
notes as guidance, as there is no need for repeated feedback reports regarding omissions or defects that the editors already know about and are actively working on.
Segmentation Issues
UTC 181 approved a significant change to the line breaking algorithm that introduces a new Line_Break character property value, Unambiguous_Hyphen. The need for this change originated in changes to the handling of hyphens in Hebrew that had been approved for Unicode 16.0 (see decision 179-C25) but proved problematic when implemented in ICU. A temporary fix was made for Unicode 16.0 (see 180-C18 and section 5.6 of L2/24-162). The change for Unicode 17.0 is a more complete fix for those issues. See 181-C53 and section 6.1 of L2/24-224 for complete details.
U+034F COMBINING GRAPHEME JOINER is not frequently used but is essential in certain situations, including in German and Biblical Hebrew text. Although it was added in Unicode 3.2 in 2002, settling its character properties and segmentation rules has proved difficult. The issues have now been analyzed, and a detailed history of how the handling of this character in Unicode’s specifications has evolved over the years has been added to UAX #14. See Section 6.3 of L2/24-224 for details.
Script-specific Issues
- There are five new scripts encoded in Unicode 17.0.
- The Tai Yo script has complex rendering.
General Character Property Issues
- Note the change of field labels in TangutSources.txt and NushuSources.txt.
Numeric Property Issues
- There are two new sets of decimal digits added in Unicode 17.0, for newly encoded scripts: Tolong Siki and Chisoi. Implementations of numeric values and numeric formatting
should take these new sets into account.
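Implementations that keep their own table of decimal digit sets can rebuild it from the data rather than hard-coding it. The sketch below assumes the semicolon-separated layout of UnicodeData.txt, with General_Category in field 2 and the decimal digit value in field 6, and assumes that each digit set is a contiguous run of ten code points with values 0 through 9. The sample lines cover only the ASCII digits and one letter; the function names are illustrative.

```python
from typing import Dict, Iterable, List


def decimal_digit_values(lines: Iterable[str]) -> Dict[int, int]:
    """Map each code point with General_Category Nd to its digit value."""
    digits: Dict[int, int] = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) > 6 and fields[2] == "Nd" and fields[6]:
            digits[int(fields[0], 16)] = int(fields[6])
    return digits


def digit_set_starts(digits: Dict[int, int]) -> List[int]:
    """Return the code point of each zero that begins a contiguous 0..9 run."""
    return sorted(
        cp
        for cp, value in digits.items()
        if value == 0 and all(digits.get(cp + i) == i for i in range(10))
    )


# Sample lines in UnicodeData.txt layout: the ASCII digits plus one letter.
NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
         "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]
SAMPLE_LINES = [
    f"{0x30 + i:04X};DIGIT {name};Nd;0;EN;;{i};{i};{i};N;;;;;"
    for i, name in enumerate(NAMES)
]
SAMPLE_LINES.append("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")

starts = digit_set_starts(decimal_digit_values(SAMPLE_LINES))
print([f"U+{cp:04X}" for cp in starts])  # prints ['U+0030']
```

Running the same functions over the beta UnicodeData.txt and over the 16.0.0 file, then comparing the two lists of starting code points, would show which digit sets are new.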
Security and Identifier-related Issues
The Identifier_Type character property affects which characters are included in the General Security Profile for identifiers, which is a default recommendation for identifiers used in secure contexts. Depending on the Identifier_Type property value, characters are included (Identifier_Status = Allowed) or excluded (Identifier_Status = Restricted).
For Unicode 17.0, the assignments of Identifier_Type for all existing characters in recommended scripts were reviewed and updated to match the best currently available data on usage. Note changes to Identifier_Type for numerous characters, particularly those whose associated Identifier_Status changed from Allowed to Restricted. See Choosing Identifier_Type Values in UTS #39 for an associated explanation of the rationale behind these changes.
- Han ideographs: instead of making all ideographs Recommended by default, they are now all Uncommon_Use, except for one fixed set of 19,842 Han ideographs in modern common use that are widely implemented across identifier systems. This changes the Identifier_Status of 77,838 Han characters from Allowed to Restricted.
- Non-ideographic characters: as a result of the review, the following changes occurred:
  - 36 existing characters changed to Identifier_Status = Allowed as a result of Identifier_Type changes to Recommended or Inclusion.
  - 1,099 characters changed to Identifier_Status = Restricted as a result of Identifier_Type changes to Obsolete, Technical, or Uncommon_Use.
  - Some characters changed Identifier_Type without affecting their Identifier_Status as Restricted.
- Bopomofo: the Bopomofo script is primarily limited to educational use. As a result, the script has been reclassified as Limited_Use, making 74 Bopomofo characters Restricted.
- One newly-encoded character was assigned Identifier_Status = Allowed.
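The relationship between the two properties described above can be sketched as a small function. This is an illustration under stated assumptions, not the normative definition in UTS #39: it treats Recommended and Inclusion as the only Identifier_Type values that yield Allowed, as the changes listed above imply, and it assumes that a character may carry several space-separated Identifier_Type values.

```python
ALLOWED_TYPES = frozenset({"Recommended", "Inclusion"})


def identifier_status(identifier_type: str) -> str:
    """Derive Identifier_Status from an Identifier_Type value.

    The value may hold several space-separated types (an assumption
    of this sketch); any Recommended or Inclusion entry yields Allowed.
    """
    types = identifier_type.split()
    if any(t in ALLOWED_TYPES for t in types):
        return "Allowed"
    return "Restricted"


for value in ("Recommended", "Inclusion", "Uncommon_Use",
              "Limited_Use", "Technical Obsolete"):
    print(f"{value:20} -> {identifier_status(value)}")
```

A check like this is useful mainly as a consistency test: applying it to every Identifier_Type entry in the beta data and comparing the result with the published Identifier_Status values should produce no mismatches.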
Unihan-related Issues
All Unihan
properties should be reviewed carefully. The following changes
deserve special attention:
- The new CJK Unified Ideographs Extension J block, with 4,298 ideographs, pushes the number of CJK ideographs to over 100,000.
- horizontal extension for 2,145 G-source ideographs
- horizontal extension for 306 K-source ideographs
- changes to 1,685 G-source references
See UAX #38 for further details on these changes, especially Section 4.2, Listing
by Date of Addition to the Unicode Standard, and Section 4.3, Listing by
Location within Unihan.zip.
- kTotalStrokes syntax change
Other CJK-related changes:
- glyph changes for 11 G-source ideographs
- glyph changes for 366 T-source ideographs
Code Charts
As always, careful review of the updated code charts for Version 17.0.0 is advised.
Particular issues to take note of include:
- There are a number of other Han glyph updates.
- Other glyph updates are listed explicitly in the
delta charts index page.
- The two code charts for Egyptian hieroglyphs contain extensive functional and phonetic
information derived from the data file Unikemet.txt, and have notable further updates for Version 17.0.
Collation-related Issues
The Default Unicode Collation Element Table (DUCET) was updated to the Unicode 17.0.0
repertoire for UCA 17.0.0. For the most part, the additions for new
characters are unremarkable, but implementations should be checked to ensure
the new additions do not cause problems.
The former documentation file, CollationTest.html, has been merged into a
new section of UTS #10.
The DUCET ordering of Tangut components with respect to Tangut ideographs
has been modified. See Table 16, Computing Implicit Weights, in UTS #10 for details.
Other Issues
Please also check the following specific items carefully:
- 8 new emoji characters have been added. In addition to those individual characters, many new emoji sequences have been recognized. If your implementation supports emoji, be sure to carefully review UTS #51, Unicode Emoji (PRI #518).
WARNING: There are changes to the end of two existing CJK unified ideograph ranges in Unicode 17.0.0. Because implementations often hard-code ideographic ranges to short-cut lookups and reduce table sizes, it is especially important that implementers pay close attention to the implications of range changes for Version 17.0.0. These extensions bump up the end ranges of the encoded ideographs as follows:
- 6 code points for Extension C: ending at U+2B73F
- 12 code points for Extension E: ending at U+2CEAD
See Section 4.4, Listing of Characters Covered by the Unihan Database in UAX #38 for the version history of all these small CJK unified ideograph additions inside existing blocks.
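The effect on hard-coded range checks can be sketched as follows. The new end points and the counts of added code points are taken from the list above; the range start points (U+2A700 and U+2B820) are not given in this document and are stated here as an assumption, as is the idea that the additions are contiguous at the end of each range. The names `EXTENDED_RANGES`, `previous_end`, and `in_range` are illustrative.

```python
# name -> (range start, last code point in 17.0.0, code points added in 17.0.0)
# The start values are an assumption of this sketch, not from the text above.
EXTENDED_RANGES = {
    "Extension C": (0x2A700, 0x2B73F, 6),
    "Extension E": (0x2B820, 0x2CEAD, 12),
}


def previous_end(name: str) -> int:
    """Last code point of the range before the 17.0.0 additions."""
    _, end, added = EXTENDED_RANGES[name]
    return end - added


def in_range(cp: int, name: str, *, use_17: bool = True) -> bool:
    """Test membership against the 17.0.0 range or the earlier one."""
    start, end, _ = EXTENDED_RANGES[name]
    limit = end if use_17 else previous_end(name)
    return start <= cp <= limit


for name in EXTENDED_RANGES:
    first_new = previous_end(name) + 1
    print(
        f"{name}: first new code point U+{first_new:04X}, "
        f"old check: {in_range(first_new, name, use_17=False)}, "
        f"new check: {in_range(first_new, name)}"
    )
```

A range check that still uses the earlier end point rejects the first newly added code point in each range (U+2B73A and U+2CEA2 under these assumptions), which is the kind of failure the warning describes.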
Reorganization of Some Data Files
The data files for UTSes that are synchronized with the Unicode Standard
have been reorganized. In particular, they can now all be found under
https://www.unicode.org/Public/draft/ for beta review. For the Unicode
release, each collection of data files for a UTS will be located under
the main versioned directory: https://www.unicode.org/Public/17.0.0/,
inside subdirectories at that level, instead of subdirectories directly
under /Public. No data files for prior releases will be moved, but
implementers should be aware that starting from Version 17.0.0, the
data files for collation, security, IDNA, and emoji will be posted
in their new locations.
General Issues
For current proposed updates to the particular UAXes, see
Proposed Updates for Standard Annexes
or use the links near the top of this page.
Particular issues in the UAXes may also be the focus of specific
Public Review Issues.
Each proposed textual change in a UAX is highlighted, so that you can focus
your review on those sections if you have limited time. The changes
are also listed in detail in the Modifications sections (linked from the table
of contents of each document), and are summarized in
UAX changes,
so you can check on those areas that might be of most
interest.
Some links between beta documents and the proposed
updates for UAXes will not work correctly during the
beta review period. This is a known problem which does
not need to be reported, as such links point to
the eventual final names or revision numbers for the
released versions.
Stability
Certain character properties for newly assigned characters cannot be
changed after the formal release of each version of the standard, because of the
Character Encoding Stability Policy.
Such character property values need special attention during the beta review process, as they
cannot be corrected after publication. These include:
- Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.
- The determination of whether a character is included in identifiers (XID_Start, XID_Continue).
- Case foldings.
- There are also strong constraints on additions and changes to case mappings.