BETA Unicode 17.0.0

The Unicode Standard

Note: The beta review period for Unicode 17.0.0 closed on July 1, 2025. Feedback received during the public review is accessible from PRI #526. This beta review page remains active, however, for convenient access to the prepublication versions of the Unicode 17.0.0 data files and annexes until the formal release planned for September 9, 2025.

The next version of the Unicode Standard will be Version 17.0.0, planned for release on September 9, 2025. This version updates several annexes and adds significant new repertoire. A total of 4,847 new characters are encoded.

A beta version of the 17.0.0 Unicode Character Database files is available for public review. We strongly encourage implementers to review the summary description, download the beta 17.0.0 Unicode Character Database files, and test their programs with the new data, well before the end of the beta period. It is especially important to review the Notable Issues for Beta Reviewers.

We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 17.0.0 and to ensure that there are no regressions in glyph shapes for previously encoded characters.

Summary description

Unicode character database (UCD)

Summary of beta charts

Single-block delta charts with yellow highlighting for new characters

Single-block charts for all of Unicode 17.0.0

Code charts - single download (134 MB)

Emoji charts for beta review

Auxiliary HTML charts for beta review

Unicode Standard Annexes (proposed updates)

If a link is not active for an annex, no proposed update is available for review. This situation may occur when no significant change is planned for that annex for a particular release.

UAX #9, Unicode Bidirectional Algorithm

UAX #11, East Asian Width

UAX #14, Unicode Line Breaking Algorithm

UAX #15, Unicode Normalization Forms

UAX #24, Unicode Script Property

UAX #29, Unicode Text Segmentation

UAX #31, Unicode Identifiers and Syntax

UAX #34, Unicode Named Character Sequences

UAX #38, Unicode Han Database (Unihan)

UAX #41, Common References for Unicode Standard Annexes

UAX #42, Unicode Character Database in XML

UAX #44, Unicode Character Database

UAX #45, U-Source Ideographs

UAX #50, Unicode Vertical Text Layout

UAX #53, Unicode Arabic Mark Rendering

UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)

Related Unicode Technical Standards (proposed updates)

In addition to the Unicode Standard proper, four other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 17.0.0. Review of that text and data is also encouraged during the beta review period.

Specification: Data files

UTS #10, Unicode Collation Algorithm: DUCET and test files

UTS #39, Unicode Security Mechanisms: Identifier and confusables files

UTS #46, Unicode IDNA Compatibility Processing: IDNA mapping, test files, and derived data

UTS #51, Unicode Emoji: Emoji data files (in UCD); emoji sequences and test files

Review and Feedback

For guidance on how to focus your review, see the section Notable Issues for Beta Reviewers.

Any feedback should be reported using the contact form. Comments on the Unicode Standard Version 17.0.0 or the Unicode Character Database data files should refer to the beta review Public Review Issue #526. Comments on specific Version 17.0.0 UAXes and UTSes should refer to the respective Public Review Issue Numbers for each document, where available.

The comment period ends July 1, 2025. All substantive technical comments must have been received by that date for consideration at the July UTC meeting. Editorial comments (typos, etc.) may still be submitted after that date for consideration in the final editorial work.

Note: All beta files may be updated, replaced, or superseded by other files at any time. The beta files will be discarded once Unicode 17.0.0 is final. It is inappropriate to cite these files as other than a work in progress. No products or implementations should be released based on the beta UCD data files—use only the final, approved Version 17.0.0 data files, expected on September 9, 2025.

The Unicode Consortium provides early access to updated versions of the data files and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of Version 17.0.0.

The assignment of characters for Unicode 17.0.0 is now stable. There will be no further additions or modifications of code points and no further changes to character names. Please do not submit feedback requesting changes to code points or character names for Unicode 17.0.0, as such feedback is not actionable.

One of the main purposes of the beta review period is to verify and correct the preliminary character property assignments in the Unicode Character Database. Reviewers should check for property changes to existing Unicode 16.0.0 characters, as well as the property values for the new Unicode 17.0.0 character additions. The Auxiliary HTML charts include the new characters highlighted in yellow, with names appearing when hovering over a cell. These charts may be useful for reviewing information such as the default collation order, Script property assignments, and so forth during beta review.
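One way to look for property changes to existing characters is to expand the same property file from two versions into per-code-point maps and compare them. The following is a minimal sketch, assuming the common UCD line layout of `code point or range ; value # comment`. The function names and the sample lines are hypothetical and exist only to make the example self-contained; the sample values are made up and are not real Unicode 17.0.0 deltas.

```python
from typing import Dict, Iterable, Tuple


def parse_property_lines(lines: Iterable[str]) -> Dict[int, str]:
    """Expand 'XXXX..YYYY ; Value # comment' lines into {code point: value}."""
    values: Dict[int, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            values[cp] = fields[1]
    return values


def changed_existing(
    old: Dict[int, str], new: Dict[int, str]
) -> Dict[int, Tuple[str, str]]:
    """Return code points listed in both versions whose value differs."""
    return {
        cp: (old[cp], new[cp])
        for cp in old
        if cp in new and old[cp] != new[cp]
    }


# Illustrative input only: made-up values in the UCD file layout.
OLD_SAMPLE = ["0041..0043 ; ValueA # three letters", "0044 ; ValueA"]
NEW_SAMPLE = ["0041..0042 ; ValueA", "0043 ; ValueB", "0044..0045 ; ValueA"]

changes = changed_existing(
    parse_property_lines(OLD_SAMPLE), parse_property_lines(NEW_SAMPLE)
)
for cp, (before, after) in sorted(changes.items()):
    print(f"U+{cp:04X}: {before} -> {after}")
```

The sketch only compares code points that are explicitly listed in both files. Code points that are absent from a file take that property's default value, which a fuller comparison would need to account for.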

The beta review period is a good opportunity to add support for the new Unicode 17.0.0 characters in internal versions of software, so that software can be tested to verify that the new characters and property assignments do not cause problems when upgraded to Version 17.0.0 of Unicode.

Notable Issues for Beta Reviewers

Changes to Unicode Standard Annexes

Some of the Unicode Standard Annexes have modifications for Unicode 17.0.0, often in coordination with changes to character properties.

See the Modifications section of each Annex for details of the relevant changes.

Core Specification Update

The beta review draft core specification is available as per-chapter web pages.

Reviewers should carefully check for inadvertent changes in the text, in particular in glyph examples. However, certain styling choices are not final, for example, whether tables have grid lines or not, or contain empty cells. Please do not comment on table styling, but do comment if you spot any significant errors in table content.

The text still contains a number of editor's notes, indicating both general information for reviewers and spots in the text that are not yet complete for Unicode 17.0. Please use those notes as guidance, as there is no need for repeated feedback reports regarding omissions or defects that the editors already know about and are actively working on.

Segmentation Issues

UTC 181 approved a significant change to the line breaking algorithm that introduces a new Line_Break character property value, Unambiguous_Hyphen. The need for this originated in changes related to the handling of hyphens in Hebrew that had been approved for Unicode 16.0 (see decision 179-C25) but that proved problematic when implemented in ICU. A temporary fix was made for Unicode 16.0 (see 180-C18 and section 5.6 of L2/24-162). The change for Unicode 17.0 is a more complete fix for those issues. See 181-C53 and section 6.1 of L2/24-224 for complete details.

U+034F COMBINING GRAPHEME JOINER is not frequently used but is essential in certain situations, including in German and Biblical Hebrew text. Although it was added in Unicode 3.2 in 2002, settling its character properties and segmentation rules has been difficult. The issues have now been analyzed, and a detailed history of how the handling of this character has evolved in Unicode's specifications has been added to UAX #14. See Section 6.3 of L2/24-224 for details.

Script-specific Issues

There are five new scripts encoded in Unicode 17.0.

The Tai Yo script has complex rendering.

General Character Property Issues

Note the change of field labels in TangutSources.txt and NushuSources.txt.

Numeric Property Issues

There are two new sets of decimal digits added in Unicode 17.0, for newly encoded scripts: Tolong Siki and Chisoi. Implementations of numeric values and numeric formatting should take these new sets into account.
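Implementations that hard-code digit ranges will miss new sets, so one option is to derive them from UnicodeData.txt, where field 2 is the General_Category and field 6 is the decimal digit value. The following is a minimal sketch under that assumption; the helper names are hypothetical. The sample rows are generated for two long-established digit sets (ASCII and Arabic-Indic) rather than for the new scripts, and a real run would read the full data file instead.

```python
from typing import Dict, Iterable, List


def decimal_digit_values(lines: Iterable[str]) -> Dict[int, int]:
    """Map each General_Category=Nd code point to its decimal value (0-9)."""
    digits: Dict[int, int] = {}
    for line in lines:
        fields = line.strip().split(";")
        # UnicodeData.txt: field 0 = code point, field 2 = General_Category,
        # field 6 = decimal digit value.
        if len(fields) > 6 and fields[2] == "Nd" and fields[6]:
            digits[int(fields[0], 16)] = int(fields[6])
    return digits


def digit_set_zeros(digits: Dict[int, int]) -> List[int]:
    """Return the zero digit's code point for each contiguous 0-9 run."""
    return sorted(
        cp
        for cp, value in digits.items()
        if value == 0 and all(digits.get(cp + i) == i for i in range(10))
    )


def parse_decimal(text: str, digits: Dict[int, int]) -> int:
    """Parse a string of decimal digits from any script found in `digits`."""
    result = 0
    for ch in text:
        result = result * 10 + digits[ord(ch)]  # KeyError for non-digits
    return result


# Sample rows in UnicodeData.txt layout for two existing digit sets.
NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
         "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]
SAMPLE = [
    f"{0x30 + i:04X};DIGIT {n};Nd;0;EN;;{i};{i};{i};N;;;;;"
    for i, n in enumerate(NAMES)
]
SAMPLE += [
    f"{0x660 + i:04X};ARABIC-INDIC DIGIT {n};Nd;0;AN;;{i};{i};{i};N;;;;;"
    for i, n in enumerate(NAMES)
]

table = decimal_digit_values(SAMPLE)
zeros = digit_set_zeros(table)
```

With the table built from the Unicode 17.0.0 data file, any newly added digit sets would appear in `zeros` without code changes. Note that `parse_decimal` as written accepts strings that mix digits from different scripts, which a production parser may want to reject.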

Security and Identifier-related Issues

The Identifier_Type character property affects which characters are included in the General Security Profile for identifiers, which is a default recommendation for identifiers used in secure contexts. Depending on the Identifier_Type property value, characters are included (Identifier_Status = Allowed) or excluded (Identifier_Status = Restricted).
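The relationship between the two properties can be sketched as a small function. This is an illustration, not the normative definition in UTS #39: it treats Recommended and Inclusion as the values that yield Allowed, and every other value named in this section (Obsolete, Technical, Uncommon_Use, Limited_Use) as yielding Restricted. The function name is hypothetical, and requiring every listed value to be an allowed one is a conservative assumption for characters that carry more than one value.

```python
from typing import Iterable

# Identifier_Type values that place a character in the General Security
# Profile, as named in this section. Assumption: all other values restrict.
ALLOWED_TYPES = frozenset({"Recommended", "Inclusion"})


def identifier_status(identifier_types: Iterable[str]) -> str:
    """Derive Identifier_Status from a character's Identifier_Type value(s)."""
    types = set(identifier_types)
    # Conservative choice: a character with no type, or with any type
    # outside ALLOWED_TYPES, is treated as Restricted.
    if types and types <= ALLOWED_TYPES:
        return "Allowed"
    return "Restricted"


# A character whose type moves from Recommended to Uncommon_Use
# changes status from Allowed to Restricted.
before = identifier_status(["Recommended"])
after = identifier_status(["Uncommon_Use"])
print(before, "->", after)
```

Implementations that cache Identifier_Status per character should regenerate that cache from the Unicode 17.0.0 data rather than patching individual entries.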

For Unicode 17.0, the assignments of Identifier_Type for all existing characters in recommended scripts were reviewed and updated to match the best currently available data on usage. Note changes to Identifier_Type for numerous characters, particularly those whose associated Identifier_Status changed from Allowed to Restricted. See Choosing Identifier_Type Values in UTS #39 for an associated explanation of the rationale behind these changes.

Han ideographs: instead of all ideographs being Recommended by default, they are now all Uncommon_Use, except for one fixed set of 19,842 Han ideographs in modern common use that are widely implemented across identifier systems. This changes the Identifier_Status of 77,838 Han characters from Allowed to Restricted.

Non-ideographic characters: as a result of the review, the following changes occurred.

36 existing characters changed to Identifier_Status = Allowed as a result of Identifier_Type changes to Recommended or Inclusion.

1,099 characters changed to Identifier_Status = Restricted as a result of Identifier_Type changes to Obsolete, Technical or Uncommon_Use.

Some characters changed Identifier_Type without affecting their Identifier_Status as Restricted.

Bopomofo: the Bopomofo script is primarily limited to educational use. As a result, the script has been reclassified as Limited_Use, making 74 Bopomofo characters Restricted.

One newly-encoded character was assigned Identifier_Status = Allowed.

Unihan-related Issues

All Unihan properties should be reviewed carefully. The following changes deserve special attention:

The new CJK Unified Ideographs Extension J block, with 4,298 ideographs, pushes the number of CJK ideographs to over 100,000.

horizontal extension for 2,145 G-source ideographs

horizontal extension for 306 K-source ideographs

changes to 1,685 G-source references

See UAX #38 for further details on these changes, especially Section 4.2, Listing by Date of Addition to the Unicode Standard, and Section 4.3, Listing by Location within Unihan.zip.

kTotalStrokes syntax change

Other CJK-related changes:

glyph changes for 11 G-source ideographs

glyph changes for 366 T-source ideographs

Code Charts

As always, careful review of the updated code charts for Version 17.0.0 is advised. Particular issues to take note of include:

There are a number of other Han glyph updates.

Other glyph updates are listed explicitly in the delta charts index page.

The two code charts for Egyptian hieroglyphs contain extensive functional and phonetic information derived from the data file Unikemet.txt, and have notable further updates for Version 17.0.

Collation-related Issues

The Default Unicode Collation Element Table (DUCET) was updated to the Unicode 17.0.0 repertoire for UCA 17.0.0. For the most part, the additions for new characters are unremarkable, but implementations should be checked to ensure the new additions do not cause problems.

The former documentation file, CollationTest.html, has been merged into a new section of UTS #10.

The DUCET ordering of Tangut components with respect to Tangut ideographs has been modified. See Table 16, Computing Implicit Weights, in UTS #10 for details.

Other Issues

Please also check the following specific items carefully:

Eight new emoji characters have been added. In addition to those individual characters, many new emoji sequences have also been recognized. If your implementation supports emoji, be sure to carefully review UTS #51, Unicode Emoji (PRI #518).

WARNING: There are changes to the end of two existing CJK unified ideograph ranges in Unicode 17.0.0. Because implementations often hard-code ideographic ranges to short-cut lookups and reduce table sizes, it is especially important that implementers pay close attention to the implications of range changes for Version 17.0.0. These extensions move the end of each range of encoded ideographs as follows:

6 code points for Extension C: ending at U+2B73F

12 code points for Extension E: ending at U+2CEAD

See Section 4.4, Listing of Characters Covered by the Unihan Database in UAX #38 for the version history of all these small CJK unified ideograph additions inside existing blocks.
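For implementations that hard-code these ranges, the update amounts to moving two end points. The following is a minimal sketch of such a range table. The new end points and the counts come from the list above; the previous end points are derived by subtraction; the range start values (U+2A700 and U+2B820) are the block starts and are an assumption not stated on this page. The type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class IdeographRange(NamedTuple):
    name: str
    first: int        # assumed block start, not stated on this page
    last_17: int      # last encoded ideograph as of Unicode 17.0.0
    added_in_17: int  # code points appended at the end in 17.0.0


RANGES: Tuple[IdeographRange, ...] = (
    IdeographRange("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F, 6),
    IdeographRange("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAD, 12),
)


def previous_last(r: IdeographRange) -> int:
    """End point before the 17.0.0 additions, derived by subtraction."""
    return r.last_17 - r.added_in_17


def in_range(cp: int, r: IdeographRange, unicode_17: bool = True) -> bool:
    """Check membership against the 17.0.0 or the earlier end point."""
    last = r.last_17 if unicode_17 else previous_last(r)
    return r.first <= cp <= last


for r in RANGES:
    print(f"{r.name}: U+{previous_last(r):X} -> U+{r.last_17:X}")
```

A code point such as U+2B73A falls outside the earlier Extension C range but inside the 17.0.0 range, which is the kind of case a hard-coded check would get wrong until it is updated.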

Reorganization of Some Data Files

The data files for UTSes that are synchronized with the Unicode Standard have been reorganized. In particular, they can now all be found under https://www.unicode.org/Public/draft/ for beta review. For the Unicode release, each collection of data files for a UTS will be located under the main versioned directory: https://www.unicode.org/Public/17.0.0/, inside subdirectories at that level, instead of subdirectories directly under /Public. No data files for prior releases will be moved, but implementers should be aware that starting from Version 17.0.0, the data files for collation, security, IDNA, and emoji will be posted in their new locations.

General Issues

For current proposed updates to the particular UAXes, see Proposed Updates for Standard Annexes or use the links near the top of this page. Particular issues in the UAXes may also be the focus of specific Public Review Issues. Each proposed textual change in a UAX is highlighted, so that you can focus your review on those sections if you have limited time. The changes are also listed in detail in the Modifications sections (linked from the table of contents of each document), and are summarized in UAX changes, so you can check on those areas that might be of most interest.

Some links between beta documents and the proposed updates for UAXes will not work correctly during the beta review period. This is a known problem which does not need to be reported, as such links point to the eventual final names or revision numbers for the released versions.

Stability

Certain character properties for newly assigned characters cannot be changed after the formal release of each version of the standard, because of the Character Encoding Stability Policy. Such character property values need special attention during the beta review process, as they cannot be corrected after publication. These include:

Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.

The determination of whether a character is included in identifiers (XID_Start, XID_Continue).

Case foldings.

There are also strong constraints on additions and changes to case mappings.