BETA Unicode 18.0.0

The Unicode Standard


BETA Unicode® 18.0.0

Note: The beta review period for Unicode 18.0.0 closed on July 7, 2026. Feedback received during the public review is accessible from PRI #548. This beta review page remains active, however, for convenient access to the prepublication versions of the Unicode 18.0.0 data files and annexes, until the formal release planned for September 16, 2026.

The next version of the Unicode Standard will be Version 18.0.0, planned for release on September 16, 2026. This version updates several annexes, adds one new annex, and adds significant new repertoire. A total of 13,047 new characters are encoded.

A beta version of the 18.0.0 Unicode Character Database files is available for public review. We strongly encourage implementers to review the summary description, download the beta 18.0.0 Unicode Character Database files, and test their programs with the new data, well before the end of the beta period. It is especially important to review the Notable Issues for Beta Reviewers.

We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 18.0.0 and to ensure that there are no regressions in glyph shapes for previously encoded characters.

Summary description

Unicode character database (UCD)

Single-block delta charts with yellow highlighting for new characters

Single-block charts for all of Unicode 18.0.0

Consolidated code charts - single download (167 MB)

Emoji charts for beta review

Auxiliary HTML charts for beta review

Unicode Standard Annexes (proposed updates or drafts)

If a link is not active for an annex, no proposed update or draft is available for review. This situation may occur when no significant change is planned for that annex for a particular release.

UAX #9, Unicode Bidirectional Algorithm

UAX #11, East Asian Width

UAX #14, Unicode Line Breaking Algorithm

UAX #15, Unicode Normalization Forms

UAX #24, Unicode Script Property

UAX #29, Unicode Text Segmentation

UAX #31, Unicode Identifiers and Syntax

UAX #34, Unicode Named Character Sequences

UAX #38, Unicode Han Database (Unihan)

UAX #41, Common References for Unicode Standard Annexes

UAX #42, Unicode Character Database in XML

UAX #44, Unicode Character Database

UAX #45, U-Source Ideographs

UAX #50, Unicode Vertical Text Layout

UAX #53, Unicode Arabic Mark Rendering

UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)

UAX #60, Data for Large East Asian Scripts

Related Unicode Technical Standards (proposed updates)

In addition to the Unicode Standard proper, five other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 18.0.0. Review of that text and data is also encouraged during the beta review period.

Specification | Data Files

UTS #10, Unicode Collation Algorithm | DUCET and test files

UTS #39, Unicode Security Mechanisms | Identifier and confusables files

UTS #46, Unicode IDNA Compatibility Processing | IDNA mapping, test files, and derived data

UTS #51, Unicode Emoji | Emoji data files (in UCD); Emoji sequences and test files

UTS #58, Unicode Link Detection and Formatting: URLs and Email Addresses | Linkification data

Review and Feedback

For guidance on how to focus your review, see the section Notable Issues for Beta Reviewers.

Any feedback should be reported using the contact form. Comments on the Unicode Standard Version 18.0.0 or the Unicode Character Database data files should refer to the beta review Public Review Issue #548. Comments on specific Version 18.0.0 UAXes and UTSes should refer to the respective Public Review Issue Numbers for each document, where available.

The comment period ends July 7, 2026. All substantive technical comments must have been received by that date for consideration at the July UTC meeting. Editorial comments (typos, etc.) may still be submitted after that date for consideration in the final editorial work.

Note: All beta files may be updated, replaced, or superseded by other files at any time. The beta files will be discarded once Unicode 18.0.0 is final. It is inappropriate to cite these files as other than a work in progress. No products or implementations should be released based on the beta UCD data files. Use only the final, approved Version 18.0.0 data files, expected on September 16, 2026.

The Unicode Consortium provides early access to updated versions of the data files and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of Version 18.0.0.

The assignment of characters for Unicode 18.0.0 is now stable. There will be no further additions or modifications of code points and no further changes to character names. Please do not submit feedback requesting changes to code points or character names for Unicode 18.0.0, as such feedback is not actionable.

One of the main purposes of the beta review period is to verify and correct the preliminary character property assignments in the Unicode Character Database. Reviewers should check for property changes to existing Unicode 17.0.0 characters, as well as the property values for the new Unicode 18.0.0 character additions.
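One way to approach this check is to diff the released and beta copies of UnicodeData.txt field by field. The following is a minimal sketch, assuming both files have already been read into strings. The helper names are illustrative, and the sample data is synthetic: one field is altered on purpose and does not represent a real 18.0.0 change. Range entries such as the `<..., First>` and `<..., Last>` lines are treated as ordinary lines here.

```python
from typing import Dict, List, Tuple

# Names for the 15 semicolon-separated fields of a UnicodeData.txt line.
FIELDS = [
    "code", "name", "gc", "ccc", "bidi", "decomp", "decimal", "digit",
    "numeric", "mirrored", "unicode1_name", "iso_comment",
    "upper", "lower", "title",
]


def parse_unicode_data(text: str) -> Dict[int, List[str]]:
    """Map each code point to its list of raw field values."""
    table: Dict[int, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = fields
    return table


def diff_properties(
    old: Dict[int, List[str]], new: Dict[int, List[str]]
) -> List[Tuple[int, str, str, str]]:
    """List (code point, field name, old value, new value) for every
    field that differs on a code point present in both versions."""
    changes = []
    for cp in sorted(old.keys() & new.keys()):
        for name, before, after in zip(FIELDS, old[cp], new[cp]):
            if before != after:
                changes.append((cp, name, before, after))
    return changes


# Synthetic demonstration data: the bidi field is altered on purpose.
OLD_SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
NEW_SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;R;;;;;N;;;;0061;\n"

for cp, field, before, after in diff_properties(
    parse_unicode_data(OLD_SAMPLE), parse_unicode_data(NEW_SAMPLE)
):
    print(f"U+{cp:04X} {field}: {before!r} -> {after!r}")
```

Code points present only in the newer file are the new character additions; listing `new.keys() - old.keys()` would surface them for separate review.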

The beta review period is a good opportunity to add support for the new Unicode 18.0.0 characters in internal versions of software, so that software can be tested to verify that the new characters and property assignments do not cause problems when upgraded to Version 18.0.0 of Unicode.

Notable Issues for Beta Reviewers

Changes to Unicode Standard Annexes

Some of the Unicode Standard Annexes have modifications for Unicode 18.0.0, often in coordination with changes to character properties.

See the Modifications section of each Annex for details of the relevant changes.

Core Specification Update

The beta review draft core specification is available as per-chapter web pages.

Wording in the core specification of earlier versions was not completely clear regarding variation sequences and conformance. To provide greater clarity, the text describing variation sequences and related conformance requirements has been revised. See Section 3.6.2 in the draft core spec for details. There are some related changes to Section 5.20 and Section 23.4, as well. Please review the draft text carefully.

Reviewers should carefully check for inadvertent changes in the text, in particular in glyph examples. However, certain styling choices are not final, for example, whether tables have grid lines or not, or contain empty cells. Please do not comment on table styling, but do comment if you spot any significant errors in table content.

The text still contains a number of editor's notes, indicating both general information for reviewers and spots in the text that are not yet complete for Unicode 18.0. Please use those notes as guidance, as there is no need for repeated feedback reports regarding omissions or defects that the editors already know about and are actively working on.

Segmentation Issues

There has been a change to line breaking rule LB12a and to the Line_Break property assignments of some dashes and hyphens, including SOFT HYPHEN. This fixes a regression in the behaviour of an EN DASH set aside from preceding text by a NO-BREAK SPACE that had been introduced in Unicode Version 5.1.

Grapheme cluster breaking rule GB9c, which binds Indic conjuncts, has been changed to eliminate the requirement for context before the linker. This improves grapheme cluster breaking for Balinese.

The derivation of the Indic_Conjunct_Break property has been changed by UTC #187, correcting a regression in the behaviour of Zanabazar Square grapheme cluster breaking that had been introduced in Unicode Version 17.0.

An additional change to the derivation of the Indic_Conjunct_Break property has been made for beta: the derivation uses Script_Extensions instead of Script. This improves grapheme cluster breaking for Bengali. This change has not yet been approved by the UTC, but is included in the beta for public review.

Script-specific Issues

There are four new scripts encoded in Unicode 18.0: Chisoi, Jurchen, Seal, and Proto-Cuneiform.

Chisoi is a small alphabetic script.

Jurchen and Seal are large siniform ideographic scripts, with Seal (also known as Small Seal) consisting of 11,328 new characters.

Proto-Cuneiform is the historic precursor of Sumero-Akkadian Cuneiform. In Unicode 18.0, Proto-Cuneiform is only represented by the addition of archaic numerals; the encoding does not yet include non-numeric signs.

General Character Property Issues

Two new UCD data files, JurchenSources.txt and SealSources.txt, are associated with two new scripts, Jurchen and Seal. The latter data file includes kSEAL_THXSrc, kSEAL_CCZSrc, kSEAL_DYCSrc, and kSEAL_QJZSrc as new normative properties.

Numeric Property Issues

The newly added Chisoi script has a set of decimal digits. Numeric implementations should take this into account.
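Implementations that hard-code digit ranges are the ones most likely to miss a new set. A minimal sketch of deriving the ranges from data instead is shown below. It assumes UnicodeData.txt-style input and relies on decimal digits being encoded as contiguous runs of ten in ascending order. The function names are illustrative, and the sample uses long-standing characters rather than Chisoi code points, which are not listed on this page.

```python
from typing import List, Optional


def decimal_digit_zeros(text: str) -> List[int]:
    """Return the code point of every 'digit zero': General_Category Nd
    with a decimal value of 0 in UnicodeData.txt-style input."""
    zeros = []
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed lines
        if fields[2] == "Nd" and fields[6] == "0":
            zeros.append(int(fields[0], 16))
    return zeros


def digit_value(cp: int, zeros: List[int]) -> Optional[int]:
    """Return 0..9 if cp falls in the run of ten starting at a known
    zero digit, otherwise None."""
    for zero in zeros:
        if zero <= cp <= zero + 9:
            return cp - zero
    return None


# Sample lines for existing characters, in UnicodeData.txt layout.
SAMPLE = (
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;\n"
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "0660;ARABIC-INDIC DIGIT ZERO;Nd;0;AN;;0;0;0;N;;;;;\n"
)

ZEROS = decimal_digit_zeros(SAMPLE)
print([f"U+{z:04X}" for z in ZEROS])
```

Run against the beta UnicodeData.txt, the same scan should pick up the Chisoi zero digit without any code change.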

The new Archaic Cuneiform Numerals block contains a very large set of numeric characters. Specialist implementations dealing with cuneiform text should be aware of these characters, which also pose challenges for formatting and for font design.

Security and Identifier-related Issues

Many lines of unused data in confusables.txt have been removed. These lines corresponded to characters that never appear in NFD form and thus were never exercised by the Confusable Detection algorithm in UTS #39.
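The condition "never appears in NFD form" can be tested directly: a character whose NFD normalization differs from itself cannot survive the normalization step that precedes the confusables lookup. A minimal sketch follows, with an illustrative function name. Note that Python's `unicodedata` module reflects the Unicode version bundled with the interpreter, not the 18.0.0 beta data.

```python
import unicodedata


def appears_in_nfd(ch: str) -> bool:
    """True if the single character ch is unchanged by NFD, so that it
    can occur in NFD-normalized text."""
    return unicodedata.normalize("NFD", ch) == ch


# U+00E9 decomposes to U+0065 U+0301, so a table entry keyed on U+00E9
# would never be reached after NFD has been applied.
for ch in ("e", "\u00e9", "\u212b"):
    print(f"U+{ord(ch):04X} appears in NFD: {appears_in_nfd(ch)}")

print("Interpreter UCD version:", unicodedata.unidata_version)
```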

Several corrections and additions to confusables data have been made, incorporating parts of a large backlog of public contributions to confusables data. Emphasis has been on confusable pairs with at least one side having Identifier_Status=Allowed.

The entries in the file IdentifierType.txt are no longer grouped by sets of values, but are instead ordered by code point; this is similar to the change made to ScriptExtensions.txt in Unicode Version 16.0.
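Parsers that key on code points rather than on the grouping of lines should be unaffected by this reordering. A minimal sketch of such a parser is shown below, assuming the general UCD layout of `code point or range ; values # comment`. The function name is illustrative, and the sample lines are made up to show the layout; they are not copied from the beta file.

```python
from typing import Dict, FrozenSet


def parse_identifier_type(text: str) -> Dict[int, FrozenSet[str]]:
    """Map each code point to its set of Identifier_Type values,
    regardless of the order in which the lines appear."""
    result: Dict[int, FrozenSet[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        rng, values = (part.strip() for part in line.split(";", 1))
        first, _, last = rng.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        types = frozenset(values.split())
        for cp in range(start, end + 1):
            result[cp] = types
    return result


# Illustrative lines only: the same data grouped by value, then by code point.
GROUPED = """\
0030..0039 ; Recommended   # illustrative
0041..005A ; Recommended   # illustrative
003A       ; Not_XID       # illustrative
"""
BY_CODE_POINT = """\
0030..0039 ; Recommended   # illustrative
003A       ; Not_XID       # illustrative
0041..005A ; Recommended   # illustrative
"""

print(parse_identifier_type(GROUPED) == parse_identifier_type(BY_CODE_POINT))
```

A parser that instead tracks a "current value" from section headers or assumes that all lines for one value are adjacent would need to be revisited.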

Unihan-related Issues

All Unihan properties should be reviewed carefully. The following changes deserve special attention:

Two old provisional properties, kIRGDaeJaweon and kIRGKangXi, have been removed.

Two new provisional properties, kJapaneseNewVariant and kJapaneseOldVariant, have been added.

The delimiters of 23 properties, including kIICore and kOtherNumeric, have been changed.

The syntax of the provisional kGB5 property and of the normative kIRG_GSource property has been changed.

The descriptions of five properties have been changed.

U+6B25 was disunified, and its disunified form has been appended to the CJK Unified Ideographs Extension D block at the new code point U+2B81E.

U+980B, U+FACB, and U+2F9FF were disunified, and their disunified form has been assigned to the existing code point U+2EA07 in the CJK Unified Ideographs Extension F block.

Nearly 1,800 IRG source references have been changed.

The kRSUnicode property values of 22 ideographs have been changed.

The kTotalStrokes property values of five ideographs have been changed.

horizontal extension for 597 G-source ideographs

horizontal extension for one H-source ideograph

horizontal extension for one K-source ideograph

horizontal extension for 163 T-source ideographs

horizontal extension for seven U-source ideographs

See UAX #38 for further details on these changes, especially Section 4.2, Listing by Date of Addition to the Unicode Standard, and Section 4.3, Listing by Location within Unihan.zip.

Other CJK-related changes:

glyph changes for 115 G-source ideographs

glyph changes for two K-source ideographs

glyph changes for 23 T-source ideographs

glyph change for one V-source ideograph

Standardized Variation Sequences

17 variation sequences have been added for CJK strokes.

10 variation sequences have been added for various math operators and a script small z.

Code Charts

As always, careful review of the updated code charts for Version 18.0.0 is advised. Particular issues to take note of include:

The chart fonts of Armenian and Khitan Small Script have been changed to Noto Serif Armenian and Khitan Small Linear, respectively.

The glyph of U+06C4 ARABIC LETTER WAW WITH RING has been corrected per 187-C13.

There are a number of other Han glyph updates.

Other glyph updates are listed explicitly in the delta charts index page.

The version-specific code charts are now distributed under the same /Public/18.0.0/charts/ directory as the consolidated charts, and an explicitly versioned code charts index page is now available early to help with access to them.

The auxiliary charts have been completely reconstituted, and are now also distributed under the /Public/18.0.0/charts/ directory. See Collation and Casing Charts. The auxiliary Names List charts are planned, but are not yet available for beta review.

To explain these changes, the Code Charts Help page has been substantially reorganized and is also now explicitly versioned along with the charts.

Collation-related Issues

The Default Unicode Collation Element Table (DUCET) was updated to the Unicode 18.0.0 repertoire for UCA 18.0.0. For the most part, the additions for new characters are unremarkable, but implementations should be checked to ensure the new additions do not cause problems.

The two new large siniform ideographic scripts, Jurchen and Seal, are given collation weights using implicit weighting. This requires a small change to the implicit weighting algorithm, to add new base weights. Implementations of UCA should be aware of this change.
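For orientation, the existing siniform scripts derive two collation elements per character from a per-script base weight and the offset of the code point within the script's block range. The sketch below follows that pattern as an assumption about how the new scripts are handled. The new base weights for Jurchen and Seal are not stated on this page, so the example uses the existing Tangut values (base FB00, range starting at U+17000) and leaves the new values to be read from the proposed update of UTS #10. The function name is illustrative.

```python
from typing import List, Tuple

CollationElement = Tuple[int, int, int]  # (primary, secondary, tertiary)


def siniform_implicit_weights(
    cp: int, base: int, block_start: int
) -> List[CollationElement]:
    """Derive the two implicit collation elements for a code point in a
    siniform script: [.AAAA.0020.0002][.BBBB.0000.0000], where AAAA is
    the per-script base weight and BBBB encodes the offset from the
    start of the script's block range."""
    aaaa = base
    bbbb = (cp - block_start) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


# Existing Tangut parameters, used here because the Jurchen and Seal
# base weights are not given on this page.
TANGUT_BASE = 0xFB00
TANGUT_START = 0x17000

for cp in (0x17000, 0x17001):
    elements = siniform_implicit_weights(cp, TANGUT_BASE, TANGUT_START)
    print(f"U+{cp:04X}", [tuple(f"{w:04X}" for w in ce) for ce in elements])
```

Under this pattern, supporting a new script amounts to adding one more (base, block start) pair, which is consistent with the "small change" described above.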

Several special mappings have been added to the DUCET. These had already been added in the CLDR root collation tailoring.

10 additional contraction mappings for Tibetan, for making the DUCET well-formed.

Mapping U+FFFF to a collation element with the highest primary weight.

Mapping U+FFFE to a collation element with the lowest non-zero primary weight, and some special processing, for “Merging Sort Keys” within code point space.

The Shift-Trimmed option was not previously recommended. It has now been removed completely from the UCA specification.

Linkification-related Issues

The recently added UTS #58, Unicode Link Detection and Formatting: URLs and Email Addresses is published in synchronization with the Unicode Standard starting with Unicode 18.0.0.

Other Issues

Please also check the following specific items carefully:

The UCD file Index.txt will not be updated for Unicode Version 18.0; the file published in Version 18.0 and included in the beta is identical to the file for Version 17.0. This file will be removed altogether in Unicode Version 19.0.

The permuted index generated from the manually-curated Index.txt has been replaced with a search tool based on the names list and other UCD data. This search tool is available where the static index used to be, at https://www.unicode.org/charts/charindex.html. During beta review, a version of the index based on the beta repertoire is also available.

Reorganization of Some Data Files

The /Public/18.0.0/ directory now contains a linkification subdirectory, which contains the data files associated with UTS #58, Unicode Link Detection and Formatting: URLs and Email Addresses.

The /Public/18.0.0/charts/ directory has been substantially expanded to include all of the versioned block charts and the auxiliary charts, as well as the navigation pages for code charts.

General Issues

For current proposed updates to the particular UAXes, see Proposed Updates for Standard Annexes or use the links near the top of this page. Particular issues in the UAXes may also be the focus of specific Public Review Issues. Each proposed textual change in a UAX is highlighted, so that you can focus your review on those sections if you have limited time. The changes are also listed in detail in the Modifications sections (linked from the table of contents of each document), and are summarized in UAX changes, so you can check on those areas that might be of most interest.

Some links between beta documents and the proposed updates for UAXes will not work correctly during the beta review period. This is a known problem which does not need to be reported, as such links point to the eventual final names or revision numbers for the released versions.

Stability

Certain character properties for newly assigned characters cannot be changed after the formal release of each version of the standard, because of the Character Encoding Stability Policy. Such character property values need special attention during the beta review process, as they cannot be corrected after publication. These include:

Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.

The determination of whether a character is included in identifiers (XID_Start, XID_Continue).

Case foldings.

There are also strong constraints on additions and changes to case mappings.