
BETA Unicode 6.2.0

The next version of the Unicode Standard will be Version 6.2.0, planned for release in September 2012. A beta version of the 6.2.0 Unicode Character Database files is available for public review. We strongly encourage implementers to review the summary description, download the beta 6.2.0 Unicode Character Database files, and test their programs with the new data, well before the end of the beta period. It is especially important to review the Notable Issues for Beta Reviewers.

We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.2.0 and to ensure that there are no regressions in glyph shapes for previously encoded characters.

Summary description: Unicode 6.2.0
Unicode character database (UCD): http, ftp
Summary of beta charts: Readme.txt
Single-block charts with yellow highlighting for new characters: delta charts
Single-block charts for all of Unicode 6.2.0: http, ftp
Code charts, single download (93MB): http, ftp

Related Unicode Technical Standards

In addition to the Unicode Standard proper, two other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 6.2.0. Review of that text and data is also encouraged during the beta review period. 

Review and Feedback

For guidance on how to focus your review, see the section Notable Issues for Beta Reviewers.

Any feedback should be reported using the contact form. Comments on the Unicode Standard Version 6.2.0 or the Unicode Character Database data files should refer to the beta review Public Review Issue #230. Comments on specific Version 6.2.0 UAXes and UTSes should refer to the respective Public Review Issue numbers for each document.

The comment period ends July 23, 2012. All substantive technical comments must be received by that date for consideration at the August UTC meeting. Editorial comments (typos, etc.) may still be submitted after that date for consideration in the final editorial work.

Note: All beta files may be updated, replaced, or superseded by other files at any time. The beta files will be discarded once Unicode 6.2.0 is final. It is inappropriate to cite these files as other than a work in progress. No products or implementations should be released based on the beta UCD data files -- use only the final, approved Version 6.2.0 data files, expected in September 2012.

The Unicode Consortium provides early access to updated versions of the data files and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of Version 6.2.0.

The assignment of characters for Unicode 6.2.0 is now stable. There will be no further additions or modifications of code points and no further changes to character names. Please do not submit feedback requesting changes to code points or character names for Unicode 6.2.0, as such feedback is not actionable.

One of the main purposes of the beta review period is to verify and correct the preliminary character property assignments in the Unicode Character Database. Reviewers should check for property changes to existing Unicode 6.1.0 characters, as well as the property values for the new Unicode 6.2.0 character additions.
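One way to carry out this check is to compare the UnicodeData.txt file from Version 6.1.0 against the beta file. The following is a minimal sketch, not an official tool. It assumes both files have already been read into strings and that each line uses the semicolon-separated, 15-field UnicodeData.txt layout. The function names `parse_unicode_data` and `diff_versions` are hypothetical, and range entries (the "First"/"Last" rows) are treated like any other row.

```python
# Field names in UnicodeData.txt column order, after the code point column.
FIELDS = [
    "Name", "General_Category", "Canonical_Combining_Class", "Bidi_Class",
    "Decomposition", "Decimal", "Digit", "Numeric", "Bidi_Mirrored",
    "Unicode_1_Name", "ISO_Comment", "Simple_Uppercase_Mapping",
    "Simple_Lowercase_Mapping", "Simple_Titlecase_Mapping",
]


def parse_unicode_data(text):
    """Return {code point (int): {field name: value}} for UnicodeData.txt text."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(";")
        if len(parts) != len(FIELDS) + 1:
            continue  # skip rows that do not have the expected 15 fields
        table[int(parts[0], 16)] = dict(zip(FIELDS, parts[1:]))
    return table


def diff_versions(old_text, new_text):
    """Return (added code points, {code point: {field: (old, new)}})."""
    old = parse_unicode_data(old_text)
    new = parse_unicode_data(new_text)
    added = sorted(set(new) - set(old))
    changed = {}
    for cp in sorted(set(old) & set(new)):
        delta = {
            name: (old[cp][name], new[cp][name])
            for name in FIELDS
            if old[cp][name] != new[cp][name]
        }
        if delta:
            changed[cp] = delta
    return added, changed
```

The first return value lists code points that appear only in the newer file. The second maps each existing code point to the fields whose values differ, which is the set a reviewer would want to inspect by hand.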

To facilitate verification of the property changes and additions, diffable XML versions of the Unicode Character Database are available. These XML files are dated, so that people can check the details of changes that occurred during the beta review period. The XML files are in the http://www.unicode.org/Public/6.2.0/diffs/ directory. For more information, see the diffs.readme.txt file.
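Two dated XML snapshots can also be compared programmatically. The sketch below rests on several assumptions: that the files follow the XML representation of the UCD described in UAX #42, with `char` elements in the namespace `http://www.unicode.org/ns/2003/ucd/1.0`, a `cp` attribute for the code point, and one attribute per property. Check the diffs.readme.txt file for the actual layout before relying on it. The function names are hypothetical, and entries that describe a range instead of a single `cp` are skipped.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the UCD XML format (see the lead-in above).
UCD_NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"


def load_chars(xml_text):
    """Return {code point (int): {attribute: value}} for single-cp entries."""
    root = ET.fromstring(xml_text)
    chars = {}
    for element in root.iter(UCD_NS + "char"):
        cp = element.get("cp")
        if cp is None:
            continue  # range entries carry no "cp" attribute; not handled here
        chars[int(cp, 16)] = {
            key: value for key, value in element.attrib.items() if key != "cp"
        }
    return chars


def diff_snapshots(old_xml, new_xml):
    """Return {code point: {attribute: (old, new)}} for shared code points."""
    old = load_chars(old_xml)
    new = load_chars(new_xml)
    report = {}
    for cp in sorted(set(old) & set(new)):
        keys = sorted(set(old[cp]) | set(new[cp]))
        delta = {
            key: (old[cp].get(key), new[cp].get(key))
            for key in keys
            if old[cp].get(key) != new[cp].get(key)
        }
        if delta:
            report[cp] = delta
    return report
```

An attribute present in only one snapshot is reported with `None` on the other side, so added or removed properties show up alongside changed values.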

The beta review period is a good opportunity to add support for the new Unicode 6.2.0 characters in internal versions of software, so that the software can be tested to verify that the new characters and property assignments do not cause problems when it is upgraded to Version 6.2.0 of Unicode.

Notable Issues for Beta Reviewers

Some of the Unicode Standard Annexes have substantial modifications for Unicode 6.2.0, often in coordination with changes to character properties. For the current proposed updates to particular UAXes, see Proposed Updates for Standard Annexes or use the links in the navigation bar on this page. Particular issues in the UAXes may also be the focus of specific Public Review Issues. Each proposed textual change in a UAX is highlighted, so that you can focus your review on those sections if you have limited time. The changes are also listed in detail in the Modifications sections (linked from the table of contents of each document) and are summarized in UAX changes, so you can check the areas of most interest to you. Some links between beta documents and the proposed updates for UAXes will not work correctly during the beta review period. This is a known problem and does not need to be reported: such links point to the eventual final names or revision numbers of the released versions.

Certain character properties for newly assigned characters cannot be changed after the formal release of each version of the standard, because of the Character Encoding Stability Policy. Such character property values need special attention during the beta review process, as they cannot be corrected after publication. These include:

  • Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.
  • The determination of whether a character is included in identifiers (XID_Start, XID_Continue).
  • Case mappings and case foldings.
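To build a short list of new characters that deserve this extra attention, one option is to flag every newly added row in UnicodeData.txt that carries a nonzero Canonical_Combining_Class, a decomposition, or a simple case mapping. This is a rough sketch under stated assumptions: the helper names are hypothetical, both files are already loaded as strings, and only UnicodeData.txt is examined. Composition exclusions, XID_Start/XID_Continue, full case mappings, and case foldings live in other UCD files and are outside this sketch.

```python
def stability_flags(row):
    """Return the names of stability-sensitive fields set in one UnicodeData.txt row."""
    parts = row.split(";")
    flags = []
    if parts[3] != "0":
        flags.append("Canonical_Combining_Class")
    if parts[5]:
        flags.append("Decomposition_Mapping")
    if any(parts[12:15]):
        flags.append("simple case mapping")
    return flags


def new_characters_to_review(old_text, new_text):
    """Return {code point string: flags} for rows present only in new_text."""
    old_cps = {
        line.split(";")[0] for line in old_text.splitlines() if line.strip()
    }
    review = {}
    for line in new_text.splitlines():
        line = line.strip()
        if not line:
            continue
        cp = line.split(";")[0]
        if cp in old_cps:
            continue  # existing character, not a new assignment
        flags = stability_flags(line)
        if flags:
            review[cp] = flags
    return review
```

A new character with no flags is omitted from the result, so an empty dictionary means none of the new rows set any of these three kinds of values.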

Please also check the following specific items carefully:

  • There are additional property changes listed in UAX #44, Unicode Character Database, that may affect some implementations.
  • The encoding of the data file for the Unicode names list, NamesList.txt, has been changed from Latin-1 to UTF-8, to be consistent with the encoding for all other data files in the UCD. Implementations which parse NamesList.txt need to be aware of this change in encoding.
  • PRI #227 proposes changes to the Script_Extensions property that will affect Unicode 6.2, so be sure to check that PRI.
  • PRI #228 proposes changes to the General_Category of some common punctuation characters to change them to symbols. That change would affect Unicode 6.2, so be sure to check that PRI.
  • PRI #229 proposes changes to the linebreaking behavior of a large class of pictographic symbols. Those changes would affect the Line_Break property and the text of UAX #14 for Unicode 6.2, so be sure to check that PRI.
  • There are other changes proposed for text segmentation properties for certain characters. See the proposed update text for UAX #14 and UAX #29. If these changes are made, they would affect property values (and test cases) in some of the auxiliary data files in the UCD, as well as the text of those two annexes.
  • Informative property values in the Unihan database for two CJK unified ideographs have been changed:
    • U+365B: the kMandarin value changed from zhuān to
    • U+214CC: the kRSUnicode value changed from 32.15 to 32.16, and the kTotalStrokes value correspondingly changed from 18 to 19
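For the NamesList.txt encoding change noted in the list above, a parser that must handle both older and newer copies of the file could try UTF-8 first and fall back to Latin-1. The following is a minimal sketch, assuming the file has been read as raw bytes; the function name `decode_names_list` is hypothetical.

```python
def decode_names_list(raw):
    """Decode NamesList.txt bytes, returning (text, encoding used).

    UTF-8 is tried first; "utf-8-sig" also tolerates a leading byte order
    mark. Bytes that are not valid UTF-8 are decoded as Latin-1, the
    encoding used by earlier versions of the file.
    """
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"
```

Typical use would be `text, encoding = decode_names_list(open(path, "rb").read())`. One limitation: a Latin-1 file whose non-ASCII bytes happen to form valid UTF-8 sequences would be misread. An implementation that knows which UCD version it is loading can select the encoding directly.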