
Updates and Errata

The following is a list of errata noted for The Unicode Standard, Version 5.2, its code charts, its annexes, and the Unicode Character Database. The list is updated periodically with corrections of typographic errors and new clarifications of the text. It also includes errata noted for the core specification, The Unicode Standard, Version 5.2, that have not yet been corrected in the consolidated text.

Formal corrigenda notices for the Unicode Standard can be found at Corrigenda to the Unicode Standard. Corrigenda for the Unicode CLDR are posted at Unicode CLDR Corrigenda, and errata notices for UTS #35: Locale Data Markup Language (LDML) can be found at Errata for UTS #35 LDML.

Updates to    Prior to          Incorporated in
Unicode 5.1   2009-October-1    Unicode 5.2
Unicode 5.0   2008-March-15     Unicode 5.1
Unicode 4.1   2006-July-14      Unicode 5.0
Unicode 4.0   2005-March-31     Unicode 4.1
Unicode 3.2   2003-April-17     Unicode 4.0
Unicode 3.1   2002-March-25     Unicode 3.2
Unicode 3.0   2001-March-23     Unicode 3.1

Reports of errors in published documents, such as the Unicode Standard itself or Unicode Technical Reports, may be filed using the Unicode Consortium's online form. If confirmed, and depending on the nature of the reported error, an erratum may be posted on this page, to be fixed in subsequent editions of the Standard.

Date  Summary 
2009-Oct-14 In Version 5.2 (and Version 5.1) of UAX #31, "Unicode Identifier and Pattern Syntax," there is an error in Figure 2, "Farsi Example with ZWNJ": the code point numbers for Alef (U+0627) and Meem (U+0645) are swapped. The correct values are:
first row: 0646+0627+0645+0647+0627+06CC
second row: 0646+0627+0645+0647+200C+0627+06CC
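
The corrected sequences can be checked mechanically. The following is a minimal Python sketch and is not part of UAX #31; the helper name `parse_sequence` is hypothetical. It converts each "+"-separated row into a string and confirms that the two rows differ only by U+200C ZERO WIDTH NON-JOINER.

```python
import unicodedata

# Corrected code point rows, copied from the erratum.
FIRST_ROW = "0646+0627+0645+0647+0627+06CC"
SECOND_ROW = "0646+0627+0645+0647+200C+0627+06CC"

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def parse_sequence(row: str) -> str:
    """Turn a '+'-separated list of hex code points into a string.

    Hypothetical helper, for illustration only.
    """
    return "".join(chr(int(cp, 16)) for cp in row.split("+"))


first = parse_sequence(FIRST_ROW)
second = parse_sequence(SECOND_ROW)

# Removing ZWNJ from the second row should yield the first row.
rows_differ_only_by_zwnj = second.replace(ZWNJ, "") == first

if __name__ == "__main__":
    for label, text in (("first row", first), ("second row", second)):
        print(label)
        for ch in text:
            # Print each code point with its character name.
            print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
    print("differ only by ZWNJ:", rows_differ_only_by_zwnj)
```

Listing the character names makes a swap such as the one in the figure easy to spot, since Alef must come before Meem in both rows.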
2009-Oct-1 In the Version 5.2 Unicode Character Database, there is one IRG source mapping missing in the data file Unihan_IRGSources.txt (contained in Unihan.zip). The following entry should be added:

U+2ADFF kIRG_HSource 87DC
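
As an illustration of how such an entry might be read, here is a small Python sketch. The names `UnihanEntry` and `parse_unihan_line` are hypothetical, and the parser splits on any whitespace so that it accepts the space-separated form shown above as well as tab-separated input, which is assumed here to be the layout of the data file.

```python
from typing import NamedTuple


class UnihanEntry(NamedTuple):
    code_point: int  # e.g. 0x2ADFF
    field: str       # e.g. "kIRG_HSource"
    value: str       # e.g. "87DC"


def parse_unihan_line(line: str) -> UnihanEntry:
    """Parse one 'U+XXXX field value' entry (hypothetical helper).

    Splits on runs of whitespace; the value keeps any internal spaces.
    """
    cp_text, field, value = line.strip().split(None, 2)
    if not cp_text.startswith("U+"):
        raise ValueError(f"expected 'U+' prefix, got {cp_text!r}")
    return UnihanEntry(int(cp_text[2:], 16), field, value)


# The entry that the erratum says should be added.
missing = parse_unihan_line("U+2ADFF kIRG_HSource 87DC")
```

A check along these lines could be used to confirm that a local copy of Unihan_IRGSources.txt contains the added mapping.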

2008-April-28 In the Version 5.1 Unicode Character Database, the test cases in the test data file LineBreakTest.txt incorrectly indicate the presence of a break at the beginning of each line (with "÷"). These should be corrected to indicate no break at the beginning of each line (with "×"), to reflect the effect of LB2 "Never break at the start of text" from UAX #14, "Unicode Line Breaking Algorithm". Correspondingly, rule 0.2 in the documentation in LineBreakTest.html should be corrected to read: "sot ×".
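
The correction described above could be applied to a local copy of the test data with a sketch like the following. It is an illustration, not an official tool: the function name `fix_start_of_text` is hypothetical, and the code assumes that each test line begins directly with the break marker and that any other line (for example a comment) should be left unchanged.

```python
BREAK = "\u00F7"     # ÷ marks a break opportunity
NO_BREAK = "\u00D7"  # × marks no break


def fix_start_of_text(line: str) -> str:
    """Replace a leading ÷ with ×, per LB2 (hypothetical helper).

    Lines that do not start with ÷ are returned unchanged.
    Only the first marker is touched; later ÷ marks are kept.
    """
    if line.startswith(BREAK):
        return NO_BREAK + line[len(BREAK):]
    return line


# Illustrative line in the style of the test file, not copied from it.
sample = "\u00F7 0023 \u00D7 0023 \u00F7"
fixed = fix_start_of_text(sample)
```

Only the start-of-text marker changes; the marker at the end of the line is kept as it is.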