Updates and Errata

The following is a list of errata noted for The Unicode Standard, Version 5.1, its code charts, annexes and the Unicode Character Database. It is periodically updated to include corrections to typographic errors and new clarifications of the text. This list also includes errata noted for the text of the book, The Unicode Standard, Version 5.0, and not yet corrected in consolidated text.

Formal corrigenda notices for the Unicode Standard can be found at Corrigenda to the Unicode Standard. Corrigenda for the Unicode CLDR are posted at Unicode CLDR Corrigenda, and errata notices for UTS #35: Locale Data Markup Language (LDML) can be found at Errata for UTS #35 LDML.

Updates to     Prior to          Incorporated in
Unicode 5.0    2008-March-15     Unicode 5.1
Unicode 4.1    2006-July-14      Unicode 5.0
Unicode 4.0    2005-March-31     Unicode 4.1
Unicode 3.2    2003-April-17     Unicode 4.0
Unicode 3.1    2002-March-25     Unicode 3.2
Unicode 3.0    2001-March-23     Unicode 3.1

Reports of errors in published documents, such as the Unicode Standard itself or Unicode Technical Reports, may be filed using the Unicode Consortium's online form. If confirmed, and depending on the nature of the reported error, an erratum may be posted on this page, to be fixed in subsequent editions of the Standard.

Date  Summary 
2008-May-7 In UAX #31, "Unicode Identifier and Pattern Syntax" (Version 5.1.0), there is a typo in the description for (X)ID_Start in Table 2, Lexical Classes for Identifiers. "letter numbers (Lu)" should be corrected to read "letter numbers (Nl)".
2008-April-29 In UAX #29, "Unicode Text Segmentation" (Version 5.1.0), there is a typo in the definition of Prepend in Table 2, Grapheme_Cluster_Break Property Values. The correct definition is: "Logical_Order_Exception=True".
2008-April-28 In the Version 5.1 Unicode Character Database, the test cases in the test data file LineBreakTest.txt incorrectly indicate the presence of a break at the beginning of each line (with "÷"). These should be corrected to indicate no break at the beginning of each line (with "×"), to reflect the effect of LB2 "Never break at the start of text" from UAX #14, "Unicode Line Breaking Algorithm". Correspondingly, the documentation in LineBreakTest.html should have the rule 0.2 corrected to read: "sot ×".
2008-February-12 On p. 124 of The Unicode Standard, Version 5.0, there is an error in the Regular Expressions column for "More_Above", in the third row of Table 3-14, Context Specification for Casing. The correct regular expression is:

[^\p{ccc=230}\p{ccc=0}]* [\p{ccc=230}]
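Read as a condition on the text after a character C, the expression above matches when C is followed by a character of combining class 230, with no character of combining class 0 or 230 in between. The following is a minimal sketch of that check, not the Standard's own code. Python's standard `re` module does not support `\p{ccc=...}`, so the sketch walks the string and uses `unicodedata.combining` instead. The function name `more_above` and its parameters are hypothetical.

```python
import unicodedata


def more_above(text: str, index: int) -> bool:
    """Sketch of the More_Above context for the character at text[index].

    Returns True if that character is followed by a character with
    canonical combining class 230, with no intervening character of
    combining class 0 or 230.
    """
    for ch in text[index + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 230:
            # Reached a combining mark positioned above: context matches.
            return True
        if ccc == 0:
            # A starter intervenes before any ccc=230 mark: no match.
            return False
        # Any other combining class is skipped, as in [^\p{ccc=230}\p{ccc=0}]*.
    return False
```

For instance, `more_above("i\u0323\u0301", 0)` skips U+0323 (combining class 220) and then matches on U+0301 (combining class 230).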

2007-June-26 The following text from the last paragraph of Section 15.4, Mathematical Symbols, on page 507 of The Unicode Standard, Version 5.0:

Using U+2278 or U+2279 with VS1 will request these variants explicitly, as will using U+2276 less-than or greater-than or U+2277 greater-than or less-than with U+20D2 combining long vertical line overlay. Unless fonts are created with the intention to add support for both forms (via VS1 for the upright forms),...

Should be replaced by this text:

Using U+2276 or U+2277 followed by U+20D2 COMBINING LONG VERTICAL LINE OVERLAY represents these upright variants explicitly. Except for those fonts created with the intention to add support for both forms (via combination of U+2276 or U+2277 with U+20D2 for the upright forms),...

2007-June-4 In Section 12.1, Han on p. 424 of The Unicode Standard, Version 5.0, the last paragraph states that U+FA70 to U+FAD9 are "included in the Unicode Standard to provide full round-trip compatibility with the ideographic repertoire of PKS 5700 parts 1, 2, and 3." However, the Korean standard listed is incorrect, and the text should be corrected to "... the ideographic repertoire of KPS 10721-2000."

2007-May-24 On p. 479 of The Unicode Standard, Version 5.0, the subheading for Linear B Ideographs lists the range as "U+10080--U+108FF". That should be corrected to "U+10080--U+100FF".
2007-January-11 There is an error in the entry for "Trailing Consonant" on page 1147 in the glossary of The Unicode Standard, Version 5.0. "Vowel_Jamo" should be "Trailing_Jamo" in definition (1), thus reading "(1) In Korean, a jamo character with the Hangul_Syllable_Type property value Trailing_Jamo (in the range U+11A8..U+11F9)."
2007-January-5 There is an error in the sample code in section 5.17 on page 182 of The Unicode Standard, Version 5.0. The entry 0x2F in the second row of the rotate table should instead be 0x1F.
2007-January-4 On page 411 of The Unicode Standard, Version 5.0, Table 12-2 incorrectly states the extent of the CJK Unified Ideographs Extension A block. The correct range is U+3400 to U+4DBF. In particular, the Yijing Hexagram Symbols starting at U+4DC0 are not part of Extension A.
2007-January-2 Due to a printing error, the Unified Canadian Aboriginal Syllabics glyphs at U+1424, U+1426, and U+1487 are missing in the code charts and names list on pages 684 and 687-88 of The Unicode Standard, Version 5.0. These glyphs were correctly represented in the online charts and can be viewed at http://www.unicode.org/charts/PDF/U1400.pdf.
2007-January-2 The file UNIHAN/FullRSIndex.pdf on the Unicode 5.0 CD-ROM is missing a final page with the last half of the entry for 211 (tooth) and the complete entries for 212 (dragon), 213 (turtle), and 214 (flute). The missing page is available here as a PDF.
2006-December-21 Table 11-16 in The Unicode Standard, Version 5.0 shows "kyu" twice: once at the top of the part on page 402 and once at the top of the part on page 403. The repetition is an error and the second instance should be removed.
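For readers who use the Version 5.1 copy of LineBreakTest.txt, the correction described in the 2008-April-28 entry above could be applied locally along the following lines. This is only a sketch under assumptions: it assumes each test line starts with a break marker and carries an optional comment after "#", and the function name `fix_start_of_line` is hypothetical. It leaves the comment text untouched.

```python
BREAK = "\u00F7"     # "÷", marks a break opportunity
NO_BREAK = "\u00D7"  # "×", marks no break


def fix_start_of_line(line: str) -> str:
    """Replace a leading "÷" in the data part of a test line with "×".

    Assumption: everything after the first "#" is a comment and is
    returned unchanged; comment-only and already-correct lines are
    returned as they are.
    """
    data, sep, comment = line.partition("#")
    stripped = data.lstrip()
    if stripped.startswith(BREAK):
        indent = data[: len(data) - len(stripped)]
        data = indent + NO_BREAK + stripped[1:]
    return data + sep + comment
```

Only the first marker on each line is changed, which reflects LB2 ("Never break at the start of text"); all later markers on the line are preserved.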