Updates and Errata

The following is a list of errata noted for The Unicode Standard, Version 5.1, its code charts, annexes and the Unicode Character Database. It is periodically updated to include corrections to typographic errors and new clarifications of the text. This list also includes errata noted for the text of the book, The Unicode Standard, Version 5.0, and not yet corrected in consolidated text.

Formal corrigenda notices for the Unicode Standard can be found at Corrigenda to the Unicode Standard. Corrigenda for the Unicode CLDR are posted at Unicode CLDR Corrigenda, and errata notices for UTS #35: Locale Data Markup Language (LDML) can be found at Errata for UTS #35 LDML.

Updates to     Prior to          Incorporated in
Unicode 5.0    2008-March-15     Unicode 5.1
Unicode 4.1    2006-July-14      Unicode 5.0
Unicode 4.0    2005-March-31     Unicode 4.1
Unicode 3.2    2003-April-17     Unicode 4.0
Unicode 3.1    2002-March-25     Unicode 3.2
Unicode 3.0    2001-March-23     Unicode 3.1

Reports of errors in published documents, such as the Unicode Standard itself or Unicode Technical Reports, may be filed using the Unicode Consortium's online form. If confirmed, and depending on the nature of the reported error, an erratum may be posted on this page, to be fixed in subsequent editions of the Standard.

Date  Summary 
2008-Aug-21

In UAX #31, "Unicode Identifier and Pattern Syntax" (Version 5.1.0), there is a mistake in the first bullet of A2 in Section 2.3, Layout and Format Control Characters. The text ", followed by a Letter" should be deleted from that bullet, to make the textual description consistent with the regular expression description in the following bullet.

2008-Jun-06

In the code charts for Unicode Versions 5.1 and earlier, the representative glyphs for the case pairs of two Cyrillic letters for Abkhaz, Abkhazian Ha (04A8/04A9) and Abkhazian Che (04BE/04BF), are shown in an old style that is no longer preferred. The glyphs are being updated to reflect modern preferences. The old glyphs are shown below on the left; the new, preferred forms on the right:
[Figure: old glyphs (left) and new, preferred glyphs (right)]

2008-May-30 U+1680 OGHAM SPACE MARK is displayed differently depending on the design of an Ogham font. Ogham fonts with stemlines (the norm) show U+1680 as a visible stemline; Ogham fonts without stemlines show it as a blank. The representative glyph in the standard is being updated to reflect this variability and for consistency with the representative glyphs used for other whitespace characters in the standard. The old representative glyph is shown on the left, and the updated glyph on the right.
[Figure: old glyph (left) and updated glyph (right)]
2008-May-27 In the XML representation of the UCD, Version 5.1.0, some attributes for the character U+A788 MODIFIER LETTER LOW CIRCUMFLEX ACCENT are incorrect. The gc attribute should be "Lm" rather than "Sk"; the Alpha, IDS, XIDS, IDC and XIDC attributes should be "Y" rather than "N"; and the WB and SB attributes should be "LE" rather than "XX".
2008-May-27 In the XML representation of the UCD, Version 5.1.0, the characters U+0000..U+001F and U+007F..U+009F have the incorrect value for the na attribute. It should be the empty string, rather than the string "<control>".
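
The two corrections to the XML representation above can be applied mechanically. The following is a minimal sketch, not an official tool: the `SAMPLE` document is a hypothetical excerpt (the real file uses an XML namespace, many more attributes, and possibly range elements, all omitted here), and the names `A788_FIXES`, `is_control`, and `apply_errata` are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt carrying the uncorrected values described in the errata.
SAMPLE = """<repertoire>
  <char cp="001F" na="&lt;control&gt;" gc="Cc"/>
  <char cp="A788" na="MODIFIER LETTER LOW CIRCUMFLEX ACCENT" gc="Sk"
        Alpha="N" IDS="N" XIDS="N" IDC="N" XIDC="N" WB="XX" SB="XX"/>
</repertoire>"""

# Corrected attribute values for U+A788, as listed in the erratum.
A788_FIXES = {
    "gc": "Lm",
    "Alpha": "Y", "IDS": "Y", "XIDS": "Y", "IDC": "Y", "XIDC": "Y",
    "WB": "LE", "SB": "LE",
}

def is_control(cp):
    """True for U+0000..U+001F and U+007F..U+009F."""
    return 0x0000 <= cp <= 0x001F or 0x007F <= cp <= 0x009F

def apply_errata(root):
    """Patch every <char> element that the two errata touch."""
    for char in root.iter("char"):
        if "cp" not in char.attrib:
            continue  # assumption: skip elements without a single code point
        cp = int(char.get("cp"), 16)
        if cp == 0xA788:
            char.attrib.update(A788_FIXES)
        if is_control(cp):
            char.set("na", "")  # the name is the empty string, not "<control>"
    return root

root = apply_errata(ET.fromstring(SAMPLE))
```
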
2008-May-22

In certain locations in the text of the standard, there are erroneous statements implying that use of a U+200B ZERO WIDTH SPACE (ZWSP) character indicates a "word break", when what are in question are actually line break opportunities. The text should be corrected to read as noted below.

In Section 16.2, Layout Controls, on page 535 of The Unicode Standard, Version 5.0, the following text:

Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as Thai, Khmer, and Japanese.

should be replaced by this text:

Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a line break opportunity, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent line break opportunities, such as Thai, Khmer, and Japanese.

In Section 11.3, Myanmar, on p. 381 of The Unicode Standard, Version 5.0, the following text:

Spacing. Myanmar does not use any whitespace between words. If word boundary indications are desired—for example, for the use of automatic line layout algorithms—the character U+200B ZERO WIDTH SPACE should be used to place invisible marks for such breaks.

should be replaced by this text:

Spacing. Myanmar does not use any whitespace between words. If explicit line break opportunities are desired—for example, for the use of automatic line layout algorithms—the character U+200B ZERO WIDTH SPACE should be used to place invisible marks for such breaks.

Note that, as with other textual errata for Unicode Version 5.0, these corrections also apply to the text of Unicode Version 5.1.
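
As an illustration of the corrected wording, the sketch below places U+200B ZERO WIDTH SPACE between words that have already been segmented, marking line break opportunities without adding visible spacing. The word list is a hypothetical example, and `mark_break_opportunities` is an illustrative name; how the text is segmented into words is outside the scope of this sketch.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def mark_break_opportunities(words):
    """Join pre-segmented words with ZWSP so a layout engine may break there."""
    return ZWSP.join(words)

# Hypothetical pre-segmented Thai input; segmentation itself is assumed done.
words = ["ภาษา", "ไทย"]
marked = mark_break_opportunities(words)

# ZWSP is a format character (General_Category Cf), not a space separator.
zwsp_category = unicodedata.category(ZWSP)
```

Because the character has no width, the marked string renders the same as the unmarked one; only the permitted break positions change.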

2008-May-7 In UAX #31, "Unicode Identifier and Pattern Syntax" (Version 5.1.0), there is a typo in the description for (X)ID_Start in Table 2, Lexical Classes for Identifiers. "letter numbers (Lu)" should be corrected to read "letter numbers (Nl)".
2008-April-29 In UAX #29, "Unicode Text Segmentation" (Version 5.1.0), there is a typo in the definition of Prepend in Table 2, Grapheme_Cluster_Break Property Values. The correct definition is: "Logical_Order_Exception=True".
2008-April-28 In the Version 5.1 Unicode Character Database, the test cases in the test data file LineBreakTest.txt incorrectly indicate the presence of a break at the beginning of each line (with "÷"). These should be corrected to indicate no break at the beginning of each line (with "×"), to reflect the effect of LB2 "Never break at the start of text" from UAX #14, "Unicode Line Breaking Algorithm". Correspondingly, the documentation in LineBreakTest.html should have the rule 0.2 corrected to read: "sot ×".
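
A minimal sketch of the correction to the test data, assuming each test case is a single line whose first non-blank character is the break indicator for the start of text. The function name `fix_start_of_text` and the sample line are illustrative, not taken from the file.

```python
BREAK = "\u00f7"     # ÷ : break allowed
NO_BREAK = "\u00d7"  # × : break not allowed

def fix_start_of_text(line):
    """Replace a leading ÷ with ×, reflecting LB2 (never break at start of text).

    Comment lines and lines that already start with × are returned unchanged.
    """
    stripped = line.lstrip()
    if stripped.startswith(BREAK):
        indent = line[: len(line) - len(stripped)]
        return indent + NO_BREAK + stripped[1:]
    return line

# Hypothetical test line in the style of the data file.
fixed = fix_start_of_text("\u00f7 0023 \u00d7 0023 \u00f7")
```

Only the first indicator is rewritten; break indicators later in the line, including the one at the end of text, are left as they are.
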
2008-February-12 On p. 124 of The Unicode Standard, Version 5.0, there is an error in the Regular Expressions column for "More_Above", in the third row of Table 3-14, Context Specification for Casing. The corrected regular expression should be:

[^\p{ccc=230}\p{ccc=0}]* [\p{ccc=230}]
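
Python's built-in `re` module does not support `\p{ccc=...}`, so the corrected expression can be sketched as a loop over canonical combining classes instead. This is an illustration under that assumption, with `more_above` as a hypothetical helper name: it reports whether the character at index `i` is followed by a character of combining class 230, with no intervening character of combining class 0 or 230.

```python
import unicodedata

def more_above(text, i):
    """Check the More_Above context for the character at text[i].

    Mirrors [^\\p{ccc=230}\\p{ccc=0}]* [\\p{ccc=230}] applied after text[i].
    """
    for ch in text[i + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 230:
            return True   # reached a combining mark of class Above
        if ccc == 0:
            return False  # a starter intervenes, so the context fails
        # any other combining class may be skipped
    return False

# U+0307 COMBINING DOT ABOVE has ccc=230; U+0323 COMBINING DOT BELOW has ccc=220.
direct = more_above("i\u0307", 0)
skipped = more_above("i\u0323\u0301", 0)
blocked = more_above("ia\u0301", 0)
```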

2007-June-26 The following text from the last paragraph of Section 15.4, Mathematical Symbols, on page 507 of The Unicode Standard, Version 5.0:

Using U+2278 or U+2279 with VS1 will request these variants explicitly, as will using U+2276 less-than or greater-than or U+2277 greater-than or less-than with U+20D2 combining long vertical line overlay. Unless fonts are created with the intention to add support for both forms (via VS1 for the upright forms),...

should be replaced by this text:

Using U+2276 or U+2277 followed by U+20D2 COMBINING LONG VERTICAL LINE OVERLAY represents these upright variants explicitly. Except for those fonts created with the intention to add support for both forms (via combination of U+2276 or U+2277 with U+20D2 for the upright forms),...

2007-June-4

In Section 12.1, Han, on p. 424 of The Unicode Standard, Version 5.0, the last paragraph states that U+FA70 to U+FAD9 are "included in the Unicode Standard to provide full round-trip compatibility with the ideographic repertoire of PKS 5700 parts 1, 2, and 3." However, the Korean standard listed is incorrect, and the text should be corrected to "... the ideographic repertoire of KPS 10721-2000."

2007-May-24 On p. 479 of The Unicode Standard, Version 5.0, the subheading for Linear B Ideographs lists the range as "U+10080--U+108FF". That should be corrected to "U+10080--U+100FF".
2007-January-11 There is an error in the entry for "Trailing Consonant" on page 1147 in the glossary of The Unicode Standard, Version 5.0. "Vowel_Jamo" should be "Trailing_Jamo" in definition (1), thus reading "(1) In Korean, a jamo character with the Hangul_Syllable_Type property value Trailing_Jamo (in the range U+11A8..U+11F9)."
2007-January-5 There is an error in the sample code in section 5.17 on page 182 of The Unicode Standard, Version 5.0. The entry 0x2F in the second row of the rotate table should instead be 0x1F.
2007-January-4 On page 411 of The Unicode Standard, Version 5.0, Table 12-2 incorrectly states the extent of the CJK Unified Ideographs Extension A block. The correct range is U+3400 to U+4DBF. In particular, the Yijing Hexagram Symbols starting at U+4DC0 are not part of Extension A.
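
The corrected extent can be expressed as a simple range check. This is a sketch using only the bounds stated in the erratum; `EXT_A` and `in_cjk_ext_a` are illustrative names.

```python
# CJK Unified Ideographs Extension A block: U+3400..U+4DBF inclusive.
EXT_A = range(0x3400, 0x4DBF + 1)

def in_cjk_ext_a(cp):
    """True if the code point lies in the Extension A block."""
    return cp in EXT_A

last_in_block = in_cjk_ext_a(0x4DBF)
first_hexagram = in_cjk_ext_a(0x4DC0)  # first Yijing Hexagram Symbol
```
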
2007-January-2 Due to a printing error, the Unified Canadian Aboriginal Syllabics glyphs at U+1424, U+1426, and U+1487 are missing in the code charts and names list on pages 684 and 687-88 of The Unicode Standard, Version 5.0. These glyphs were correctly represented in the online charts and can be viewed at http://www.unicode.org/charts/PDF/U1400.pdf.
2007-January-2 The file UNIHAN/FullRSIndex.pdf on the Unicode 5.0 CD-ROM is missing a final page with the last half of the entry for 211 (tooth) and the complete entries for 212 (dragon), 213 (turtle), and 214 (flute). The missing page is available here as a PDF.
2006-December-21 Table 11-16 in The Unicode Standard, Version 5.0 shows "kyu" twice: once at the top of the part on page 402 and once at the top of the part on page 403. The repetition is an error and the second instance should be removed.