|
|
Updates and Errata
The following is a list of errata noted for The Unicode Standard, Version 5.1,
its code charts, annexes, and the Unicode Character Database. It is periodically
updated to include corrections to typographic errors and new clarifications of the
text. This list also includes errata noted for the text of the book,
The Unicode Standard, Version 5.0, and not yet corrected in consolidated text.
Formal corrigenda notices for the Unicode Standard can be found at
Corrigenda to the Unicode Standard. Corrigenda for the Unicode CLDR are posted at
Unicode CLDR Corrigenda, and errata notices for UTS #35: Locale Data Markup Language (LDML)
can be found at Errata for UTS #35 LDML.
Reports of errors in published documents, such as the Unicode
Standard itself or Unicode Technical Reports, may be filed using the
Unicode Consortium's
online form. If confirmed, and depending on the nature of the
reported error, an erratum may be posted on this page, to be fixed
in subsequent editions of the Standard.
| Date |
Summary |
| 2008-Aug-21 |
In UAX #31, "Unicode Identifier and Pattern Syntax" (Version 5.1.0), there is a mistake in the first bullet of A2 in Section 2.3, Layout and Format Control Characters.
The text ", followed by a Letter" should be deleted from that bullet, to make the textual description consistent with the regular expression description in the following bullet. |
| 2008-Jun-06 |
In the code charts for Unicode Versions 5.1 and earlier,
the representative glyphs for the case pairs of two
Cyrillic letters for Abkhaz, Abkhazian Ha (04A8/04A9)
and Abkhazian Che (04BE/04BF), are shown in an old style
that is no longer preferred. The glyphs are being updated
to reflect modern preferences. The old glyphs are
shown below on the left; the new, preferred forms on the
right:
 |
| 2008-May-30 |
U+1680 OGHAM SPACE MARK is displayed differently depending on
the design of an Ogham font. Ogham fonts with stemlines (the norm)
show U+1680 as a visible stemline; Ogham fonts without stemlines
show it as a blank. The representative glyph in the standard is being
updated to reflect this variability and for consistency with the
representative glyphs used for other whitespace characters in the
standard. The old representative glyph is shown on the left, and the
updated glyph on the right.
  |
| 2008-May-27 |
In the XML representation of the UCD, Version 5.1.0, some attributes for the character U+A788 MODIFIER LETTER LOW CIRCUMFLEX ACCENT are incorrect. The gc attribute should be
"Lm" rather than "Sk"; the Alpha, IDS, XIDS, IDC and XIDC attributes should be
"Y" rather than "N", and the WB and SB attributes should be "LE" rather than
"XX". |
| 2008-May-27 |
In the XML representation of the UCD, Version 5.1.0, the
characters U+0000..U+001F and U+007F..U+009F have the incorrect value
for the na attribute. It should be the empty string, rather than the
string "<control>". |
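The two XML errata dated 2008-May-27 can be spot-checked against any Unicode database implementation. The sketch below uses Python's standard `unicodedata` module, which is an assumption of this example rather than part of the errata; it reports the UCD version bundled with the Python build, not the 5.1.0 XML file itself.

```python
import unicodedata

# U+A788 MODIFIER LETTER LOW CIRCUMFLEX ACCENT: the erratum says gc is Lm.
low_circumflex = "\uA788"
gc = unicodedata.category(low_circumflex)
print(gc)

# Control characters in the two ranges named by the second erratum.
controls = [chr(cp) for cp in list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))]

# unicodedata.name() raises ValueError for unnamed characters unless a
# default is supplied; the default "" stands in for the empty na value.
names = {unicodedata.name(c, "") for c in controls}
print(names)
```

If the bundled database agrees with the corrected values, `gc` is a modifier-letter category and `names` contains only the empty string.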
| 2008-May-22 |
In certain locations in the text of the standard, there are erroneous statements implying that use of a U+200B ZERO WIDTH SPACE (ZWSP) character indicates a "word break", when what is actually meant is a line break opportunity.
The text should be corrected to read as noted below.
In Section 16.2, Layout Controls, on page 535 of The Unicode Standard, Version 5.0, the following text:
Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a word boundary, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word breaks, such as Thai, Khmer, and Japanese.
should be replaced by this text:
Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a line break opportunity, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent line break opportunities, such as Thai, Khmer, and Japanese.
In Section 11.3, Myanmar, on p. 381 of The Unicode Standard, Version 5.0, the following text:
Spacing. Myanmar does not use any whitespace between words. If word boundary indications are desired—for example, for the use of automatic line layout algorithms—the character U+200B ZERO WIDTH SPACE should be used to place invisible marks for such breaks.
should be replaced by this text:
Spacing. Myanmar does not use any whitespace between words. If explicit line break opportunities are desired—for example, for the use of automatic line layout algorithms—the character U+200B ZERO WIDTH SPACE should be used to place invisible marks for such breaks.
As with other errata for the text of Unicode Version 5.0, these corrections also apply to the text of Unicode Version 5.1.
|
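To illustrate the corrected wording, the following minimal sketch treats U+200B purely as a line break opportunity. The `wrap` helper is hypothetical and greatly simplified: it measures width in code points and ignores every other rule of the Unicode Line Breaking Algorithm.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE

def wrap(text, width):
    """Greedy wrap that breaks only where a ZWSP occurs (hypothetical helper).

    Width is counted in code points, which is an assumption of this sketch;
    real layout engines measure rendered width.
    """
    lines, current = [], ""
    for chunk in text.split(ZWSP):
        # Start a new line when appending the next chunk would overflow.
        if current and len(current) + len(chunk) > width:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines

print(wrap("ab" + ZWSP + "cd" + ZWSP + "ef", 4))
```

The ZWSP characters are consumed as break opportunities and never appear in the output lines, which is consistent with the character having no width.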
| 2008-May-7 |
In UAX #31, "Unicode Identifier and Pattern Syntax" (Version 5.1.0), there is a typo in the description for (X)ID_Start in Table 2, Lexical Classes for Identifiers. "letter numbers (Lu)" should be corrected to read "letter numbers (Nl)". |
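As a small illustration of why the category abbreviation matters, the sketch below looks up one letter number and checks whether it can start an identifier. It relies on Python's `unicodedata` module and on Python's own identifier rules, both of which are assumptions of this example; the sample character U+16EE was chosen arbitrarily.

```python
import unicodedata

ch = "\u16EE"  # RUNIC ARLAUG SYMBOL, used here only as a sample letter number

print(unicodedata.category(ch))  # general category of the sample character
print(ch.isidentifier())         # whether Python accepts it as an identifier
```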
| 2008-April-29 |
In UAX #29, "Unicode Text Segmentation" (Version 5.1.0), there is a typo in the definition of Prepend
in Table 2, Grapheme_Cluster_Break Property Values. The correct definition is: "Logical_Order_Exception=True". |
| 2008-April-28 |
In the Version 5.1 Unicode Character Database, the test cases in the test data
file LineBreakTest.txt incorrectly indicate the presence of a break at the
beginning of each line (with "÷"). These should be corrected to indicate no
break at the beginning of each line (with "×"), to reflect the effect of LB2
"Never break at the start of text" from
UAX #14, "Unicode
Line Breaking Algorithm".
Correspondingly, the documentation in LineBreakTest.html should have the rule
0.2 corrected to read: "sot ×".
|
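The correction to the test data amounts to changing the first marker on each test line. A minimal sketch, assuming each test line begins directly with its first break marker; the helper name and the sample line are hypothetical and not copied from LineBreakTest.txt:

```python
BREAK = "\u00F7"     # ÷ marks a break opportunity
NO_BREAK = "\u00D7"  # × marks a prohibited break

def fix_start_of_text(line):
    """Replace a leading ÷ with × on a test line; leave other lines alone."""
    if line.startswith(BREAK):
        return NO_BREAK + line[len(BREAK):]
    return line

# Illustrative line only; the real file's contents are not reproduced here.
sample = BREAK + " 0023 " + NO_BREAK + " 0023 " + BREAK + "\t# sample"
print(fix_start_of_text(sample))
```

Comment lines and lines that already start with "×" pass through unchanged, so the function can be applied to every line of the file.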
| 2008-February-12 |
On p. 124 of The Unicode Standard, Version 5.0, there is an error in the Regular Expressions column for
"More_Above", in the third row of
Table 3-14, Context Specification for Casing.
The corrected regular expression should be:
[^\p{ccc=230}\p{ccc=0}]* [\p{ccc=230}]
|
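The corrected expression can be read as a forward scan over combining classes. The following is a sketch of that reading, not the standard's reference code; the function name is hypothetical, and `unicodedata.combining` returns the Canonical_Combining_Class of the Python build's bundled UCD.

```python
import unicodedata

def more_above(text, i):
    """True if text[i] is followed by a ccc=230 mark with no intervening
    character of combining class 0 or 230."""
    for ch in text[i + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 230:
            return True   # matches the final [\p{ccc=230}]
        if ccc == 0:
            return False  # a starter ends the context without a match
        # Any other class matches [^\p{ccc=230}\p{ccc=0}]; keep scanning.
    return False

print(more_above("i\u0307", 0))        # dot above directly after the letter
print(more_above("i\u0323\u0307", 0))  # dot below intervenes, then dot above
print(more_above("ia\u0307", 0))       # a base letter intervenes
```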
| 2007-June-26 |
The following text from the last paragraph of Section 15.4, Mathematical Symbols, on page 507 of The Unicode Standard, Version 5.0:
Using U+2278 or U+2279 with VS1 will request these variants explicitly, as will using U+2276 less-than or greater-than or U+2277 greater-than or less-than with U+20D2 combining long vertical line overlay. Unless fonts are created with the intention to add support for both forms (via VS1 for the upright forms),...
should be replaced by this text:
Using U+2276 or U+2277 followed by U+20D2 COMBINING LONG
VERTICAL LINE OVERLAY represents these upright variants
explicitly. Except for those fonts created with the intention to
add support for both forms (via combination of U+2276 or
U+2277 with U+20D2 for the upright forms),... |
| 2007-June-4 |
In Section 12.1, Han, on p. 424 of The Unicode Standard, Version 5.0, the last paragraph states that U+FA70 to U+FAD9 are "included in the Unicode Standard to provide full round-trip compatibility with the ideographic repertoire of PKS 5700 parts 1, 2, and 3." However, the Korean standard listed is incorrect, and the text should be corrected to "... the ideographic repertoire of KPS 10721-2000."
|
| 2007-May-24 |
On p. 479 of The Unicode Standard, Version 5.0, the
subheading for Linear B Ideographs lists the range as
"U+10080--U+108FF". That should be corrected to
"U+10080--U+100FF". |
| 2007-January-11 |
There is an error in the entry for "Trailing Consonant" on page
1147 in the glossary of The Unicode Standard, Version 5.0.
"Vowel_Jamo" should be "Trailing_Jamo" in definition (1), thus reading "(1) In Korean, a jamo character with the Hangul_Syllable_Type property value
Trailing_Jamo (in the range U+11A8..U+11F9)." |
| 2007-January-5 |
There is an error in the sample code in Section 5.17 on page
182 of The Unicode Standard, Version 5.0. The entry 0x2F in the second row of the rotate table should instead be 0x1F. |
| 2007-January-4 |
On page 411 of The Unicode Standard, Version 5.0, Table 12-2 incorrectly states the extent of the CJK Unified Ideographs Extension A block. The correct range is U+3400 to U+4DBF. In particular, the Yijing Hexagram
Symbols starting at U+4DC0 are not part of Extension A. |
| 2007-January-2 |
Due to a printing error, the Unified Canadian Aboriginal
Syllabics glyphs at U+1424, U+1426, and U+1487 are missing in the code charts
and names list on pages 684 and 687-88 of The Unicode Standard,
Version 5.0.
These glyphs were correctly represented in the online charts and can be viewed at
http://www.unicode.org/charts/PDF/U1400.pdf. |
| 2007-January-2 |
The file UNIHAN/FullRSIndex.pdf on the Unicode 5.0 CD-ROM is missing a final page
with the last half of the entry for 211 (tooth) and the complete entries for 212 (dragon), 213 (turtle), and 214 (flute). The
missing page is available here as a PDF. |
| 2006-December-21 |
Table 11-16 in The Unicode Standard, Version 5.0 shows "kyu" twice:
once at the top of the part on page 402 and once at the top of the
part on page 403. The repetition is an error and the second
instance should be removed. |
|
|