New in Unicode 3.0

Unicode 3.0 is a major version of the Unicode Standard, published in February 2000 as The Unicode Standard, Version 3.0. This version is amended by two minor versions, Unicode 3.1 and Unicode 3.2. For more information, see Versions of the Unicode Standard.

For information on the precise contents of Unicode 3.0.0, see the specifications for Unicode 3.0.0.

Overview

The Unicode Standard, Version 3.0 contains descriptions and properties for many new characters. It is synchronized with ISO/IEC 10646-1 second edition. The character counts for Versions 2.1 and 3.0 are summarized in the following table:

Unicode 3.0 Summary

Category                            Version 2.1  Version 3.0
Alphabetics, Symbols                       6511        10236
CJK Ideographs                            21204        27786
Hangul Syllables                          11172        11172
Total assigned characters                 38887        49194
Private Use                                6400         6400
Surrogates                                 2048         2048
Controls                                     65           65
Not Characters                                2            2
Total assigned 16-bit code values         47402        57709
Unassigned 16-bit code values             18134         7827
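
The totals in the summary table follow from the individual rows. The sketch below recomputes them, assuming the 16-bit code space holds 65,536 code values. The dictionary and function names are illustrative and are not part of the standard.

```python
# Counts copied from the Unicode 3.0 Summary table: (Version 2.1, Version 3.0).
ASSIGNED_CHARACTERS = {
    "Alphabetics, Symbols": (6511, 10236),
    "CJK Ideographs": (21204, 27786),
    "Hangul Syllables": (11172, 11172),
}
OTHER_CODE_VALUES = {
    "Private Use": (6400, 6400),
    "Surrogates": (2048, 2048),
    "Controls": (65, 65),
    "Not Characters": (2, 2),
}
CODE_SPACE_16_BIT = 2 ** 16  # assumption: 65,536 possible 16-bit code values


def column_totals(column):
    """Return (assigned characters, assigned code values, unassigned code values)."""
    characters = sum(row[column] for row in ASSIGNED_CHARACTERS.values())
    assigned = characters + sum(row[column] for row in OTHER_CODE_VALUES.values())
    return characters, assigned, CODE_SPACE_16_BIT - assigned


for label, column in (("Version 2.1", 0), ("Version 3.0", 1)):
    print(label, column_totals(column))
```

Each tuple reproduces the three total rows of the table for that version.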

Besides adding characters to existing blocks, Unicode 3.0 adds a number of new blocks. They are listed below, together with the number of code points allocated to each block. For a list of all the blocks in Unicode 3.0, see http://www.unicode.org/Public/UNIDATA/Blocks.txt

New Blocks

Alloc. Block Name
80 Syriac
192 Thaana
128 Sinhala
160 Myanmar
384 Ethiopic
96 Cherokee
640 Unified Canadian Aboriginal Syllabics
32 Ogham
96 Runic
128 Khmer
176 Mongolian
256 Braille Patterns
128 CJK Radicals Supplement
224 Kangxi Radicals
16 Ideographic Description Characters
32 Bopomofo Extended
6,582 CJK Unified Ideographs Extension A
1,168 Yi Syllables
64 Yi Radicals
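
An allocation is the size of the block's code point range, that is, end minus start plus one. The following is a minimal sketch of how such counts could be derived from a Blocks.txt-style listing. The sample text is an illustrative excerpt rather than the real file, and the two line layouts it accepts (`start; end; name` and `start..end; name`) are assumptions about the file format.

```python
import re

# Illustrative excerpt, NOT the real Blocks.txt. The ranges were chosen to
# correspond to four rows of the New Blocks table.
SAMPLE_BLOCKS = """\
# Start Code; End Code; Block Name
0700; 074F; Syriac
1680; 169F; Ogham
2800..28FF; Braille Patterns
3400..4DB5; CJK Unified Ideographs Extension A
"""

# Assumed layouts: "start; end; name" or "start..end; name", in hexadecimal.
LINE = re.compile(
    r"^([0-9A-Fa-f]{4,6})\s*(?:\.\.|;)\s*([0-9A-Fa-f]{4,6})\s*;\s*(.+?)\s*$"
)


def block_sizes(text):
    """Map each block name to the number of code points in its range."""
    sizes = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        start, end, name = match.groups()
        sizes[name] = int(end, 16) - int(start, 16) + 1  # inclusive range
    return sizes


for name, size in block_sizes(SAMPLE_BLOCKS).items():
    print(f"{size:>5} {name}")
```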

Unicode 3.0 also includes enhanced implementation guidelines, and has been reorganized to describe related scripts within separate chapters. In addition to new characters, there are significant clarifications or modifications to character semantics from Unicode 2.0 to Unicode 3.0.

The vast majority of implementations of earlier versions will be conformant to Unicode 3.0.0 once the character properties for their supported characters are updated to Version 3.0.0 of the Unicode Character Database.

The most significant additions to the standard include the following:

  • Transformation Formats. The precise definitions of the common Unicode Transformation Formats are provided, including UTF-8, UTF-16, UTF-16BE, and UTF-16LE. The relations between abstract characters, code points (scalar values) and code units (8, 16 or 32 bit) are clarified. See also Draft Technical Reports.
  • Bidirectional properties. Bidirectional properties are now more consistent with the general category property, and new bidirectional properties were created. See UTR #09: The Bidirectional Algorithm.
  • Case. Case properties have been extended for those situations where there is a mapping to multiple characters and where case is locale dependent. See also Draft Technical Reports.
  • Combining classes. These were updated significantly to resolve problems of normalization and decomposition for Indic scripts in particular.
  • Decomposition and Composition. Unicode character decompositions have been significantly updated to fix errors in the original assignments, to allow correct collation weighting, and to make decompositions consistent for normalization. Certain characters are excluded from composition, and the precise algorithm for composition is provided. See UTR #15: Unicode Normalization Forms.
  • General Category. A series of general category changes were made to assist the convergence of the Unicode definition of identifier with ISO TR 10176.
  • Newlines. Line handling characteristics have been documented more fully for Unicode environments. See UTR #13: Unicode Newline Guidelines.
  • Quotation Marks. Two new punctuation categories, Pi and Pf, were created for initial and final quotes with properties that vary by language.
  • Linebreak properties. Linebreaking properties (normative and informative) are added to the standard to support consistent linebreaking behavior over all Unicode characters. See UTR #14: Line Breaking Properties.
  • East-Asian width properties. Properties for supporting correct choice of full-width vs. half-width glyphs in an East-Asian context are provided. See UTR #11: East Asian Character Width.
  • Specific Characters
    • Byte order mark. The use of the byte order mark with transformation formats is clarified.
    • Line and paragraph separators. Use of line and paragraph separators is clarified.
    • Capital letters with iota adscript. The representative glyphs, semantics, case mappings and decompositions have been revised to make their handling more consistent.
    • Eyelash Ra. Consonant RA rules have been updated and expanded.
    • Figure space. U+2007 FIGURE SPACE is no longer treated like a numeric separator for purposes of bidirectional layout.
    • Layout controls. The description of layout controls was enhanced to include the behavior of U+00A0 NO-BREAK SPACE, U+00AD SOFT HYPHEN, and zero-width spaces.
    • Tilde. The use of U+007E TILDE as a spacing clone of combining tilde and as a regular character is described more completely.
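
The relation between code points and code units, and the role of the byte order mark, can be illustrated with a short sketch. It uses Python's built-in codecs, which implement current definitions of the transformation formats rather than the Version 3.0 text. The helper names are illustrative.

```python
import codecs

ENCODINGS = ("utf-8", "utf-16-be", "utf-16-le")


def byte_serializations(text):
    """Map each encoding name to the encoded bytes, shown as spaced hex."""
    return {name: text.encode(name).hex(" ").upper() for name in ENCODINGS}


def code_unit_counts(text):
    """Count code points, 8-bit code units (UTF-8) and 16-bit code units (UTF-16)."""
    return {
        "code points": len(text),
        "8-bit code units": len(text.encode("utf-8")),
        "16-bit code units": len(text.encode("utf-16-be")) // 2,
    }


sample = "\u20AC"  # U+20AC EURO SIGN
print(byte_serializations(sample))
print(code_unit_counts(sample))

# A byte order mark at the start of a UTF-16 stream signals the byte order.
# Python's generic "utf-16" decoder consumes it, while the explicit
# "utf-16-be" decoder leaves it in the text as U+FEFF.
stream = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")
print(repr(stream.decode("utf-16")))
print(repr(stream.decode("utf-16-be")))
```

A single code point thus occupies three 8-bit code units in UTF-8 and one 16-bit code unit in UTF-16, serialized in either byte order. A code point above U+FFFF would take two 16-bit code units.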

Conformance Changes

Conformance clauses, definitions, and explanatory text were added for handling Unicode Transformation Formats. The Unicode Bidirectional Behavior algorithm rules were clarified and expanded, and new bidirectional character properties were documented. Other normative character property values were changed; see the Unicode character database file for more information.
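
Character property values such as those mentioned above can be inspected programmatically. The sketch below uses Python's `unicodedata` module, which reflects the Unicode Character Database version bundled with the interpreter and not Version 3.0.0, so individual values may differ from the 3.0 data. The function name and the choice of sample characters are illustrative.

```python
import unicodedata


def property_profile(char):
    """Collect a few character properties for a single character."""
    return {
        "name": unicodedata.name(char, "<unnamed>"),
        "general_category": unicodedata.category(char),
        "bidi_class": unicodedata.bidirectional(char),
        "combining_class": unicodedata.combining(char),
        "east_asian_width": unicodedata.east_asian_width(char),
        "decomposition": unicodedata.decomposition(char),
    }


# The database version depends on the Python build, not on Unicode 3.0.
print("Unicode data version:", unicodedata.unidata_version)

# Initial and final quotes, a combining mark, FIGURE SPACE, a fullwidth letter.
for char in ("\u00AB", "\u00BB", "\u0301", "\u2007", "\uFF21"):
    print(f"U+{ord(char):04X}", property_profile(char))
```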

Approved Unicode Technical Reports

The following technical reports are approved and considered part of the Unicode Standard, Version 3.0. These reports may contain either normative or informative material, or both. Any reference to version 3.0 of the standard automatically includes these technical reports.

The following technical reports are also approved. Although normative in stating requirements for implementations claiming conformance to them, they are not considered part of Unicode 3.0. If they are cited, they must be separately referenced; see Citations and References.

Draft Unicode Technical Reports

Additional draft and proposed draft technical reports can be found on the Technical Reports page. While these reports are not final and are not considered part of Unicode 3.0, they contain information that may be useful for implementation. If they are cited, they must be separately referenced; see Citations and References.

Additional technical reports may be added over time.