About Versions of the Unicode Standard

For specific details regarding individual numbered versions of the Unicode Standard, see the Archive of Unicode Versions. For details regarding dates of past releases and publications, see History of Release and Publication Dates.

Version Numbering

Version numbers for the Unicode Standard consist of three fields, denoting the major version, the minor version, and the update version, respectively. For example, “Unicode 3.1.1” indicates major version 3 of the Unicode Standard, minor version 1 of Unicode 3, and update version 1 of minor version Unicode 3.1.
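The three-field scheme can be sketched in code. The following is a minimal illustration only: the names `UnicodeVersion` and `parse_version` are hypothetical, and treating an omitted trailing field as 0 is an assumption made for convenience, not a rule stated on this page.

```python
from typing import NamedTuple


class UnicodeVersion(NamedTuple):
    """Three-field version number: major, minor, update."""
    major: int
    minor: int
    update: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.update}"


def parse_version(text: str) -> UnicodeVersion:
    """Parse 'X', 'X.Y' or 'X.Y.Z'.

    Assumption: missing trailing fields are filled with 0.
    """
    parts = text.strip().split(".")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a Unicode version number: {text!r}")
    numbers = [int(p) for p in parts] + [0] * (3 - len(parts))
    return UnicodeVersion(*numbers)


v = parse_version("3.1.1")
print(v.major, v.minor, v.update)
```

Because the result is a tuple, versions compare field by field, so `parse_version("3.1.1") > parse_version("3.1")` holds under the zero-fill assumption.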

Formally, each new version of the Unicode Standard supersedes all earlier versions. However, because of the differences in the ways major, minor, and update versions are documented, update versions do not obsolete all of the documentation of the immediately prior versions of the standard.

The differences between major, minor, and update versions are as follows:

Major and Minor Versions

Major and minor versions have significant additions to the standard, including, but not limited to, additions to the repertoire of encoded characters. Both are published as updated text of the standard, together with associated updates to Unicode Standard Annexes and the Unicode Character Database. Such versions consolidate all errata and corrigenda and supersede any prior documentation for major, minor, or update versions.

A major version typically is of more importance to implementations; however, even update versions may be important to particular companies or other organizations. Major and minor versions are often synchronization points with related standards, such as with ISO/IEC 10646.

Prior to Version 5.2, minor versions of the standard were published as online amendments expressed as textual changes to the previous version, rather than as fully consolidated new editions of the text.

Update Version

An update version represents relatively small changes to the standard, typically updates to the data files of the Unicode Character Database. An update version never involves any additions to the character repertoire. These versions are published as modifications to the data files, and, on occasion, include documentation of small updates for selected errata or corrigenda. Formally, each new version of the Unicode Standard supersedes all earlier versions. However, because of the differences in the way versions are documented, update versions generally do not obsolete the documentation of the immediately prior version of the standard.

Starting with Unicode 3.0.1, update versions are published as stable version pages online. Prior to that version, update versions were simply documented with the list of relevant data file changes to the Unicode Character Database. For historical reasons, update version numbers were not always consecutive prior to Unicode 3.0.

Schedule of Releases

Starting with Unicode 7.0, the Unicode Technical Committee has decided to follow a more predictable release schedule. A new major version of the standard will be released regularly in the middle of each year. Thus, Unicode 7.0 was released in June 2014, Unicode 8.0 will be released in June or July 2015, and so on. Minor and update versions will be avoided, unless necessary to address particular issues on a timely basis.

Documents and Version Numbering

The documents associated with the major, minor, and update versions are called the major reference, minor reference, and update reference, respectively. For example, consider Unicode Version 3.1.1. The major reference for that version is The Unicode Standard, Version 3.0 (ISBN 0-201-61633-5). The minor reference is Unicode Standard Annex #27, The Unicode Standard, Version 3.1. The update reference is Unicode Version 3.1.1. The exact list of contributory files, UAXs and the Unicode Character Database, can be found on the page Components for Version 3.1.1.

Contributory Data Files

The Unicode Consortium archives each version of the standard. Each archived version consists of the set of versioned contributory data files. For earlier versions of the standard, these include the material from the major reference, unless superseded by files in the update version. The Unicode Consortium maintains online access to archival copies of all contributory files available in electronic form. For the earliest versions, some material was only published in paper form. The Consortium maintains private archive records of these. In addition, they are available in many libraries.

Certain files change with every version of the Unicode Standard, and either have corresponding version numbers, such as UnicodeData-3.1.0.txt, or, for more recent versions, are located in versioned directories, such as 6.0.0/UnicodeData.txt. Other files have independent version numbers, such as tr7-4.html for the fourth version of Unicode Technical Report #7. The latest version of each file will also be copied under the corresponding file name with no version, such as UnicodeData.txt or tr7.html.

The latest versions of all of the Unicode Character Database files are kept in http://www.unicode.org/Public/UCD/latest/ and mirrored at ftp://ftp.unicode.org/Public/UCD/latest/. These files will have no version number attached to them, so that a link to a file in that directory will always point to the latest version.
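The two naming conventions described above can be illustrated with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and the caller chooses the style explicitly because this page does not say at which release the convention changed. The helper builds only the relative names quoted above, not complete URLs.

```python
def versioned_name(stem: str, ext: str, version: str, style: str) -> str:
    """Return a version-specific file name.

    style="suffix"    -> e.g. 'UnicodeData-3.1.0.txt' or 'tr7-4.html'
    style="directory" -> e.g. '6.0.0/UnicodeData.txt'
    """
    if style == "suffix":
        return f"{stem}-{version}.{ext}"
    if style == "directory":
        return f"{version}/{stem}.{ext}"
    raise ValueError(f"unknown style: {style!r}")


def versionless_name(stem: str, ext: str) -> str:
    """Return the name under which the latest copy is published."""
    return f"{stem}.{ext}"


print(versioned_name("UnicodeData", "txt", "3.1.0", "suffix"))
print(versioned_name("UnicodeData", "txt", "6.0.0", "directory"))
print(versionless_name("tr7", "html"))
```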

The component specifications linked from the Archive of Unicode Versions contain precise lists of the contents of each version of the standard. Each specification for version 3.0 or later also includes the recommended citation format for that version, such as:

The Unicode Consortium. The Unicode Standard, Version 7.0.0, (Mountain View, CA: The Unicode Consortium, 2014. ISBN 978-1-936213-09-2)
http://www.unicode.org/versions/Unicode7.0.0/

The file DerivedAge.txt contains a list showing when various code points were designated in Unicode. This can be useful in determining the version in which a character first appears.
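As an illustration, a lookup over DerivedAge.txt might be sketched as follows. The sketch assumes the usual Unicode Character Database line layout (a code point or `first..last` range, a semicolon, the version, then a `#` comment). The embedded sample is abridged for demonstration and is not the real file, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Abridged sample in the style of DerivedAge.txt (an assumption, not the real file).
SAMPLE = """\
# Illustrative excerpt only
0000..001F    ; 1.1 #  [32] <control>..<control>
20AC          ; 2.1 #       EURO SIGN
0220          ; 3.2 #       LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
"""

Range = Tuple[int, int, str]  # (first code point, last code point, version)


def parse_derived_age(text: str) -> List[Range]:
    """Turn 'XXXX..YYYY ; V' lines into (first, last, version) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code_field, age = (field.strip() for field in data.split(";"))
        first, _, last = code_field.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), age))
    return ranges


def age_of(code_point: int, ranges: List[Range]) -> Optional[str]:
    """Return the version in which code_point was designated, if listed."""
    for first, last, age in ranges:
        if first <= code_point <= last:
            return age
    return None


ranges = parse_derived_age(SAMPLE)
print(age_of(0x20AC, ranges))
```

A linear scan is enough for a sketch; for the full file, a sorted list with binary search would be a natural refinement.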

For versions 5.2 and later, a summary of the modifications in the Unicode Character Database can be found in Unicode Standard Annex #44, Unicode Character Database. That annex also specifies which character properties in the UCD are normative, informative, or contributory. For earlier versions, such information is found instead in the documentation file, UCD.html.

The component specifications of the Archive of Unicode Versions use the following table to indicate the change status of data files between versions.

Key
N New in this release
D Data change (possibly also format/text change)
F Data format change (possibly also text change)
T Text annotation change
- Unchanged

Zipped data files are also available. Those zip files are complete for versions 4.1 and later. For versions earlier than 4.1, the zip files only include files that were changed between versions.

Errata and Corrigenda

From time to time it may be necessary to publish errata or corrigenda to the Unicode Standard. Such errata and corrigenda will be published on the Unicode Web site. To report errors in the standard, please use the contact form.

Errata. Errata correct errors in the text or other informative material, such as the representative glyphs in the code charts. See Updates and Errata for the list of known current errata for the standard. Whenever a new major version of the standard is published, all corrections for errata up to that point are incorporated into the text.

Corrigenda. Occasionally errors may be important enough that a corrigendum is issued prior to the next version of the Unicode Standard. Such a corrigendum does not change the contents of the previous version. Instead, it provides a mechanism for an implementation, protocol, or other standard to cite the previous version of the Unicode Standard with the corrigendum applied. If a citation does not specifically mention the corrigendum, the corrigendum does not apply. See Corrigenda for more information about corrigenda to the standard.

Online versus Printed Editions

Major versions of the Unicode Standard from Version 3.0 to Version 5.0 were published both as printed books and as electronic editions on this Web site. The electronic edition contains all of the chapters of the book (in PDF format). The content of both editions is intended to be identical, but in case of any inadvertent errors in production of the electronic edition, the printed edition should be taken as authoritative for those versions.

Extensible Character Repertoire

For most character encodings, the character repertoire is fixed (and often small). Once the repertoire is decided upon, it is never changed. Addition of a new abstract character to a given repertoire creates a new repertoire, which will be treated either as an update of the existing character encoding or as a completely new character encoding.

For the Unicode Standard, by contrast, the repertoire is inherently open. Because Unicode is a universal encoding, any abstract character that could ever be encoded is a potential candidate to be encoded, regardless of whether the character is currently known.

Each new version of the Unicode Standard supersedes the previous one, but implementations—and, more significantly, data—are not updated instantly. In general, major and minor version changes include new characters, which do not create particular problems with old data. The Unicode Technical Committee will neither remove nor move characters. Characters may be deprecated, but this does not remove them from the standard or from existing data. The code point for a deprecated character will never be reassigned to a different character, but the use of a deprecated character is strongly discouraged. Generally these rules make the encoded characters of a new version backward-compatible with previous versions.

Implementations should be prepared to be forward-compatible with respect to Unicode versions. That is, they should accept text that may be expressed in future versions of this standard, recognizing that new characters may be assigned in those versions. Thus they should handle incoming unassigned code points as they do unsupported characters. (See Section 5.3, Unknown and Missing Characters.)
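One way to approach this is to label unassigned code points rather than reject them. The sketch below uses Python's standard `unicodedata` module, whose data reflects whichever Unicode version the running interpreter ships with. The function names are hypothetical, and the check relies on general category `Cn`, which also covers noncharacters, so it is an approximation rather than a complete policy.

```python
import unicodedata
from typing import List


def is_unassigned(ch: str) -> bool:
    """True if this interpreter's Unicode data has no assignment for ch.

    Note: category 'Cn' also includes noncharacters.
    """
    return unicodedata.category(ch) == "Cn"


def describe(text: str) -> List[str]:
    """Label every character; unknown ones are reported, never dropped."""
    labels = []
    for ch in text:
        code = f"U+{ord(ch):04X}"
        if is_unassigned(ch):
            labels.append(
                f"{code} <unassigned in Unicode {unicodedata.unidata_version}>"
            )
        else:
            labels.append(f"{code} {unicodedata.name(ch, '<unnamed>')}")
    return labels


# U+D0000 lies in a plane with no assignments at the time of writing.
for label in describe("A" + chr(0xD0000)):
    print(label)
```

The text itself is left untouched, so a later implementation with newer character data can still interpret the same code points.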

A version change may also involve changes to the properties of existing characters. When this situation occurs, modifications are made to the Unicode Character Database and a new update version is issued for the standard. Changes to the data files may alter program behavior that depends on them. However, such changes to properties and to data files are never made lightly. They are made only after careful deliberation by the Unicode Technical Committee has determined that there is an error, inconsistency, or other serious problem in the property assignments.

Citations

Since Unicode is an open standard, it is important not to over-specify the version number. Wherever the precise behavior of all Unicode characters needs to be cited, the full three-field version number should be used, as in example (1) below.

  1. The Unicode Standard, Version 3.1.1
  2. The Unicode Standard, Version 3.1
  3. The Unicode Standard, Version 3.0 or later
  4. The Unicode Standard

Where the precise character repertoire is significant, but the precise character properties are not at issue, the third field can be omitted, as in example (2). Where some basic level of content is all that is important, phrasing such as in example (3) can be used. Where the important information is simply the overall architecture and semantics of the Unicode Standard, the version can be omitted entirely, as in example (4).
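The four levels of precision can be summarized in a short sketch. The function name `cite` and the four precision labels are hypothetical, chosen only to mirror examples (1) through (4).

```python
def cite(version: str, precision: str) -> str:
    """Format a citation at one of four levels of precision.

    precision (hypothetical labels):
      'properties'   -> full three-field version, example (1)
      'repertoire'   -> major.minor only,        example (2)
      'baseline'     -> major.minor 'or later',  example (3)
      'architecture' -> no version at all,       example (4)
    """
    title = "The Unicode Standard"
    if precision == "architecture":
        return title
    fields = version.split(".")
    if len(fields) != 3:
        raise ValueError("expected a three-field version such as '3.1.1'")
    if precision == "properties":
        return f"{title}, Version {'.'.join(fields)}"
    if precision == "repertoire":
        return f"{title}, Version {'.'.join(fields[:2])}"
    if precision == "baseline":
        return f"{title}, Version {'.'.join(fields[:2])} or later"
    raise ValueError(f"unknown precision: {precision!r}")


print(cite("3.1.1", "properties"))
print(cite("3.1.1", "repertoire"))
print(cite("3.0.0", "baseline"))
print(cite("3.1.1", "architecture"))
```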

Particular definitions or conformance clauses can also be cited, such as:

Conformance clause C3 of The Unicode Standard, Version 3.1

When citing the Unicode Character Database separately, use the same format for version numbers.

The Unicode Character Database, Version 3.1.1

The versioning for Unicode Technical Reports and distinctions among the different categories of report are explained on Technical Reports. The full form of a citation takes one of the following forms:

  • Unicode Standard Annex #9: The Bidirectional Algorithm, Version 3.0.1
  • Unicode Technical Standard #6: A Standard Compression Scheme for Unicode, Version 3.1
  • UTR #17: Character Encoding Model, Version 3.1

As above, the revision number can be omitted where not necessary. Citations can be abbreviated using the formats below, although the first citation should be spelled out for clarity. In particular, the acronyms UCD, UAX, UTS, and UTR should be spelled out the first time. The title of a UTR should be supplied with the first reference.

  • Unicode 3.1.1 (instead of The Unicode Standard, Version 3.1.1)
  • Clause C3 of Unicode 3.1.1
  • UCD 3.1.1
  • UTS #10: Unicode Collation Algorithm
  • UTS #10

When claiming conformance, the precise version should be used, such as:

  • "This product conforms to The Unicode Standard, Version 7.0.0".
  • "This product conforms to UTS #10: Unicode Collation Algorithm, Version 7.0.0".

Reference Examples

The format for references to The Unicode Standard, Unicode Standard Annexes, and other Unicode Technical Reports is illustrated by the following examples. For the actual citations for references to each version of the standard, see the Archive of Unicode Versions.

The Unicode Standard, Latest Version

Versioned

The Unicode Consortium. The Unicode Standard, Version 7.0.0, (Mountain View, CA: The Unicode Consortium, 2014. ISBN 978-1-936213-09-2)
http://www.unicode.org/versions/Unicode7.0.0/

Versionless

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

The Unicode Standard, Earlier Versions

Version 6.3.0

The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
http://www.unicode.org/versions/Unicode6.3.0/

Version 6.2.0

The Unicode Consortium. The Unicode Standard, Version 6.2.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8)
http://www.unicode.org/versions/Unicode6.2.0/

Version 6.1.0

The Unicode Consortium. The Unicode Standard, Version 6.1.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-02-3)
http://www.unicode.org/versions/Unicode6.1.0/

Version 6.0.0

The Unicode Consortium. The Unicode Standard, Version 6.0.0, (Mountain View, CA: The Unicode Consortium, 2011. ISBN 978-1-936213-01-6)
http://www.unicode.org/versions/Unicode6.0.0/

Version 5.2.0

The Unicode Consortium. The Unicode Standard, Version 5.2.0 (Mountain View, CA: The Unicode Consortium, 2009. ISBN 978-1-936213-00-9)
http://www.unicode.org/versions/Unicode5.2.0/

Version 5.1.0

The Unicode Consortium. The Unicode Standard, Version 5.1.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0), as amended by Unicode 5.1.0
http://www.unicode.org/versions/Unicode5.1.0/

Version 5.0.0

The Unicode Consortium. The Unicode Standard, Version 5.0.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0)

Version 4.0.1

The Unicode Consortium. The Unicode Standard, Version 4.0.1, defined by: The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), as amended by Unicode 4.0.1
http://www.unicode.org/versions/Unicode4.0.1/

Version 4.0.0 with Corrigendum

The Unicode Consortium. The Unicode Standard, Version 4.0.0, defined by: The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1) and Corrigendum #5: Normalization Idempotency
http://www.unicode.org/versions/corrigendum5.html

Unicode Standard Annexes

Versioned

Unicode Standard Annex #15, "Unicode Normalization Forms," edited by Mark Davis and Ken Whistler, an integral part of The Unicode Standard. Version 7.0.0. 2014-06-05. (http://www.unicode.org/reports/tr15/tr15-41.html)
Latest Version: http://www.unicode.org/reports/tr15/

Versionless

Unicode Standard Annex #15, "Unicode Normalization Forms," edited by Mark Davis and Ken Whistler. An integral part of The Unicode Standard.
(http://www.unicode.org/reports/tr15/)

Unicode Technical Standards

Versioned

Unicode Technical Standard #10, "Unicode Collation Algorithm," edited by Mark Davis, Ken Whistler and Markus Scherer. Version 7.0.0. 2014-06-09. (http://www.unicode.org/reports/tr10/tr10-30.html)
Latest Version: http://www.unicode.org/reports/tr10/

Versionless

Unicode Technical Standard #10, "Unicode Collation Algorithm," edited by Mark Davis and Ken Whistler. (http://www.unicode.org/reports/tr10/)

Other Unicode Technical Reports

Versioned

Unicode Technical Report #20, "Unicode in XML and other Markup Languages," by Martin Dürst and Asmus Freytag. 2007-05-16. Also published as a W3C Note. (http://www.unicode.org/reports/tr20/tr20-8.html)
Latest Version: http://www.unicode.org/reports/tr20/

Versionless

Unicode Technical Report #20, "Unicode in XML and other Markup Languages," by Martin Dürst and Asmus Freytag. Also published as a W3C Note. (http://www.unicode.org/reports/tr20/)

