About the Unicode Character Database

The Unicode Character Database (UCD) consists of a number of data files listing Unicode character properties and related data. It also includes data files containing test data for conformance to several important Unicode algorithms. Full documentation for the UCD can be found in Unicode Standard Annex #44, Unicode Character Database.

Latest Version of the Unicode Character Database

All files for the most up-to-date version of the Unicode Character Database can be found at: http://www.unicode.org/Public/UCD/latest/.

Files in the UCD/latest/ subdirectories are unversioned: they do not contain any version indicator in their file name. However, most of the data files contain a file header in a standard format, which indicates the Unicode version and the date of last revision of that file.
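Because the version lives in the file header rather than the file name, a script that downloads from UCD/latest/ may want to read it back out. The following is a minimal sketch only. It assumes the header consists of leading `#` comment lines, that the first one repeats a versioned file name such as `Example-5.0.0.txt`, and that a `# Date:` line follows. This page does not spell out that layout, so check it against the actual files. The function name `read_header` and the sample lines are hypothetical.

```python
import re

# Assumed header shape (an assumption, not taken from this page):
#   # <Name>-<major>.<minor>.<update>.txt
#   # Date: YYYY-MM-DD, ...
NAME_RE = re.compile(r"^#\s*[A-Za-z_]+-(?P<version>\d+\.\d+\.\d+)\.txt\s*$")
DATE_RE = re.compile(r"^#\s*Date:\s*(?P<date>\d{4}-\d{2}-\d{2})")


def read_header(lines):
    """Return (version, date) from the leading comment lines; None where absent."""
    version = None
    date = None
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        name_match = NAME_RE.match(line)
        if name_match and version is None:
            version = name_match.group("version")
        date_match = DATE_RE.match(line)
        if date_match and date is None:
            date = date_match.group("date")
    return version, date


# Made-up sample lines, for illustration only.
sample = [
    "# Example-5.0.0.txt",
    "# Date: 2006-01-01, 00:00:00 GMT",
    "#",
    "0000..007F; Example Data",
]
print(read_header(sample))
```

Files that lack such a header simply yield `None` for the missing fields, which matches the statement that most, not all, data files carry one.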

The latest version of the Unicode Standard, which corresponds to the latest version of the UCD, can be found at: http://www.unicode.org/versions/latest/.

Specific Versions of the UCD

Each specific version of the UCD is available for archival access in a versioned directory. For example, the UCD for Unicode 5.2 specifically is available at:

The UCD for Unicode 5.0 is available at http://www.unicode.org/Public/5.0.0/ and so on for each earlier version of the standard.

For versions of the UCD earlier than Version 4.1, the structure of the archival directories differs somewhat. For full details, see Unicode Standard Annex #44, Unicode Character Database.
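The versioned-directory pattern can be sketched as a small helper. It generalizes from the single Unicode 5.0 example given above, so treat the pattern as an assumption for other versions. The function name `ucd_archive_url` is hypothetical, and versions before 4.1 are rejected because their directory layout differs.

```python
BASE = "http://www.unicode.org/Public/"


def ucd_archive_url(version: str) -> str:
    """Build the archival directory URL for a UCD version such as "5.0" or "5.0.0"."""
    parts = [int(p) for p in version.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected major.minor or major.minor.update, got {version!r}")
    while len(parts) < 3:
        parts.append(0)  # "5.0" is treated as "5.0.0"
    if tuple(parts[:2]) < (4, 1):
        # The archival layout differs before Version 4.1; see UAX #44.
        raise ValueError("directory layout differs for versions earlier than 4.1")
    return BASE + ".".join(str(p) for p in parts) + "/"


print(ucd_archive_url("5.0"))
```

Raising an error for pre-4.1 versions, rather than guessing a path, keeps the helper from returning a URL that may not exist.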

A comprehensive list of the exact data files that make up a given version of the UCD can be found in the component lists at Enumerated Versions of the Unicode Standard.

The UCD in XML

The contents of each version of the UCD are also available in XML format. The XML files are zipped and stored in a subdirectory for each version. For example, the XML version of UCD Version 5.2 can be found in:

Full documentation about the XML versions of the UCD can be found in Unicode Standard Annex #42, Unicode Character Database in XML.

BETA Versions

During periods when a preliminary (beta) version of the standard is released for public comment, Public Beta files are available. For more information about any ongoing public betas, see the BETA notice as well as Public Review Issues.

FTP Access

All files and directories in the Unicode Character Database are accessible via both HTTP and FTP. For FTP access, substitute "ftp:" for "http:" in any of the links given above.
For example, to access the contents of http://www.unicode.org/Public/UCD/latest/ by FTP, use the following modified URL: ftp://www.unicode.org/Public/UCD/latest/
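The scheme substitution described above can be sketched with Python's standard `urllib.parse` module. The helper name `http_to_ftp` is hypothetical, and the sketch only rewrites the URL; it does not check that the FTP location is reachable.

```python
from urllib.parse import urlsplit, urlunsplit


def http_to_ftp(url: str) -> str:
    """Return the same URL with its "http" scheme replaced by "ftp"."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"expected an http: URL, got {url!r}")
    # SplitResult is a named tuple, so _replace swaps only the scheme field.
    return urlunsplit(parts._replace(scheme="ftp"))


print(http_to_ftp("http://www.unicode.org/Public/UCD/latest/"))
```

Parsing the URL instead of doing a plain string replacement avoids touching an "http:" that might appear later in the path.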