Publications and Data

The Unicode Consortium publishes data and technical specifications that help people around the world use their languages on computers and mobile devices. The freely available Unicode Standard and its associated specifications form the foundation for software internationalization in all major operating systems, search engines, applications, and the World Wide Web.

The technical information includes:

  • Technical annexes, standards, and reports
  • Data tables and repositories
  • Informational FAQs

Technical Annexes, Standards, and Reports

The Unicode Standard comprises the core specification, standard annexes and algorithms, and associated data and behavior specifications. The versions of the Unicode Standard can be accessed at About Versions of the Unicode Standard.

There are three types of technical reports:

  • Unicode Standard Annexes are an integral part of the Unicode Standard
  • Unicode Technical Standards specify related normative standards
  • Unicode Technical Reports are informational

For more on these publications, see About Unicode Technical Reports and Technical Reports.

Data Tables and Repositories

Unicode algorithms and specifications often require machine-readable data. The corresponding data files are available on the Unicode website at Online Data.

Important data repositories include the Unicode Common Locale Data Repository (CLDR), with cultural formatting information covering many languages and territories, and the Unicode Unified Han (Unihan) Character Database, which contains voluminous information on the Unicode repertoire of CJK Han characters.
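
As a rough illustration of what machine-readable data means in practice, the sketch below parses a single record in the semicolon-delimited layout used by the UnicodeData.txt file of the Unicode Character Database, then cross-checks it against Python's standard unicodedata module. The function name parse_record and the choice of fields to keep are assumptions made for this example, not part of any Unicode specification.

```python
import unicodedata

# One record in the UnicodeData.txt layout: 15 fields separated by semicolons.
SAMPLE_RECORD = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"


def parse_record(line: str) -> dict:
    """Extract a few commonly used fields from one UnicodeData.txt record.

    Hypothetical helper for illustration; only a subset of fields is kept.
    """
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return {
        "code_point": int(fields[0], 16),   # field 0: code point in hex
        "name": fields[1],                  # field 1: character name
        "general_category": fields[2],      # field 2: e.g. Lu = uppercase letter
        "bidi_class": fields[4],            # field 4: bidirectional class
        # field 13: simple lowercase mapping, empty when there is none
        "simple_lowercase": int(fields[13], 16) if fields[13] else None,
    }


record = parse_record(SAMPLE_RECORD)
char = chr(record["code_point"])

# The same properties, as exposed by Python's bundled copy of the data.
stdlib_view = {
    "name": unicodedata.name(char),
    "general_category": unicodedata.category(char),
    "bidi_class": unicodedata.bidirectional(char),
}
```

Here record and stdlib_view agree on the name, general category, and bidirectional class of U+0041, which is the point of publishing the data in a form that software can read directly. Production code would normally rely on an existing library rather than parse the files by hand.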

Informational FAQs

For detailed information and links to Unicode specifications, see the Specifications FAQ. For an overview of the many aspects of character encoding support and localization technologies addressed by the Unicode Consortium, see the Frequently Asked Questions (FAQ) section of this website.