Unicode Frequently Asked Questions

Specifications

Q. How can I find out whether a particular issue is covered by a specification published by the Consortium, and where do I look it up?

A. The Unicode Standard and related standards contain a number of specifications and guidelines for various programming tasks. These can be hard to find, because not all of them are published as separate, dedicated documents; many are sections within chapters of the Standard.

The following table lists subject areas for which the Unicode Consortium provides specifications, with the location of each specification and a brief description of what it covers.

General

Character Properties: common properties such as Name, Alphabetic, Letter, White-Space, General Category, Default-Ignorable, plus those used in other specifications

Ch 4

Character Properties for CJK Ideographs: property information specific to CJK ideographs (the Unihan database)

UAX 38

Unicode Character Database: general documentation about the UCD

UAX 44

UCD in XML: description of the XML representation of the UCD

UAX 42

Case Operations: conversion/detection of Upper/Lower/Titlecase, case folding, case matching. See also § 4.2, Case.

§ 3.13

Characters with Unusual Properties: characters that implementers need to pay special attention to

§ 4.12

Use of Characters in Markup Contexts: guidelines for XML and other markup languages

UTR 20

Script Names: usage model for determining text runs in a given script

UAX 24

Use of Characters in Mathematical Contexts: guidelines for mathematical usage

UTR 25

Unicode Named Character Sequences: specifies the syntax for named character sequences

UAX 34

Encodings

Unicode Encoding Forms: UTF-8, UTF-16, UTF-32 conversion and validation

§ 3.9

Unicode Encoding Schemes: UTF-8, UTF-16 (BE/LE), UTF-32 (BE/LE) conversion and validation

§ 3.10

Binary Order: UTF-8 order vs. UTF-16 order

§ 5.17

Character Mapping Markup Language: mapping Unicode to and from legacy code pages

UTS 22

A Standard Compression Scheme for Unicode: how to compress Unicode text to about the same size as legacy encodings

UTS 6

UTF-EBCDIC: encapsulating Unicode on EBCDIC systems

UTR 16

Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8): a compatibility 8-bit encoding scheme

UTR 26

Ideographic Variation Database: repository of variation sequences for specified collections of Han glyphs

UTS 37

Comparison (Normalization, Collation)

Canonical Equivalence: when character sequences are equivalent; canonical ordering

§ 3.11

Unicode Normalization Forms: how to normalize text for comparison

UAX 15, § 3.11

Unicode Collation Algorithm: the default mechanism for comparing, searching, and matching Unicode text

UTS 10

Parsing

Hangul Syllables: boundaries, parsing, composition and decomposition, names

§ 3.12

Decimal Numbers: conversion and validation

§ 5.5

Unicode Regular Expression Guidelines: the features required in supporting regular expressions with Unicode

UTS 18

Identifier and Pattern Syntax: how to parse identifiers

UAX 31

Language Information in Plain Text: see also § 16.9, Deprecated Tag Characters

§ 5.10

Variation Selectors: usage, validation

§ 16.4

Ideographic Description Sequences: use, validation

§ 12.2

Segmentation

Newline Guidelines: how to handle newline characters

§ 5.8

Line Breaking Algorithm: the default way to determine where to linewrap

UAX 14

Text Segmentation: the default way to break text into user characters, words, and sentences

UAX 29

Rendering

The Bidirectional Algorithm: required for displaying right-to-left text such as Arabic and Hebrew

UAX 9

East Asian Width: the default determination of character width in East Asian contexts

UAX 11

Shaping: minimal shaping requirements for Arabic, Devanagari, Tamil, and other scripts

Ch 8-10

Locale Data

Locale Data Markup Language (LDML): an XML format for interchanging locale data used in internationalization

UTS 35

Common Locale Data Repository (CLDR): a repository of LDML data for hundreds of locales

CLDR

Identifiers and Security

Identifier and Pattern Syntax: security issues for identifiers

UAX 31

Unicode Security Considerations: guidelines for recognizing Unicode security problems and dealing with them

UTR 36

Unicode Security Mechanisms: useful tools for detecting spoofs

UTS 39

Unicode IDNA Compatibility Processing: mapping for IDNA2008, and compatibility processing for IDNA2003

UTS 46
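The table above is only a directory, but several of its entries describe well-defined algorithms. As a rough illustration (not part of any Unicode specification), the Python sketch below uses the standard library's `unicodedata` module and built-in codecs to show a few of them: code-unit counts for the encoding forms (§ 3.9), canonical equivalence tested through normalization (§ 3.11, UAX 15), default caseless matching via full case folding (§ 3.13), and the arithmetic decomposition of precomposed Hangul syllables (§ 3.12). The helper names are hypothetical, and results depend on the Unicode version bundled with your Python build.

```python
import unicodedata

# Encoding forms (§ 3.9): count code units, not bytes.
# The "-le" codecs are used so that no byte order mark is added.
def code_unit_counts(text: str) -> dict:
    return {
        "UTF-8": len(text.encode("utf-8")),             # 8-bit code units
        "UTF-16": len(text.encode("utf-16-le")) // 2,   # 16-bit code units
        "UTF-32": len(text.encode("utf-32-le")) // 4,   # 32-bit code units
    }

# Canonical equivalence (§ 3.11): two strings are canonically equivalent
# exactly when their NFD forms are identical.
def canonically_equivalent(a: str, b: str) -> bool:
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Default caseless matching (§ 3.13): compare full case foldings.
# str.casefold() applies Unicode full case folding (e.g. "ß" -> "ss").
def caseless_match(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()

# Hangul syllables (§ 3.12): precomposed syllables decompose arithmetically
# into leading consonant (L), vowel (V), and optional trailing consonant (T).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_hangul(ch: str) -> str:
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable; leave unchanged
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    # T_BASE itself means "no trailing consonant".
    return chr(l) + chr(v) + (chr(t) if t != T_BASE else "")

if __name__ == "__main__":
    print(code_unit_counts("a\u20ac\U0001F600"))
    print(canonically_equivalent("\u00e9", "e\u0301"))
    print(caseless_match("Straße", "STRASSE"))
    print([hex(ord(c)) for c in decompose_hangul("\uD55C")])
```

The Hangul decomposition is written out by hand only to show the arithmetic; in practice `unicodedata.normalize("NFD", ...)` already performs it.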

Q. Are all of these normative?

A. No. Some are normative and others are informative. Within The Unicode Standard itself, the material in Chapter 3, Conformance, and most of Chapter 4, Character Properties, is normative, while material in other chapters is generally informative. The Unicode Standard Annexes (UAX) are formally part of the Unicode Standard, and most of the material in them is normative unless the annex itself indicates otherwise. Unicode Technical Standards (UTS) are independent standards, and their specifications are normative parts of those standards. Unicode Technical Reports (UTR) contain informative material. For more information, see About Unicode Technical Reports.