A. The Unicode Standard and related standards contain a
number of specifications or guidelines for dealing with different
programming tasks. Sometimes it's hard to find these as they are not all
provided as specific, dedicated documents.
The following table lists subject areas for which the
Unicode Consortium provides specifications, with the location and a brief description
of what each specification covers.
General
|
Character Properties:
common properties such as Name, Alphabetic, Letter, White-Space, General Category, Default-Ignorable, plus those used in other specifications |
Ch 4 |
Character Properties for CJK Ideographs: property information specific to CJK ideographs |
UAX 38 |
Unicode Character Database: general documentation about the UCD |
UAX 44 |
UCD in XML: description of the XML representation of the UCD |
UAX 42 |
Case Operations: conversion/detection of Upper/Lower/Titlecase, case
folding, case matching. See also
4.2 Case. |
§ 3.13 |
Characters with Unusual Properties: characters that implementers need to pay special attention
to |
§ 4.12 |
Use of Characters in Markup Contexts: guidelines for XML
and other markup languages |
UTR 20 |
Script Names:
usage model for determining text runs
in a given script |
UAX 24 |
Use of Characters in Mathematical Contexts:
guidelines for
mathematical usage |
UTR 25 |
Unicode Named Character Sequences: the syntax for named
character sequences |
UAX 34 |
Encodings
|
Unicode Encoding Forms: UTF-8, UTF-16, UTF-32 conversion and
validation |
§ 3.9 |
Unicode Encoding Schemes: UTF-8, UTF-16 (BE/LE), UTF-32 (BE/LE)
conversion and validation |
§ 3.10 |
Binary Order: UTF-8 order vs. UTF-16 order |
§ 5.17 |
Character Mapping Markup Language: mapping Unicode to and from legacy code pages |
UTS 22 |
A Standard Compression Scheme for Unicode: how to compress Unicode text to about the same size as legacy encodings |
UTS 6 |
UTF-EBCDIC: encapsulating Unicode on EBCDIC systems |
UTR 16 |
Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8): a compatibility 8-bit encoding scheme |
UTR 26 |
Ideographic
Variation Database: repository of variation sequences for specified collections of Han glyphs |
UTS 37 |
Comparison (Normalization, Collation)
|
Canonical Equivalence: when character
sequences are equivalent; canonical
ordering |
§ 3.11 |
Unicode Normalization Forms: how to normalize text for comparison |
UAX 15, § 3.11 |
Unicode Collation Algorithm:
the default mechanism for comparing, searching, and matching
Unicode text |
UTS 10 |
Parsing
|
Hangul Syllables: boundaries, parsing, (de/)composition, names |
§ 3.12 |
Decimal Numbers: conversion and validation |
§ 5.5 |
Unicode Regular Expression Guidelines: the features required in supporting regular expressions with Unicode |
UTS 18 |
Identifier and Pattern Syntax:
how to parse identifiers |
UAX 31 |
Language Information in Plain Text. See also
16.9 Deprecated Tag Characters. |
§ 5.10 |
Variation Selectors: usage, validation |
§ 16.4 |
Ideographic Description Sequences: use, validation |
§ 12.2 |
Segmentation
|
Newline Guidelines: how to handle newline characters |
§ 5.8 |
Line Breaking Algorithm: the default way to determine where lines may be wrapped |
UAX 14 |
Text Segmentation: the default way to break text into user-perceived characters, words, and sentences |
UAX 29 |
Rendering
|
The Bidirectional Algorithm: required for display of Arabic and Hebrew text |
UAX 9 |
East Asian Width: the default determination of character width
in East Asian contexts |
UAX 11 |
Minimal shaping requirements for Arabic, Devanagari, Tamil, etc. |
Ch 8-10 |
Locale Data
|
Locale Data Markup Language (LDML): format for the interchange of locale
data used for internationalization |
UTS 35 |
Common Locale Data Repository (CLDR): a repository of
LDML data for hundreds of locales |
CLDR |
Identifiers and Security
|
Identifier and Pattern Syntax:
security issues for identifiers |
UAX 31 |
Unicode
Security Considerations: guidelines for recognizing
Unicode security problems and dealing with them |
UTR 36 |
Unicode
Security Mechanisms: useful tools for detecting spoofs |
UTS 39 |
Unicode IDNA Compatibility Processing: mapping for IDNA2008, and compatibility processing for IDNA2003 |
UTS 46 |
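
Several of the areas in the table can be tried out with nothing more than a language's standard library. The following is a minimal Python sketch, not a reference implementation: it touches character properties (Ch 4), normalization (UAX 15), caseless matching (§ 3.13), and binary order (§ 5.17). The function names are made up for illustration, and the results depend on the version of the Unicode Character Database bundled with the Python interpreter in use.

```python
import unicodedata


def canonical_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (UAX 15)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def canonical_caseless_match(a: str, b: str) -> bool:
    """Canonical caseless matching as described in section 3.13:
    compare NFD(casefold(NFD(x))) for both strings."""
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)


def binary_order(strings, encoding: str):
    """Sort strings by the raw bytes of an encoding (section 5.17)."""
    return sorted(strings, key=lambda s: s.encode(encoding))


if __name__ == "__main__":
    # Character properties: Name and General Category.
    print(unicodedata.name("\u00e9"), unicodedata.category("\u00e9"))

    # Precomposed e-acute vs. "e" followed by U+0301 COMBINING ACUTE ACCENT.
    print(canonical_equivalent("\u00e9", "e\u0301"))

    # Full case folding maps U+00DF (sharp s) to "ss".
    print(canonical_caseless_match("Stra\u00dfe", "STRASSE"))

    # U+FF5E and U+10000 sort differently in UTF-8 and UTF-16 binary order,
    # because U+10000 is encoded in UTF-16 with surrogates (D800 DC00).
    pair = ["\uff5e", "\U00010000"]
    print([hex(ord(c)) for c in binary_order(pair, "utf-8")])
    print([hex(ord(c)) for c in binary_order(pair, "utf-16-be")])
```

Areas such as collation (UTS 10), line breaking (UAX 14), and text segmentation (UAX 29) are not covered by Python's standard library and typically need a dedicated library that implements the listed specification.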