Unicode Frequently Asked Questions

The Unicode Frequently Asked Questions (FAQ) are organized into different topic pages. A list of topic areas with links is shown below, along with brief explanations of what kinds of questions are answered in each topic area.

Many FAQ pages contain links to other pages where you will find further information about specific topics. Check in particular the Basic Questions and Specifications pages. As another option, you may find it easier to go to the search page and type in your topic plus "FAQ" to locate appropriate FAQ entries, for example "NFC FAQ", "BOM FAQ", or "Tamil FAQ". If you have a question not addressed by the FAQ entries, you may join the public Unicode email list and post your question there.

The FAQs are contributed by many people. For more information about sources, see Attribution. If you would like to help and have prepared an FAQ entry with a useful answer that you would like to contribute, please send the question and the answer to us and the Unicode editorial committee will consider posting it.

Basic Questions
Discusses the features of Unicode, how it differs from other encodings, and answers basic support questions such as where to find additional information on this site.
Arabic Script
Issues related to the Arabic script and languages using the Arabic script.
Bengali (Bangla) / Assamese Script
Issues related to the Bengali (Bangla) script and Assamese.
Blocks and Ranges
Definitions and usage of Unicode blocks and ranges, and questions about blocks versus script values for characters.
Character Properties, Case Mappings and Names
Answers questions about case conversions and case mappings; also about character names.
Characters and Combining Marks
Discusses a variety of details about text elements, combining characters, compatibility mappings, and canonical equivalence.
Chinese and Japanese
Questions specific to Han ideographs, Chinese and Japanese language handling, and East Asian fonts.
CLDR and Locales
Answers questions about Unicode Locales, CLDR, and LDML.
Collation
Answers questions about sorting and ordering, and about Unicode and Java.
Compression
The Unicode compression algorithm (SCSU), LZW, Huffman encoding, and others.
Conversions / Mappings
Conversion and mapping to/from other character sets.
Coping with Change
Adapting to changes in the Unicode Standard.
Display of Unsupported Characters
Discusses what to do when attempting to display unsupported Unicode characters.
Emoji and Pictographs
Discusses sets of pictorial symbols including Emoji, Dingbats, Webdings and Wingdings, how and why they have been encoded and how to display or implement them.
Emoji Submission
Discusses the submission process for proposals for new emoji.
Entities and Named Sequences
Discusses named entities and Unicode named character sequences.
FAQ on FAQs
Describes how and when new FAQs are created, how FAQs relate to specifications, and what to do if you think there is an error in a Unicode specification.
Fonts & Keyboards
Where to find more information about fonts. Displaying characters in Java. Glyph variations. Inputting Chinese and other characters.
Greek
Questions specific to the Greek language, script, and fonts.
Guide to Abbreviations in Standards
Lists abbreviations and acronyms used by other standards developing organizations.
Indic Scripts and Languages (except Tamil or Bengali)
Questions specific to Indic scripts, languages, fonts, and keyboards.
Internationalization
Explains the role of Unicode in internationalization of software and answers questions about upgrading software to support Unicode.
Internationalized Domain Names (IDN)
Provides a series of background explanations about Internationalized Domain Names and the different specifications for them.
Korean
Questions about Hangul and Jamo characters for Korean, and Korean normalization issues.
Language Tagging
Plane 14 language tags and language tagging in general.
Latin and Cyrillic
Questions about the Latin and Cyrillic scripts.
Ligatures, Digraphs, Presentation Forms vs. Plain Text
Can't find a certain digraph or ligature your language needs? Can you use a particular presentation form?
Line Breaking
Questions about how to break text into separate lines for display.
Myanmar
Issues related to the Myanmar script and fonts, and to languages which use the script.
Normalization
Questions regarding the various normalization forms, their use, and where to go for further information.
Private-Use Characters, Noncharacters, and Sentinels
Questions about private-use characters and how they are distinguished from noncharacters and sentinels.
Programming Issues
Questions regarding conversion of string handling in old programs, as well as other issues regarding support of Unicode strings in programs.
Proposing New Characters
What are the latest proposals? What about my script? When will the next version of the Unicode Standard be available?
Punctuation and Symbols
Discusses issues related to punctuation and symbols, including the differences between them.
Security Issues
Does Unicode pose security problems? What can be done about such problems as character spoofing?
Specifications
Information on where to find specifications or guidelines for dealing with different programming tasks in the Unicode Standard and related standards.
Standards Developing Organizations
Describes what SDOs are and how the Unicode Consortium works with them. Answers questions about ISO, IETF, W3C, and the terminology they use.
Submitting Successful Character and Script Proposals
Guidelines on how to write a successful proposal to add new characters or a new script, or to fix a problem in the standard.
Tamil Script and Language
Issues related to the Tamil language and script.
Technical Reports Development Process
Discusses the development and maintenance process for technical reports, including how they are created and archived.
Unicode and ISO 10646
Relationships between Unicode and ISO working groups and standards. How Unicode differs from ISO 10646.
Unicode and the Web
Unicode in other standards (W3C, IETF, ...). How to deal with numeric character references. Unicode in HTML.
Unicode Character Database
Questions about the Unicode Character Database (UCD).
UTF-8, UTF-16, UTF-32 & BOM
Questions about encoding forms (UTF-8, UTF-16, and UTF-32) and use of the byte order mark.
Variation Sequences
Answers questions about the meaning, use, and display of variation sequences and selectors.
Writing Direction & BIDI Ordering
Questions about writing direction, particularly bidirectional (“bidi”) text that mixes left-to-right and right-to-left writing.