Keynote Presentation: Unicode Myths

Mark Davis - IBM Centre for Java Technology SV

Intended Audience: Manager, Software Engineer, Systems Analyst, Marketer
Session Level: Beginner, Intermediate, Advanced

Much of what people believe about Unicode is not actually true. This paper discusses some of the most common misconceptions about Unicode, including:

- All characters in Unicode are in sorted order (or should be)
- Language information is required for correct use of Unicode
- Unicode is missing characters for (Lithuanian/Yoruba/Czech/...)
- Combining marks are not necessary: normalized text (NFC) does not contain them
- Unicode should have a "decimal point" character as well as a period
- Case mappings are 1-1
- All compatibility characters are (good/bad: pick one)
- Every 16-bit Unicode value represents a character
- (UTF-8/UTF-16/UTF-32: pick one) is better than (UTF-8/UTF-16/UTF-32: pick one)
- You can use any unassigned code point for internal use
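
A few of these myths can be checked directly. The following is a minimal sketch in Python using only the standard library; it is not taken from the paper, and the sample characters (U+00DF, "q" with U+0323, U+1D11E) are illustrative choices made here. The results also depend on the Unicode data bundled with the Python build, although the cases chosen are long-standing and stable.

```python
import unicodedata

# Myth: case mappings are 1-1.
# U+00DF (LATIN SMALL LETTER SHARP S) uppercases to the two letters "SS".
sharp_s_upper = "\u00df".upper()

# Myth: NFC text contains no combining marks.
# "q" followed by U+0323 (COMBINING DOT BELOW) has no precomposed form,
# so NFC normalization leaves the combining mark in place.
nfc_q = unicodedata.normalize("NFC", "q\u0323")
nfc_has_combining_mark = any(unicodedata.combining(ch) for ch in nfc_q)

# Myth: every 16-bit Unicode value represents a character.
# U+1D11E (MUSICAL SYMBOL G CLEF) takes two 16-bit code units in UTF-16,
# so neither unit is a character on its own.
utf16_code_units = len("\U0001D11E".encode("utf-16-be")) // 2

# Myth: characters are in sorted order.
# Sorting by code point puts "Z" (U+005A) before "a" (U+0061),
# which is not the order a dictionary would use.
code_point_order = sorted(["a", "Z", "b"])

print(sharp_s_upper, len(nfc_q), nfc_has_combining_mark,
      utf16_code_units, code_point_order)
```

Each check covers only one reading of the corresponding myth; for example, the 16-bit case shown here is the surrogate pair, and language-sensitive ordering would need a collation library rather than a plain code point sort.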

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

21 Jun 2000, Webmaster