Keynote Presentation: Unicode Myths

Mark Davis - IBM Centre for Java Technology SV

Intended Audience: Manager, Software Engineer, Systems Analyst, Marketer
Session Level: Beginner, Intermediate, Advanced

Much of what people believe about Unicode is not actually true. This paper discusses some of the most common misconceptions about Unicode, including:

- All characters in Unicode are in sorted order (or should be)
- Language information is required for correct use of Unicode
- Unicode is missing characters for (Lithuanian/Yoruba/Czech/...)
- Combining marks are not necessary: normalized text (NFC) does not contain them
- Unicode should have a "decimal point" character as well as a period
- Case mappings are one-to-one
- All compatibility characters are (good/bad: pick one)
- Every 16-bit Unicode value represents a character
- (UTF-8/UTF-16/UTF-32: pick one) is better than (UTF-8/UTF-16/UTF-32: pick one)
- You can use any unassigned code point for internal use
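
Three of the myths above can be checked directly. The following is a minimal sketch in Python, not taken from the paper; the sample characters (sharp s, "q" with a combining dot above, the musical G clef) are illustrative choices, and the results assume the Unicode data shipped with a current Python 3 interpreter.

```python
import unicodedata

# Myth: case mappings are one-to-one.
# LATIN SMALL LETTER SHARP S (U+00DF) uppercases to two letters.
sharp_s_upper = "\u00df".upper()

# Myth: NFC text contains no combining marks.
# "q" followed by COMBINING DOT ABOVE (U+0307) has no precomposed form,
# so NFC normalization leaves the combining mark in place.
q_dot = unicodedata.normalize("NFC", "q\u0307")
has_combining = any(unicodedata.combining(ch) != 0 for ch in q_dot)

# Myth: every 16-bit Unicode value represents a character.
# MUSICAL SYMBOL G CLEF (U+1D11E) takes two 16-bit code units in UTF-16;
# neither unit is a character on its own.
utf16_units = len("\U0001D11E".encode("utf-16-be")) // 2

print(sharp_s_upper, len(q_dot), has_combining, utf16_units)
```

Each check uses only the standard library, so the same lines can be pasted into an interactive session to test other characters.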

