Unicode Demystified Part I:
Character Encoding Basics
Tex Texin - Yahoo! Inc.
Richard Gillam - Language Analysis Systems, Inc.

Intended Audience: All

Session Level: Beginner

This short, completely new Unicode tutorial introduces the core concepts of the Unicode Character Standard. The software industry has been challenged to represent and process text from any or all languages using just one set of algorithms. Using different algorithms for different languages is costly and a maintenance nightmare, and it makes worldwide integration of applications impossible. This is the difficult problem that the Unicode standard solves.

The tutorial gives attendees a brief survey of the writing systems used around the world and identifies their distinguishing characteristics. It explains how text is represented and processed in software, and presents the principles employed by the designers of the Unicode standard in an easy-to-understand fashion.

An overview of some of the algorithms defined by the Unicode standard is provided. Although the Unicode standard is designed for plain text, it can be used with rich text. Rich text considerations are also described.

The tutorial offers many graphical examples demonstrating key points, and is designed for audiences of all backgrounds. It was created by Richard Gillam and Tex Texin, leaders in software and web internationalization and popular speakers at development conferences. Rich Gillam of Language Analysis Systems, Inc., will be presenting at IUC27.

After attending this lecture, attendees will be able to answer the following questions:

  • How do computers process text with appropriate spelling, grammar, justification and layout rules for different languages?
  • What is a character encoding?
  • What is a Unicode character?
  • How are Unicode characters represented?
  • How are Unicode characters interchanged?
  • What are Unicode character properties, and why are they important?
  • What are some of the algorithms provided to standardize multilingual text processing?
  • What tradeoffs should be considered when using Unicode in rich text, such as markup languages (HTML, XML)?
  • What are the benefits of using the Unicode Character Standard?
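
As a small taste of the questions about Unicode characters, their representation, and their properties, the following is a minimal sketch in Python using only the standard library. It is not taken from the tutorial itself. The sample characters and the helper name `describe` are assumptions chosen for illustration. It shows that one character has a single code point and a set of properties, but a different byte sequence in each encoding form.

```python
import unicodedata

# Hypothetical sample: e-acute, the euro sign, and an emoji outside the BMP.
SAMPLE = "\u00e9\u20ac\U0001F600"


def describe(ch: str) -> dict:
    """Return the code point, two properties, and three encodings of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",           # abstract number assigned by Unicode
        "name": unicodedata.name(ch),               # character name property
        "category": unicodedata.category(ch),       # general category property
        "utf8": ch.encode("utf-8").hex(),           # 1 to 4 bytes per character
        "utf16": ch.encode("utf-16-be").hex(),      # 2 or 4 bytes (surrogate pair)
        "utf32": ch.encode("utf-32-be").hex(),      # always 4 bytes
    }


if __name__ == "__main__":
    for ch in SAMPLE:
        print(describe(ch))
```

The big-endian codecs are used here only so that no byte order mark is added to the output; the choice of encoding form for interchange is one of the topics the tutorial covers.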