The Multilingual Lion: TeX Learns to Speak Unicode
Jonathan Kew - SIL International

Intended Audience: Content Developers, Software Engineers, Graphic Designers, Managers, Systems Analysts, Technical Writers

Session Level: Intermediate

Professor Donald Knuth's TeX is a typesetting system with a wide user community and a range of supporting packages and enhancements for many types of publishing work. However, it dates back to the 1980s and is tightly wedded to 8-bit character data and custom-encoded fonts, which makes it difficult to configure for many complex-script languages.

This paper will focus on XeTeX, a system that extends TeX with direct support for modern OpenType and AAT fonts and the Unicode character set. This makes it possible to typeset almost any script and language with the same power and flexibility that TeX has traditionally offered in the 8-bit, simple-script world of European languages. XeTeX (currently available on Mac OS X, and possibly on other platforms in the future) integrates the TeX formatting engine with technologies from both the host operating system (Apple Type Services, Text Encoding Converter) and auxiliary libraries (ICU, TECkit). It thus illustrates how such components can be combined to bring the benefits of Unicode to an existing software system.

This paper should be of interest to those involved in multilingual and multiscript publishing, as well as to developers seeking to enhance legacy systems to take advantage of Unicode. The merger of legacy and Unicode-based technologies means that the results of many years of development in the TeX world become available for document production in a much wider range of languages.

Some background familiarity with TeX may be helpful, but the paper's focus will be on the integration of Unicode technologies, not on technical details of TeX itself. A general awareness of encodings, complex scripts, and font technologies will be assumed.
