UnicodeIUC23
ProgramShowcaseRegistrationAccommodationTravelSponsors
Unicode StandardConference BoardConference CDLast ConferencePast ConferencesNext Conference
Abstract

Beyond UTR22: Complex Legacy-to-Unicode Mappings

Jonathan Kew - SIL International

Intended Audience: Software Engineers, Systems Analysts
Session Level: Intermediate

Purpose:

To investigate needs for complex mapping between non-standard legacy encodings and Unicode, and to explore a processing model appropriate for such mappings.

While Unicode was designed to facilitate easy mapping of data in most industry-standard legacy encodings, there are many "custom fonts" in use around the world which effectively represent additional, non-standard encodings. In some cases, these fonts encode many presentation forms, such as variants of overstriking accents, or store characters in an order that does not match Unicode.

The standard format for mapping descriptions presented in UTR22 is not adequate to support such encodings, especially when round-trip conversion is required. Likewise, tools based on this format are not powerful or flexible enough to handle them.
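
As a rough illustration of why a simple code-point table falls short, the following Python sketch converts a small made-up legacy encoding in both directions. Everything in it is an assumption for the sake of the example: the byte values, the set of "tall" letters, and the function names are hypothetical and are not taken from any real font or from the tools described in this paper. It shows two of the complications named above: two presentation variants of one overstriking accent that both map to U+0301, and a vowel sign stored before its consonant where Unicode stores it after.

```python
ACUTE = "\u0301"         # COMBINING ACUTE ACCENT
KA = "\u0915"            # DEVANAGARI LETTER KA
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I

# Hypothetical legacy byte values, invented for illustration only.
LEGACY_TO_UNICODE = {
    0x61: "a",
    0x62: "b",
    0x80: ACUTE,         # accent drawn low, for short letters
    0x81: ACUTE,         # accent drawn high, for tall letters
    0x6B: KA,
    0xA0: VOWEL_SIGN_I,  # stored BEFORE the consonant in the legacy data
}
UNICODE_TO_LEGACY = {"a": 0x61, "b": 0x62, KA: 0x6B}
PREPOSED = {0xA0}        # legacy bytes that precede their base character
TALL = {"b"}             # assumed set of letters needing the high accent


def decode(data: bytes) -> str:
    """Legacy bytes to Unicode: many-to-one mapping plus reordering."""
    out = []
    pending = None  # a preposed sign waiting for its consonant
    for b in data:
        if b in PREPOSED:
            pending = LEGACY_TO_UNICODE[b]
            continue
        out.append(LEGACY_TO_UNICODE[b])
        if pending is not None:
            out.append(pending)  # Unicode order: consonant, then sign
            pending = None
    if pending is not None:
        out.append(pending)  # stranded sign with no consonant after it
    return "".join(out)


def encode(text: str) -> bytes:
    """Unicode to legacy bytes: the accent variant depends on context."""
    out = []
    prev = ""
    for ch in text:
        if ch == ACUTE:
            # One Unicode character, two legacy bytes: pick by the base letter.
            out.append(0x81 if prev in TALL else 0x80)
        elif ch == VOWEL_SIGN_I:
            # Move the sign back in front of the preceding consonant.
            out.insert(len(out) - 1 if out else 0, 0xA0)
        else:
            out.append(UNICODE_TO_LEGACY[ch])
        prev = ch
    return bytes(out)


legacy = bytes([0x61, 0x80, 0x62, 0x81, 0xA0, 0x6B])
text = decode(legacy)
round_trip = encode(text)
```

Here the round trip succeeds only because the sample data uses each accent variant in its expected context. Legacy data that placed the high accent over a short letter would decode to the same Unicode text but would not be restored byte for byte, which is the kind of case a plain one-to-one table cannot express.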

This paper, an updated version of one presented at IUC22, will illustrate the issues by considering the types of complexity seen in a variety of custom legacy encodings. It will then describe a processing model and description language we have developed to address such data conversion needs, and show how these can be applied to help users migrate from legacy systems with custom fonts to standard, Unicode-based systems.

In conclusion, I will suggest how the UTR22 mapping description format might be extended to support complex mapping processes, and briefly demonstrate some software tools based on this model for complex mappings.

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

12 December 2002, Webmaster