Many applications that process database information are moving toward Unicode as their internal encoding, while allowing for multiple legacy encodings on input and output. This trend brings up several issues relating to Unicode: codeset conversions, data storage, sorting, and round-trip conversion guarantees. The Unicode enablement of Trillium, a "database cleansing" system developed by Trillium Software Systems (a division of Harte-Hanks Data Technologies), provides an excellent case study. We discuss the issues that arose in this project and how they were solved, including the following:

* Data Input/Output
  * Simplifying/streamlining encoding conversions: using a library of conversion routines
  * Creating mapping tables for encodings when not provided by the Unicode Consortium
  * "Round-trip" conversion problems and inconsistencies
* Storage
  * Need for increased storage space when using Unicode
  * Field length changes when converting between different code sets
* Sorting
  * Unicode "Compatibility Characters"
  * Internal sorting for Japanese: converting from UCS2 to a legacy encoding that is a common basis for sorting
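
Several of the issues listed above can be illustrated with a small sketch. The code below is not taken from Trillium; it is a minimal illustration in Python that assumes the standard library codecs (`shift_jis`, `utf-16-le`, `utf-8`) stand in for the legacy and Unicode encodings discussed. The helper names `round_trips`, `byte_lengths`, and `fold_compatibility` are hypothetical, chosen only for this example.

```python
import unicodedata


def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives Unicode -> legacy -> Unicode unchanged.

    Hypothetical helper: a character with no mapping in the target
    encoding counts as a round-trip failure.
    """
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


def byte_lengths(text: str, encodings=("shift_jis", "utf-16-le", "utf-8")) -> dict:
    """Return the encoded size in bytes of text under each encoding.

    utf-16-le is used here as a stand-in for UCS2 (two bytes per
    character for text in the Basic Multilingual Plane).
    """
    return {enc: len(text.encode(enc)) for enc in encodings}


def fold_compatibility(text: str) -> str:
    """Map compatibility characters (e.g. half-width katakana) to their
    ordinary equivalents using NFKC normalization. This is one possible
    approach, assumed here for illustration."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    katakana = "\u30c7\u30fc\u30bf"  # full-width katakana, three characters
    print(round_trips(katakana, "shift_jis"))  # representable in Shift_JIS
    print(round_trips(katakana, "latin-1"))    # no mapping in Latin-1
    print(byte_lengths("ABC"))                 # ASCII doubles in size as UCS2
    print(byte_lengths(katakana))
    print(fold_compatibility("\uff76") == "\u30ab")  # half-width KA -> KA
```

The byte counts show why fixed-width database fields may need resizing: the same three ASCII characters occupy three bytes in a legacy encoding but six in a two-byte Unicode form, so a field length declared in bytes cannot be carried over unchanged.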

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

24 January 1999, Webmaster