Many applications that process database information are moving toward Unicode as their internal encoding, while allowing for multiple legacy encodings on input and output. This trend raises several Unicode-related issues: code set conversions, data storage, sorting, and round-trip conversion guarantees. The Unicode enablement of Trillium, a "database cleansing" system developed by Trillium Software Systems (a division of Harte-Hanks Data Technologies), provides an excellent case study. We discuss the issues that arose in this project and how they were solved, including the following:

* Data Input/Output
  * Simplifying/streamlining encoding conversions: using a library of conversion routines
  * Creating mapping tables for encodings when not provided by the Unicode Consortium
  * "Round-trip" conversion problems and inconsistencies
* Storage
  * Need for increased storage space when using Unicode
  * Field length changes when converting between different code sets
* Sorting
  * Unicode "Compatibility Characters"
  * Internal sorting for Japanese: converting from UCS-2 to a legacy encoding that is a common basis for sorting
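Several of the issues listed above can be illustrated with a short sketch. The Python below is not taken from Trillium; the function names and the choice of encodings are assumptions made purely for illustration. It uses only the standard library codecs and `unicodedata` to show how field lengths change between code sets, how a round-trip conversion can be checked, and how compatibility characters can be folded to their ordinary equivalents.

```python
import unicodedata

# Illustrative set of encodings; the encodings actually handled by the
# original system are not stated in the text.
ENCODINGS = ("shift_jis", "euc_jp", "utf-16-le", "utf-8")


def encoded_lengths(text, encodings=ENCODINGS):
    """Return the number of bytes `text` occupies in each encoding."""
    return {name: len(text.encode(name)) for name in encodings}


def round_trips(text, encoding):
    """Return True if `text` survives Unicode -> legacy -> Unicode unchanged.

    Text that cannot be represented in the legacy encoding at all is
    reported as not round-tripping.
    """
    try:
        legacy = text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return legacy.decode(encoding) == text


def fold_compatibility(text):
    """Replace compatibility characters (e.g. half-width katakana) with
    their ordinary equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    half_width_ka = "\uff76"  # HALFWIDTH KATAKANA LETTER KA
    # The same one-character field needs a different number of bytes
    # in each code set.
    print(encoded_lengths(half_width_ka))
    print(round_trips("\u30c7\u30fc\u30bf", "shift_jis"))
    print(fold_compatibility(half_width_ka) == "\u30ab")
```

A fixed-width field sized for a legacy encoding may therefore overflow or be underused after conversion, which is why field lengths need to be reconsidered rather than carried over unchanged.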

