Unicode IUC14
Abstract

Any tool that tries to negotiate the entire Internet is a perfect application for Unicode, and probably no other application traverses the globe like a search engine. Our presentation is a case study of the internationalization of the Lycos search engine and its localization into Japanese using Unicode. Although creating a Japanese service was the main motivation for this effort, Lycos needed a plan that addressed not only its immediate needs, but also the future prospect of a completely global search engine. Unicode was the answer.

We will discuss the relative merits of UTF-8 and UCS-2 in the context of Lycos' needs and why UTF-8 was chosen in the end. We will also cover how we overcame the Japanese-specific challenges we encountered, including Japanese word breaking and encoding auto-detection. Several tools significantly cut development time. Chief among them are Basis Technology's Rosette C++ Library for Unicode, which performed encoding conversions and Japanese encoding auto-detection, and Basis Technology's Japanese Text Analyzer, which was used for Japanese word breaking.
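One consideration that commonly comes up when weighing the two encodings is storage size, which depends on the script being stored. The following is a minimal Python sketch of that comparison and is not taken from the presentation itself. The function name `encoded_sizes` and the sample strings are assumptions made for illustration. Python has no UCS-2 codec, so UTF-16 (big-endian, no byte-order mark) stands in for it, which gives identical bytes for text restricted to the Basic Multilingual Plane.

```python
# Illustrative sketch: compare the storage cost of UTF-8 and UCS-2.
# Assumption: UTF-16BE is used as a stand-in for UCS-2, which is valid
# only for characters inside the Basic Multilingual Plane (<= U+FFFF).

def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UCS-2."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return {
        "utf8": len(text.encode("utf-8")),
        "ucs2": len(text.encode("utf-16-be")),
    }

samples = {
    # Hypothetical sample strings, not data from the Lycos index.
    "english": "search engine",
    # Japanese for "search engine", written with escapes to avoid
    # any dependence on the source file's own encoding.
    "japanese": "\u691c\u7d22\u30a8\u30f3\u30b8\u30f3",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

ASCII text takes one byte per character in UTF-8 and two in UCS-2, while the Japanese characters in this sample take three bytes each in UTF-8 and two in UCS-2. Size is only one factor; compatibility with existing byte-oriented code is another that such a comparison would need to weigh.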

Although this is a specific case of localization, the work and considerations involved are representative of the internationalization/localization process necessary for any software publisher looking to bring their product to overseas markets.

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

24 January 1999, Webmaster