Target Audience: Manager, Software Engineer, Marketer
Level of Session: Beginner
Any tool that tries to navigate the entire Internet is a natural application for Unicode, and probably no other application traverses the globe the way a search engine does. Our presentation is a case study of the internationalization of the Lycos search engine and its localization into Japanese using Unicode. Although creating a Japanese service was the main motivation for this effort, Lycos needed a plan that addressed not only its immediate needs but also the future prospect of a completely global search engine. Unicode was the answer.
We will discuss the relative merits of UTF-8 versus UCS-2 in the context of Lycos' needs, and why UTF-8 was chosen in the end. We will also cover how we overcame other Japanese-specific challenges, including Japanese word breaking and encoding auto-detection. Several tools significantly cut development time. Chief among them were Basis Technology's Rosette, a C++ library for Unicode that performed encoding conversions and Japanese encoding auto-detection, and Basis Technology's Japanese Text Analyzer, which was used for Japanese word breaking.
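The abstract does not spell out which factors decided the UTF-8 vs. UCS-2 question for Lycos, so the following is only an illustration of one commonly cited tradeoff: storage size. UTF-8 leaves ASCII text byte-for-byte unchanged (1 byte per character) but spends 3 bytes on most Japanese characters, while UCS-2 spends a fixed 2 bytes on every character and cannot represent anything outside the Basic Multilingual Plane. The helper `encoded_sizes` is hypothetical and is not part of Rosette or any Lycos code; it simply uses Python's built-in codecs, treating BOM-less UTF-16BE as equivalent to UCS-2 for BMP-only text.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UCS-2 (big-endian).

    Hypothetical illustration only; not an API from Rosette or Lycos.
    """
    # UCS-2 can only encode the Basic Multilingual Plane (U+0000..U+FFFF).
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("text contains characters outside the BMP; not representable in UCS-2")
    return {
        # UTF-8: 1 byte for ASCII, 2 or 3 bytes for other BMP characters.
        "utf-8": len(text.encode("utf-8")),
        # For BMP-only text, BOM-less UTF-16BE is byte-identical to UCS-2.
        "ucs-2": len(text.encode("utf-16-be")),
    }


if __name__ == "__main__":
    for sample in ["search engine", "検索エンジン"]:
        print(sample, encoded_sizes(sample))
```

For the English query "search engine", UTF-8 needs 13 bytes and UCS-2 needs 26; for the Japanese "検索エンジン" (six characters), UTF-8 needs 18 bytes and UCS-2 needs 12. Which encoding is "smaller" therefore depends on the mix of text a system stores, and UTF-8's ASCII compatibility can also matter when existing byte-oriented code must keep working.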
Although this is a specific case of localization, the work and considerations involved are representative of the internationalization/localization process necessary for any software publisher looking to bring their product to overseas markets.
International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed
Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.
5 July 1999, Webmaster