UnicodeIUC17

Abstract

XIRCH: Integrating Language-Specific Search Sites by Unicode-based Cross-language Search Protocol

Genichiro Kikui & Yoshihiko Hayashi - NTT Cyberspace Labs

Intended Audience: Software Engineer
Session Level: Intermediate

The purpose of this presentation is to introduce our cross-language meta-searching architecture and an implemented system, focusing on the role of Unicode.

Our architecture consists of search engines and meta-searchers located on the Internet. A meta-searcher can translate a user's query into a language accepted by each search engine. A search engine may itself have cross-language (translation) functions for retrieving documents written in a language different from that of the query, as well as documents in the query language. In this architecture, therefore, a given query is translated by the meta-searcher, the search engines, or both, to bridge the language gap between the query and the target documents held by the search engines.
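The division of labor described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names SearchEngine, toy_translate, and meta_search are hypothetical, and the translation step is a one-entry lookup table standing in for a real translation component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class SearchEngine:
    """Hypothetical stand-in for a language-specific search site."""
    name: str
    accepted_languages: Set[str]              # languages the engine accepts queries in
    search: Callable[[str, str], List[str]]   # (query, language) -> result titles


def toy_translate(query: str, source: str, target: str) -> str:
    # Assumption: a tiny lookup table replaces a real translation component.
    table = {("en", "ja", "unicode"): "ユニコード"}
    return table.get((source, target, query.lower()), query)


def meta_search(query: str, query_lang: str, engines: List[SearchEngine],
                translate: Callable[[str, str, str], str] = toy_translate
                ) -> Dict[str, List[str]]:
    """Send the query to every engine, translating only where needed."""
    results: Dict[str, List[str]] = {}
    for engine in engines:
        if query_lang in engine.accepted_languages:
            # The engine accepts the query language; any cross-language
            # retrieval is left to the engine itself.
            q, lang = query, query_lang
        else:
            # The meta-searcher fills the language gap before forwarding.
            lang = sorted(engine.accepted_languages)[0]
            q = translate(query, query_lang, lang)
        results[engine.name] = engine.search(q, lang)
    return results
```

In this sketch an English query reaches an English-only engine unchanged, while a Japanese-only engine receives the translated form.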

To make this architecture scalable, we employ a dedicated protocol, called XIRCH, for communication between meta-searchers and search engines. The XIRCH protocol, a descendant of STARTS (Stanford Protocol Proposal for Internet Retrieval and Search), supports not only multilingual queries and search results encoded in UTF-8 but also messages that control cross-language functions.
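The abstract does not give the XIRCH message syntax, so the following is only a sketch of the general idea: a request that carries a multilingual query as UTF-8 bytes together with a field controlling where translation happens. The JSON layout and every field name (query, query_lang, target_langs, engine_translates) are assumptions made for illustration and are not part of XIRCH or STARTS.

```python
import json
from typing import List


def encode_request(query: str, query_lang: str, target_langs: List[str],
                   engine_translates: bool) -> bytes:
    """Build a hypothetical search request and serialize it as UTF-8."""
    message = {
        "query": query,                          # query text in any language
        "query_lang": query_lang,                # language of the query
        "target_langs": target_langs,            # languages of documents wanted
        "engine_translates": engine_translates,  # ask the engine to translate?
    }
    # ensure_ascii=False keeps non-ASCII text as real characters, which are
    # then written out as UTF-8 bytes.
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def decode_request(payload: bytes) -> dict:
    """Parse a request produced by encode_request."""
    return json.loads(payload.decode("utf-8"))
```

Because both sides agree on UTF-8, neither needs to know the other's local character set to exchange the query text.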

The paper also introduces our experimental cross-language search system covering four languages (five if Simplified and Traditional Chinese are counted separately): Chinese, Japanese, Korean, and English. The system illustrates how a Unicode-based architecture simplifies cross-language searching over languages that each have their own local character sets.
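One way to picture the simplification is that each site converts between its local character set and Unicode only at its own boundary, so every other component handles a single encoding. The sketch below assumes one legacy charset per language (Shift_JIS, EUC-KR, Big5, GB2312, ASCII); the abstract does not say which charsets the experimental system actually used, and the helper names are hypothetical.

```python
# Assumed mapping from language tag to a legacy charset; illustrative only.
LOCAL_CHARSETS = {
    "ja": "shift_jis",
    "ko": "euc_kr",
    "zh-Hant": "big5",
    "zh-Hans": "gb2312",
    "en": "ascii",
}


def to_utf8(raw: bytes, lang: str) -> bytes:
    """Convert text in a site's local charset to UTF-8 for the shared protocol."""
    return raw.decode(LOCAL_CHARSETS[lang]).encode("utf-8")


def from_utf8(data: bytes, lang: str) -> bytes:
    """Convert UTF-8 text back to the local charset a site's engine expects."""
    return data.decode("utf-8").encode(LOCAL_CHARSETS[lang])
```

With this arrangement, supporting an additional language means adding one table entry at the site that needs it rather than a converter for every pair of character sets.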

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

18 Jun 2000, Webmaster