XIRCH: Integrating Language-Specific Search Sites by Unicode-based Cross-language Search Protocol

Genichiro Kikui & Yoshihiko Hayashi - NTT Cyberspace Labs

Intended Audience: Software Engineer
Session Level: Intermediate

The purpose of this presentation is to introduce our cross-language meta-searching architecture and an implemented system, focusing on the role of Unicode.

Our architecture consists of search engines and meta-searchers located on the Internet. A meta-searcher can translate a user's query into a language acceptable to each search engine. A search engine may itself have cross-language (translation) functions for retrieving documents written in a language different from that of the query, as well as documents in the query language. In this architecture, therefore, a given query is translated by the meta-searcher, by the search engines, or by both, to bridge the language gap between the query and the target documents stored in the search engines.
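
The division of translation work described above can be sketched as a small routing decision. This is only an illustrative sketch under our own assumptions: the names `Engine`, `plan_query`, and the three outcome strings are hypothetical and are not part of XIRCH or the implemented system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Hypothetical description of one search engine (assumed fields)."""
    name: str
    doc_language: str            # language of the documents it indexes
    query_languages: frozenset   # languages it accepts queries in


def plan_query(query_language: str, engine: Engine) -> str:
    """Decide who bridges the language gap for one engine."""
    if query_language in engine.query_languages:
        if query_language == engine.doc_language:
            # Query and documents share a language: nothing to translate.
            return "no translation"
        # The engine accepts the query language but indexes another one,
        # so its own cross-language function does the translation.
        return "engine translates"
    # The engine cannot accept the query as-is, so the meta-searcher
    # translates it into a language the engine accepts.
    return "meta-searcher translates"


ja_only = Engine("ja-site", "ja", frozenset({"ja"}))
ja_cross = Engine("ja-cross-site", "ja", frozenset({"ja", "en"}))

print(plan_query("ja", ja_only))    # same language
print(plan_query("en", ja_cross))   # engine has a cross-language function
print(plan_query("en", ja_only))    # meta-searcher must translate
```

In this sketch a single English query sent to both engines is translated once by the meta-searcher for the monolingual engine and left untouched for the engine that has its own cross-language function.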

To make this architecture scalable, it employs a dedicated protocol, called XIRCH, for communication between the meta-searcher and the search engines. The XIRCH protocol, a descendant of STARTS (Stanford Protocol Proposal for Internet Retrieval and Search), supports not only multilingual queries and search results in UTF-8 but also messages that control cross-language functions.

The paper also introduces our experimental cross-language search system covering four languages (five if Simplified and Traditional Chinese are counted separately): Simplified/Traditional Chinese, Japanese, Korean, and English. The system illustrates how a Unicode-based architecture simplifies cross-language searching across languages that each have their own local character sets.
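
To make the simplification concrete, the following sketch converts query text from several legacy encodings into UTF-8, so that everything past the conversion step handles a single encoding. The mapping from language to local character set is our assumption for illustration; the abstract does not state which local encodings the experimental system handles, and the function name `to_utf8` is hypothetical.

```python
# Assumed legacy encodings per language (illustrative only; the
# encodings used by the actual system are not stated in the abstract).
LEGACY_CODECS = {
    "ja": "shift_jis",
    "ko": "euc_kr",
    "zh-Hans": "gb2312",
    "zh-Hant": "big5",
    "en": "ascii",
}


def to_utf8(raw: bytes, language: str) -> bytes:
    """Decode bytes from the language's local character set, re-encode as UTF-8."""
    text = raw.decode(LEGACY_CODECS[language])
    return text.encode("utf-8")


# The word "search" in each language, as it would arrive in a local encoding.
samples = {
    "ja": "検索",
    "ko": "검색",
    "zh-Hans": "搜索",
    "zh-Hant": "搜尋",
    "en": "search",
}

for language, word in samples.items():
    local_bytes = word.encode(LEGACY_CODECS[language])
    utf8_bytes = to_utf8(local_bytes, language)
    # After conversion, every query decodes with the same codec.
    print(language, utf8_bytes.decode("utf-8"))
```

Once each site's text is converted at the boundary in this way, the meta-searcher needs no per-language encoding logic when merging queries and results.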

