Designing a Farsi/English Unicode-based Search Engine
Mohammad Azadnia, Maziar Salehi & Ali Mohammad Zareh Bidoki - Iran Telecommunication Research Center (ITRC)
This paper presents the design of a prototype Farsi/English search engine. The engine must cope with characteristics of the web such as heterogeneity, volatility, and a huge volume of unstructured information. These characteristics, together with rapid advances in technology, challenge classical Information Retrieval (IR) techniques.
Although a growing number of sites support Farsi, little research has been done on the encoding and indexing of Farsi text. Unicode appears to be sufficiently capable of providing a suitable environment for this purpose, especially for indexing web pages; however, many common Farsi code pages have to be converted into Unicode in order to cover most existing sites.
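The code-page conversion step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `to_unicode`, the candidate encoding list, and the Arabic-to-Farsi letter mapping are all assumptions made for this example, and Windows-1256 (`cp1256`) merely stands in for whichever legacy code pages a real crawler would have to handle.

```python
import unicodedata
from typing import Optional

# Assumed candidate list; the abstract does not name the code pages it covers.
CANDIDATE_ENCODINGS = ("utf-8", "cp1256")

# Assumed normalization: map Arabic letter forms that often appear in
# legacy-encoded Farsi text to the Persian code points, so that index
# terms compare equal regardless of the source encoding.
ARABIC_TO_FARSI = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def to_unicode(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode page bytes to a normalized Unicode string.

    The declared encoding (for example, from an HTTP header) is tried
    first, then each fallback candidate in order.
    """
    candidates = ([declared] if declared else []) + list(CANDIDATE_ENCODINGS)
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
        # Compose to NFC, then unify Arabic/Farsi letter variants.
        return unicodedata.normalize("NFC", text).translate(ARABIC_TO_FARSI)
    raise ValueError("no candidate encoding could decode the input")
```

Trying strict UTF-8 before the legacy code page is a deliberate choice in this sketch: a single-byte code page such as Windows-1256 accepts almost any byte sequence, so it only makes sense as a fallback once UTF-8 decoding has failed.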
We used the Unified Modeling Language (UML) to produce a visual, easy-to-scale model, and we used clustering techniques and RAID to ensure scalability and reliability. We have tried to apply the Common Object Request Broker Architecture (CORBA) because of the system's distributed object-oriented design and our agent-oriented approach.
Keywords: Unicode, UML, CORBA, Information Retrieval, Search engine, Farsi language, clustering
International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS).
19 January 2002, Webmaster