Multilingual Metasearch Issues - Searching, Finding, and Archiving Data of the World
Michael McKenna, Advanced Technology Group, California Digital Library, University of California Office of the President, USA

Intended Audience: Software Engineers, Systems Analysts, Testers, Web Designers

Session Level: Beginner, Intermediate

Libraries and museums have been cataloging and archiving creative, scholarly, and political works since before the time of the Greeks. In the Digital Age, institutions have been storing metadata electronically for forty years or more. Although standards exist and have existed for some time, each institution may have chosen to store its information in a different format or encoding, to use a different subset of metadata, or to expose it through a different access protocol.

To allow federated metasearch (distributed searching) across multiple repositories that are physically owned and managed by different institutions, several problems must be overcome. This paper discusses these problems, and their solutions, at the California Digital Library (CDL), which links the libraries of several university campuses, museums, California public libraries, several hundred scholarly databases, and other institutions such as Stanford, MIT, and the Library of Congress. Among these problems are metadata normalization, font rendering, protocol recognition, cross-language queries, and the mixing of legacy systems with web services.
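One small part of metadata normalization can be illustrated concretely: the same author name may arrive from one repository as Latin-1 bytes with a precomposed "ü" and from another as UTF-8 with "u" plus a combining diaeresis. The sketch below is a minimal illustration, assuming each source declares its encoding; the function name `normalize_name` is hypothetical and not part of any CDL system. It decodes, applies Unicode NFC normalization, collapses whitespace, and case-folds so that the two records compare equal.

```python
import unicodedata

def normalize_name(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: turn a raw metadata field into a comparable key."""
    # Decode using the encoding the source repository is assumed to declare.
    text = raw.decode(encoding)
    # NFC composes "u" + U+0308 (combining diaeresis) into the single code point U+00FC.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace and fold case for caseless matching.
    return " ".join(text.split()).casefold()

# The same name as delivered by two differently encoded sources.
latin1_record = b"M\xfcller,  Hans"
utf8_decomposed = "Mu\u0308ller, Hans".encode("utf-8")

print(normalize_name(latin1_record, "latin-1") == normalize_name(utf8_decomposed, "utf-8"))
```

A real metasearch gateway would also have to handle encodings that Python does not ship a codec for, and normalization choices (NFC versus NFKC, whether to strip diacritics) affect recall differently across languages.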

In addition, we examine emerging problems as the California Digital Library takes on the archiving of antiquities, oral histories, historical web sites, and non-textual digital media.