The Common Locale Data Repository

Mark Davis - IBM Corporation

Intended Audience:	Managers, Software Engineers, Systems Analysts, Marketers
Session Level:	Beginner, Intermediate, Advanced

In the internationalization arena, Unicode has provided a lingua franca for communicating textual data. But there remain differences in the locale data used for a variety of tasks, such as formatting dates and times according to the conventions of different languages. Many of those differences are simply gratuitous: each variant is within acceptable limits for human readers, yet the variants still produce different results. In many other cases there are outright errors.

Whatever the cause, these differences can cause discrepancies to creep into heterogeneous systems, which are common among corporations and governments. This is especially serious in the case of collation (sort order), where differing collation causes not only ordering differences but also different query results. For example, a query for customers with names between "Abbot, Cosmo" and "Arnold, James" can return very different lists on systems that have different sort orders.
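The effect of collation on query results can be illustrated with a minimal sketch. It assumes two simplified stand-ins for real collations: plain code point order and a case-insensitive order. The customer list and the helper names (range_query, codepoint_key, caseless_key) are hypothetical and are not taken from the Repository or from any particular platform.

```python
def range_query(names, low, high, key):
    """Return the names whose sort key lies between those of low and high,
    inclusive, ordered by that same key."""
    lo, hi = key(low), key(high)
    return sorted((n for n in names if lo <= key(n) <= hi), key=key)


def codepoint_key(s):
    # Python compares strings by code point, so uppercase ASCII letters
    # sort before all lowercase ASCII letters.
    return s


def caseless_key(s):
    # A simplified case-insensitive ordering (an assumption for
    # illustration, not a full language-aware collation).
    return s.casefold()


# Hypothetical customer data with inconsistent capitalization.
customers = [
    "Abbot, Cosmo",
    "Adams, Ann",
    "abrams, Lee",
    "Arnold, James",
    "Baker, Sue",
    "adler, Max",
]

binary_result = range_query(customers, "Abbot, Cosmo", "Arnold, James", codepoint_key)
caseless_result = range_query(customers, "Abbot, Cosmo", "Arnold, James", caseless_key)

print(binary_result)    # three names
print(caseless_result)  # five names
```

The same query over the same data returns three customers under one ordering and five under the other, which is the kind of discrepancy that arises when systems do not share collation data.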

The Common Locale Data Repository is a project to define an XML format for the exchange of culturally sensitive (locale) information used in application and system development, and to gather, store, and make available data generated in that format. This paper describes the goals and features of the Common XML Locale Repository project and summarizes the latest changes in the project up to this point. It also gives an overview of the XML format for locale data exchange, the current status of the Repository, the comparison of existing data from different platforms, and the process of vetting data to produce a unified set of locale data.

The co-author of this paper is Steven Loomis of IBM Corporation.