Twenty-first International Unicode Conference

Toward a Model for Language Identification

Peter Constable - SIL International

Intended Audience:	Software Engineers, People Interested in Development of Industry Standards
Session Level:	Intermediate, Advanced

As interest in internationalisation and localisation grows, language and locale identification is needed for increasingly diverse situations. There is a growing consensus that existing standards for language identification are not fully meeting user needs and that more fully developed standards are needed. It is not always obvious, however, how systems of identifiers should be extended to accommodate new usage scenarios.

This paper presents the view that part of the reason for this difficulty is the lack of an adequate ontological model for language and language-related categories. In existing practice, the two notions "language" and "locale" are assumed to cover the full range of distinctions to be made, whereas in fact there are other distinct language-related notions that may be at work and that may need to be reflected in systems of identification.

This paper proposes a set of notions for language-related categories that are relevant for information technologies (IT), and examines the relationships between them. It explores some usage scenarios and considers what ways of formulating identifiers might be appropriate for the various scenarios.

When the world wants to talk, it speaks Unicode

International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

4 March 2002, Webmaster