Re: Embedded language ID pr

From: Keld Jørn Simonsen (
Date: Mon Sep 11 1995 - 20:20:02 EDT

unicode@Unicode.ORG writes:

> Currently, our use of language identifiers is somewhat limited. They
> are used for things like determining codeset and font (no widespread
> deployment of Unicode yet), when to switch segmentation algorithms,
> and flags to invoke other, language-specific tools
> (e.g. spell-checking, sorting, morphological analysis).
> Another concern is that the adoption of a language id approach in a
> codeset standard might act as a bad precedent. It could open doors
> for other features that don't really belong in a codeset standard.

As I see it, the language identifier is actually part of a broader
range of information that you need for a text, such as how numbers
are represented, date formats, etc. This is known as the locale in
C and POSIX terms. There is a general need to know in which locale
any text should be understood. This information
can be given out-of-band or in-stream. What I would propose is
a standardized way to invoke a locale in-stream to solve the

As also noted above, this capability is needed outside
UNICODE/10646 as well, and thus I think that the UNICODE/10646
encoding is not the right place to standardize it.
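
To make the out-of-band versus in-stream distinction concrete, here is a
minimal sketch in Python. The marker syntax <<locale:NAME>> and the
function name split_by_locale are invented for this illustration only;
they are not taken from any standard or existing proposal. The default
argument stands in for out-of-band information, and each marker switches
the locale in-stream, independently of the character encoding in use.

```python
import re

# Hypothetical marker syntax, invented for this sketch: <<locale:NAME>>
MARKER = re.compile(r"<<locale:([A-Za-z0-9_.@-]+)>>")


def split_by_locale(stream, default="POSIX"):
    """Return (locale_name, text) segments for a marked-up stream.

    `default` plays the role of out-of-band information: it applies
    until the first in-stream marker overrides it.
    """
    segments = []
    current = default
    pos = 0
    for match in MARKER.finditer(stream):
        chunk = stream[pos:match.start()]
        if chunk:
            segments.append((current, chunk))
        current = match.group(1)  # locale named by the marker
        pos = match.end()
    tail = stream[pos:]
    if tail:
        segments.append((current, tail))
    return segments


sample = "Total: 1,234.50 <<locale:da_DK>>I alt: 1.234,50"
segments = split_by_locale(sample, default="en_US")
for name, text in segments:
    print(name, repr(text))
```

Each segment can then be handed to whatever locale-specific tool is
needed (number parsing, date formatting, spell-checking) without the
codeset itself having to carry the information.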

Keld Simonsen
