Re: Language identifier proposals request

From: Olle Jarnefors (
Date: Mon Sep 04 1995 - 11:40:11 EDT

Keld Simonsen <> wrote:

> > We aren't looking for a language identifier approach, we are checking
> > for previous proposals that might overlap or encompass one we might
> > submit.
> Here is the ISO 639 approach:
> Technical contents of ISO 639:1988 (E/F)
> "Code for the representation of names of languages".
> Typed by 1990-11-30
> Minor corrections, 1992-09-08 by Keld Simonsen
> Sundanese corrected, 1992-11-11 by Keld Simonsen
> Telugu corrected, 1995-08-24 by Keld Simonsen
> Two-letter lower-case symbols are used.
> The Registration Authority for ISO 639 is Infoterm, Österreichisches
> Normungsinstitut (ON), Postfach 130, A-1021 Vienna, Austria.
> aa Afar
> ab Abkhazian
> ...

Keld's list reflects the content of the 1988 standard. In 1989,
three of the codes were replaced by others and codes for three
additional languages were added.

This is described in RFC 1766, "Tags for the Identification of
Languages", March 1995:

: The following codes have been added in 1989 (nothing later): ug
: (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew,
: replacing iw), yi (Yiddish, replacing ji), and id (Indonesian,
: replacing in).

I checked this with Infoterm in May 1994. It is possible that
further registrations have been made since then.
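To make the 1989 changes concrete, here is a minimal sketch (an illustration only, not something defined by RFC 1766 or by Infoterm) of how an application reading tags written against the 1988 list might map the three withdrawn codes to their replacements. The function name normalize_iso639 is hypothetical, and the table covers only the changes quoted above. As noted, later registrations may exist.

```python
# Codes withdrawn from ISO 639 in 1989, mapped to the codes that replaced
# them (per the RFC 1766 note quoted above). Hypothetical helper table.
REPLACED_1989 = {
    "iw": "he",  # Hebrew
    "ji": "yi",  # Yiddish
    "in": "id",  # Indonesian
}


def normalize_iso639(code: str) -> str:
    """Lower-case a two-letter ISO 639 code and map a withdrawn 1988 code
    to its 1989 replacement; any other code is returned unchanged."""
    code = code.lower()
    return REPLACED_1989.get(code, code)


if __name__ == "__main__":
    for old in ("iw", "JI", "in", "sv"):
        print(old, "->", normalize_iso639(old))
```

The three languages that were simply added in 1989 (ug, iu, za) need no mapping, since no older code existed for them.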

I would certainly recommend that everyone interested in language
tagging read this RFC. It can be found in these places, among
others: (compressed format)

Another proposal for language coding (by means of escape
sequences in accordance with ISO 6429) has been put forward in
ISO JTC1/SC2 by the Swedish member body. It can be found at:

This proposal is under review by the "escape sequence gurus" of
SC2, Willy Bohn and Joachim Friemelt. According to a report at
the SC2/WG3 meeting in Helsinki in June this year, they had not
yet reached any conclusion.

Both the Internet language tagging system for messages and
message parts described in RFC 1766 and the Swedish escape
sequence proposal are based on the two-letter language codes of
ISO 639. They can both be extended to three-letter codes, if and
when ISO/TC46 (library standardization) and ISO/TC37
(terminology standardization) are able to reach an agreement on
the details of the proposed second part of ISO 639. The ISO CD
639-2 voted down in 1993 contained three-letter codes for about
400 languages (as compared to the almost 130 languages of ISO 639).

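The point about extending RFC 1766 tags to three-letter codes can be illustrated with a small sketch of how a mail reader might classify the primary part of a tag. The syntax in the comment follows RFC 1766 (a primary tag and optional subtags, each 1 to 8 letters, with "i" reserved for IANA registrations and "x" for private use). The function name classify_primary and the allow_three_letter switch are hypothetical, standing in for whatever a second part of ISO 639 might eventually define.

```python
import re

# RFC 1766 syntax: Language-Tag = Primary-tag *( "-" Subtag ),
# where Primary-tag and each Subtag are 1 to 8 ASCII letters.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def classify_primary(tag: str, allow_three_letter: bool = False) -> str:
    """Return how the primary part of an RFC 1766 tag would be read.

    allow_three_letter is a hypothetical switch for possible future
    three-letter codes from a second part of ISO 639.
    """
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not an RFC 1766 language tag: {tag!r}")
    primary = tag.split("-", 1)[0].lower()  # tags are case-insensitive
    if len(primary) == 2:
        return "iso639"          # two-letter ISO 639 code
    if primary == "i":
        return "iana"            # IANA-registered tag
    if primary == "x":
        return "private"         # private use
    if len(primary) == 3 and allow_three_letter:
        return "iso639-3letter"  # hypothetical extension
    return "unassigned"
```

For example, "en-US" is read as an ISO 639 code, while "swe" has no defined meaning unless the hypothetical three-letter extension is switched on. The syntax already admits such tags, so an extension would mainly change how they are interpreted.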

Olle Jarnefors, Royal Institute of Technology, Stockholm <>
