Re: (iso639.186) the Ethnologue

Date: Tue Sep 12 2000 - 13:54:47 EDT

On 09/12/2000 12:18:37 PM Michael Everson wrote:

>I think there are codes given to entities in the Ethnologue list that are not
>languages in the sense that we need to identify languages in IT and in
>Bibliography (which is what the codes are for).

Perhaps there is a cat that needs to be let out of the bag here. ISO 639
codes were primarily intended for bibliographic purposes. Gary and I point
out in our paper that the needs of that sector do not necessarily
correspond to the general needs of IT, particularly for language-specific
processing. A tag that denotes a group of languages serves no useful
purpose for most language-specific processes. For example, if all you know
about the language of some information object is that it is an Athapascan
language, you can't spell-check that information. The intro to ISO 639
claims that the standard is intended to serve the needs of a variety of
sectors, but in its current state it is failing to adequately serve some.
We're not arguing that it is of no use, but it is an open question as to
whether bibliographic codes were the best starting point for general IT
use. Regardless, we have them, and they are already in use. The important
question then is how to move forward to find something that will serve all
sectors of IT.
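
To make the spell-checking point concrete, here is a minimal sketch in
Python. The dictionary file names and the pick_dictionary function are
hypothetical, invented only for illustration; the tags ath, nai, nav and
eng are ISO 639-2 codes. A collective tag gives a process nothing to select
a dictionary with, while a tag for an individual language does.

```python
# Tags that denote a group of languages rather than one language.
COLLECTIVE_TAGS = {
    "ath",  # Athapascan languages
    "nai",  # North American Indian (other)
}

# Hypothetical spelling dictionaries, keyed by individual-language tag.
SPELLING_DICTIONARIES = {
    "eng": "english.dic",
    "nav": "navajo.dic",  # Navajo is one Athapascan language
}


def pick_dictionary(tag):
    """Return a dictionary file for the tag, or None if none can be chosen."""
    if tag in COLLECTIVE_TAGS:
        # All we know is the group, so no single dictionary applies.
        return None
    return SPELLING_DICTIONARIES.get(tag)


print(pick_dictionary("nav"))
print(pick_dictionary("ath"))
```

Text tagged nav can be routed to a Navajo dictionary; text tagged ath cannot
be routed anywhere, even if it is in fact Navajo.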

Furthermore, we would contend that the categories enumerated in the
Ethnologue by-and-large *are* the categories that need to be identified for
general IT purposes. In the majority of cases, the distinctions made are
those that would be needed to successfully spell-check, for example. (We
acknowledge that that is not true in all cases; for example, Chinese
spelling would cross multiple languages; and alternate English spellings
are needed for what would generally be considered one language. But these
are the exceptions, not the norm.)

>I think that it is not
>mature for International Standardization. It is a work in progress,
>subject to change. As such it is a living document.

Change is needed as the objects described change and as our knowledge of
the objects changes. This is no less true of several ISO standards: 10646,
3166,... It is especially true of 639: for example, currently if someone
wants to tag a document containing Hopi text, they would need to use the
tag nai "North American Indian (other)". Suppose in two years' time there is
a specific code for Hopi added to ISO 639-2; consider what happens to that
existing data: it is now *incorrectly* tagged (not just sub-optimally
tagged), because nai no longer includes Hopi since that now has its own
code. Every time a new code is added to ISO 639, the meaning of some
existing codes changes. That is at least as serious a concern as any a person
would likely encounter with changes to the Ethnologue, and it is
probably more serious. Please don't assume that carefulness in defining ISO
639 will avoid problems. It already has inescapable problems. We need to
understand those problems and learn to manage them, and that will be made
rather easier if we quickly expand to include a comprehensive enumeration
of modern languages. Yes, that will not solve all problems, but it will be
a beneficial move forward.
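
The Hopi scenario can be sketched in a few lines of Python. Everything here
is a toy model under stated assumptions: the language set is far from
complete, the two registry snapshots are invented, and "hop" stands in for
the supposed future Hopi code. The point is only that nai means "languages
in the group that lack their own code", so its meaning depends on what else
is in the registry.

```python
# Toy set of North American Indian languages (illustrative only).
NORTH_AMERICAN_INDIAN = {"Hopi", "Navajo", "Cherokee"}

# Two toy registry snapshots: language name -> specific code.
REGISTRY_NOW = {"Navajo": "nav", "Cherokee": "chr"}
# "hop" is the supposed future code for Hopi in this scenario.
REGISTRY_LATER = {**REGISTRY_NOW, "Hopi": "hop"}


def correct_tag(language, registry):
    """Most specific correct tag for a language under one registry snapshot."""
    if language in registry:
        return registry[language]
    if language in NORTH_AMERICAN_INDIAN:
        return "nai"  # "(other)": only languages without their own code
    raise KeyError(language)


def covered_by_nai(registry):
    """Languages that nai denotes under one registry snapshot."""
    return NORTH_AMERICAN_INDIAN - set(registry)


def is_still_correct(tag, language, registry):
    """True if an existing tag on a document is correct under the registry."""
    return tag == correct_tag(language, registry)


# A Hopi document tagged today, then re-examined after the code is added.
print(is_still_correct("nai", "Hopi", REGISTRY_NOW))
print(is_still_correct("nai", "Hopi", REGISTRY_LATER))
```

Under the first snapshot the nai tag on a Hopi document is correct; under the
second it is not, although nothing about the document has changed.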

>I don't see what the hurry is. Make a list of 100 languages that you
>need codes for urgently. Make a list of another 100 after that. Encode
>those that you *really* need codes for. That's what I mean by saying "just
>because it's in the list doesn't mean it should get a code".

Considering only those languages in which we have been involved, SIL has an
immediate need for a couple of thousand codes. But we know that many others
have similar large-scale needs that collectively include the entire
Ethnologue list. There are *lots* of people asking for this, not just me,
not just SIL.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
