Re: [OT] Re: the Ethnologue

From: John Cowan
Date: Sat Sep 16 2000 - 19:59:50 EDT

On Sat, 16 Sep 2000, Doug Ewell wrote:

> But it gets worse. When I stripped out the alternate-names field and
> again checked for duplicated codes, I found 14 (AVL AYL CAG CTO FUV GAX
> GSC GSW JUP MHI MHM MKJ SHU SRC). Some of these duplicates differ only
> in spelling (CAG 'Chulupi' vs. 'Chulupí') but other differences are a
> lot more troubling. For example, SHU is both 'Arabic, Chadian Spoken'
> and 'Arabic, Shuwa.' As a non-expert in Arabic, how do I know these
> two names describe the same dialect of Arabic? (These are certainly
> dialects, not discrete languages.)

I see the problem: the same language (with the same code) may be preferentially
known by one name in one country and another name in another. Because
the Ethnologue names languages by country, conflicts like this can appear.
The entry on "Chadian Spoken Arabic" (in Chad) lists "Shuwa Arabic" as a
synonym; the name "Shuwa Arabic" is the primary name in Niger, Nigeria,
and Cameroon.

> MKJ is the Ethnologue code for both 'Macedonian' and 'Slavic'.
> Absolutely *everyone* knows there is no one 'Slavic' language; the name
> refers to an entire language family. This is much more imprecise than
> any of the despised 'Other' codes in ISO 639.

Again, "Macedonian" is the preferred name in Macedonia, Bulgaria, and
Albania, but "Slavic" is preferred in Greece.
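To make the SHU and MKJ cases concrete, here is a minimal sketch of how one might separate "one code, several country-specific names" from a genuine clash. The record layout (code, country, primary name) and the function names are assumptions for illustration, not the Ethnologue's actual file format; the data is only what is quoted in this thread.

```python
from collections import defaultdict

# Hypothetical (code, country, primary name) records. The tuple layout is
# an assumption; the values are only those mentioned in this thread.
RECORDS = [
    ("SHU", "Chad", "Arabic, Chadian Spoken"),
    ("SHU", "Niger", "Arabic, Shuwa"),
    ("SHU", "Nigeria", "Arabic, Shuwa"),
    ("SHU", "Cameroon", "Arabic, Shuwa"),
    ("MKJ", "Macedonia", "Macedonian"),
    ("MKJ", "Bulgaria", "Macedonian"),
    ("MKJ", "Albania", "Macedonian"),
    ("MKJ", "Greece", "Slavic"),
]


def names_by_code(records):
    """Group records as {code: {primary name: [countries using it]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for code, country, name in records:
        grouped[code][name].append(country)
    return {code: dict(names) for code, names in grouped.items()}


def multi_name_codes(records):
    """Return only the codes that carry more than one primary name."""
    return {
        code: names
        for code, names in names_by_code(records).items()
        if len(names) > 1
    }


if __name__ == "__main__":
    for code, names in sorted(multi_name_codes(RECORDS).items()):
        for name, countries in names.items():
            print(f"{code}: {name!r} in {', '.join(countries)}")
```

Keeping the country alongside each name shows at a glance that a repeated code is a per-country naming preference and not two languages sharing one code.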

> SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which
> means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr',
> 'sr' to Ethnologue 'SRC'. This is likely to cause much more widespread
> trouble than the Hopi example mentioned earlier.

By Ethnologue standards of mutual intelligibility, there is only one
language here.
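The practical consequence of the many-to-one mapping can be shown with a small sketch. The dictionary below holds only the three ISO 639-1 codes and the one Ethnologue code named in the quoted paragraph; the variable and function names are hypothetical.

```python
from collections import defaultdict

# ISO 639-1 -> Ethnologue, using only the codes quoted above.
ISO_TO_ETHNOLOGUE = {"bs": "SRC", "hr": "SRC", "sr": "SRC"}


def invert(mapping):
    """Invert a source -> target mapping into target -> [sources]."""
    inverse = defaultdict(list)
    for source, target in mapping.items():
        inverse[target].append(source)
    return dict(inverse)


def ambiguous_targets(mapping):
    """Targets reached from more than one source, i.e. where the
    forward mapping loses information and cannot be reversed."""
    return {
        target: sources
        for target, sources in invert(mapping).items()
        if len(sources) > 1
    }


if __name__ == "__main__":
    print(ambiguous_targets(ISO_TO_ETHNOLOGUE))
```

Going from ISO 639-1 to the Ethnologue code is always well defined here; it is only the return trip from SRC that has no single answer.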

> Certainly more codes need to be added to ISO 639, and the Maintenance
> Agency needs to be sure not to present an image of unresponsiveness
> (if in fact they have been guilty of that in the past). However, they
> have their own, existing guidelines for the level at which languages
> should be encoded (one written vs. 60 spoken variants) and this must
> be respected.

Precisely. Unwritten languages, or languages with only a few written
works, or languages whose written form appears only on bamboo, don't
make it into 639-2, which is (like it or not) in practice a standard
for bibliographic use.

In addition, the notion of mapping spoken form A to written form B
on the basis that the speakers of A write B when they need to write
entails the notion that Dongxiang [SCE], a language of the Mongolian family,
is a "dialect" of Chinese in the same sense that Wu Chinese [WUU] is.

> And the duplicated codes in the Ethnologue list must be
> edited down to one code each, or the list will not earn the respect for
> accuracy that it perhaps deserves.

It seems clear from the detailed information that in all 14 cases,
there is only one language, known by different names in different
countries. Expecting the Ethnologue to solve this problem by fiat,
or even to openly prefer one name over another when nationalist sympathies
decree otherwise, is IMHO not reasonable.

John Cowan
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter
