Re: the Ethnologue

From: Michael Everson (
Date: Wed Sep 13 2000 - 07:37:02 EDT

At 23:56 +0100 2000-09-12, Christopher J. Fynn wrote:

>A lot of what are listed as "languages" in the Ethnologue are what most people
>would call dialects. For instance almost every known dialect of spoken Tibetan
>is listed as a separate language in the Ethnologue although they all share
>only one written form.

YES. This is one of the serious problems of the list.

At 22:39 -0800 2000-09-12, Jörg Knappen wrote:

>I once looked at the Ethnologue and its subdivision of the German language
>is just ridiculous. Not small errors, a gross misconception. I don't trust
>the Ethnologue in areas where I don't know the facts well, since it fails in
>the one area where I do know them.

YES. This is one of the serious problems of the list.

If SIL has 2000 real languages they need codes for in real applications,
then those 2000 (which is a lot) should be proposed to 639 or 1766. That's
what 639 and 1766 are for. It would be nice to know what the applications
are.

I do not think we should adopt all 6000 codes from the Ethnologue as
"language tags". I am, frankly, shocked that linguists should consider
doing so so uncritically.

Or what, Ken? Abandon the international standards and freeze the
Ethnologue, warts and all, and just vacuum up all its entities and tell the
world, use these tags? Or do a proper job of review of real requirements?

The Ethnologue itself wasn't designed for the IT purposes everyone seems to
be clamouring for, either, as far as I know. And if it were accepted as-is,
then it couldn't be revised, right?

More haste, less speed, people. Do you need a code for German? Yes. Do you
need a code for Manx? Yes. Though the communities differ vastly in size,
their IT requirements are quite similar.

Do you need a code for !Xóõ? May I ask what for? There are (according to
the Ethnologue) 3000-4000 speakers. According to Anthony Traill's _A !Xóõ
Dictionary_, "!Xóõ is an unwritten language and its speakers have no notion
of linguistic standardization". Well, honestly, whose IT requirements are
you going to serve? (All the characters used in the dictionary are in the
UCS. I checked, because that *is* a real and important requirement.)

Do you really need 8 codes for "the German languages"? How many "Tibetans"
are there? Is Samvedi a language? It "shares many features with Gujarati.
Survey needed". How many times does "survey needed" appear in the
Ethnologue? How many of them aren't really languages (in the sense that we
need to implement for IT and libraries (which use IT by the way)) but only
preliminary studies? HOW DO WE KNOW?

The Ethnologue says there are 6000 speakers of Shelta in Ireland, 50,000 in
the US, and 30,000 in the UK. That's 86,000 speakers?! The Ethnologue says
that Shelta is Indo-European:Celtic:Insular:Goidelic, which it isn't. It
names Hancock 1990 as the source of this (impossibly incorrect)
information. In the bibliography there is no Hancock 1990.

This is just another error, and for a language in Western Europe. I do not
believe that the Ethnologue can be taken so uncritically.

There may be problems with 639 and 1766 but the committees in question have
been addressing these recently so that we can make and maintain more
effective and responsive standards. Has that all been wasted effort? IT
industry can circumvent the standards easily if it wants to. Is that a good
thing?

The Ethnologue is an important resource. I use it, along with other
resources, in my work. But I don't think it is mature enough to BE an
international standard. A namespace in RFC 1766 could be created easily:
define a tag "e-" for "Ethnologue" and allow it next to "i-" and "x-". But
I have grave concerns about the wisdom of doing so, and nothing Peter has
said has dispelled them.
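
To make the namespace idea concrete, here is a minimal Python sketch of how a consumer might sort RFC 1766 tags by their primary tag. It is an illustration only: the function name `classify_tag`, the `allow_ethnologue` switch, and the returned labels are invented for this example, and the "e-" branch models nothing more than the hypothetical namespace described above. It does not check two-letter tags against the actual ISO 639 list, and it assumes "i" and "x" must be followed by at least one subtag.

```python
import re

# RFC 1766 syntax: a primary tag plus optional subtags, each made of
# 1 to 8 ASCII letters, joined by hyphens.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def classify_tag(tag, allow_ethnologue=False):
    """Return a label for the namespace of a tag's primary tag.

    The labels are made up for this sketch; they are not RFC terminology.
    """
    if not TAG_RE.fullmatch(tag):
        return "invalid"
    primary = tag.split("-")[0].lower()
    has_subtag = "-" in tag
    if len(primary) == 2:
        # Two-letter primary tags are reserved for ISO 639 codes.
        # Assumption: we do not verify that the code is really assigned.
        return "iso-639"
    if primary == "i" and has_subtag:
        return "iana-registered"
    if primary == "x" and has_subtag:
        return "private-use"
    if primary == "e" and has_subtag and allow_ethnologue:
        # Hypothetical namespace: not part of RFC 1766.
        return "ethnologue"
    return "not-allowed"


if __name__ == "__main__":
    # "e-abc" is a placeholder, not a real Ethnologue code.
    for tag in ["de", "gv", "i-klingon", "x-test", "e-abc"]:
        print(tag, classify_tag(tag), classify_tag(tag, allow_ethnologue=True))
```

The switch shows how small the syntactic change would be: one extra reserved letter. Whether the entities behind that letter deserve tags at all is a separate question that no parser can answer.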

Lest anyone think that *I* am coming to shrink from supporting minority
languages, I will say this:

It appears that six characters needed to support Chipewyan in Canadian
Syllabics are missing from the UCS. I'll be looking further into this and
taking appropriate action to get the missing characters encoded in the
Universal Character Set.

Michael Everson
Language Tag Reviewer, RFC 1766
