Re: the Ethnologue

From: Michael Everson (everson@egt.ie)
Date: Sat Sep 16 2000 - 09:21:04 EDT


At 12:04 -0800 2000-09-13, Peter_Constable@sil.org wrote:
>In
>the mean time there are people who need language identifiers for their
>data. It's in the cases of the more familiar languages (many of them
>European), that we may need special cases to deal with distinct notions
>such as written vs. spoken vs. literary languages. But for someone dealing
>with something like Ancash Quechua, this is all a big herring that is
>getting in the way of providing them with the language identifier that they
>need. And that is true for the majority of the 6000+ languages that don't
>yet have any identifier.

The Ethnologue lists six different Ancash Quechuas, five different Huánuco
Quechuas, and a lot of other Quechuas besides. It's got five kinds of
Italian. How do we evaluate this? And I don't know how many Zapotecos;
there are too many to count. Do we just accept that it's all been evaluated?

Well, then we find errors, and we point them out. And we say, that's why
we're worried about this database. But Peter says that's not good enough,
it's only "anecdotal", and indeed the burden is placed on us to improve the
Ethnologue by filing reports.

I've got Meillet and Cohen's 1924 _Les langues du monde_ here on my desk in
front of me. Like the Ethnologue, it deals with the languages of the world.
It has big lists in it. Would I accept those uncritically either? No.

>This is about having identifiers for languages like
>Cuaiquer (KWI) and hundreds of others in South America rather than having
>to use sai "South American Indian (other)" for all of them; or something
>for Lahu Shi (KDS) and hundreds of other languages of SE Asia and China
>rather than having to use sit "Sino-Tibetan (other)" for all of them; etc.

I agree, these "(other)" categories are unsatisfactory, and I think if I
had been involved with the early drafting of 639-2 I would have complained
rather loudly about it. Sure, the Dewey or the LC _cataloguing_ identifier
systems need such groupings (as they do "Romance languages" and "Slavic
languages"), but language identification of a bibliographical item is a
different thing.

>Perhaps there is a perception that ISO is unresponsive
>leading people not to make their requests. Perhaps the Maintenance Agency
>*is*, in fact, unresponsive.

The MA revised its working procedures in February, and they seem to work
OK. There are voting procedures and consideration procedures.

>That's how you've been coming across in these
>discussions:

I don't represent the 639 Maintenance Agency, though I am the RFC 1766
language tag reviewer.

>rather than saying, "I recognise the need, but have some
>concerns about some details, so lets investigate how we can find the best
>all-around solutions," your response has been, "I am not interested in
>considering the list of languages enumerated in the Ethnologue."

I recognize the need for more languages. My concern with the Ethnologue is
with its classification. I didn't say that I wasn't interested in
considering what is in the Ethnologue. I said that adopting it uncritically
could be a mistake, especially as it is a work in progress and if we were
suddenly to adopt 6000 tags then we'd be stuck with them forever. You know
how much of a fuss there was just because the code for Yiddish was changed
from ji to yi? Well how much fuss is there going to be if we find out that
Upper Kinauri and Lower Kinauri shouldn't really have been given two
different codes? Because we DON'T want to change codes once they have been
used in an RFC 1766 context.

Therefore I am wary of such a huge list. Do you really find this so
unreasonable? I'm not the only one who has expressed this concern.

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire


