Date: Wed Sep 13 2000 - 15:26:14 EDT

On 09/13/2000 02:17:52 AM John Hudson wrote:

>The first
>tasks should be to a) identify the different kinds of information that need
>to be represented by tags (spoken languages, written languages, literary
>languages (not the same thing as written languages), particular
>orthographies, language-specific script variants, ?, ?) and then b)
>identify appropriate existing standards (if any actually exist) or develop
>new standards to contain these tags...

I have no problem with that, except that it would need to be done in the
right way. The point is to understand the needs of specific forms of
information processing, and to evaluate for each exactly what kinds of
distinctions are needed. In some cases, it will be language per se; for
others, it will be the writing system (usually language-specific, but in some
exceptional cases spanning multiple languages), and so on. The only problem is
that I suspect we're several years away from understanding all of this. In
the meantime, there are people who need language identifiers for their
data. It is for the more familiar languages (many of them European) that we
may need special cases to deal with distinct notions such as written vs.
spoken vs. literary languages. But for someone dealing with something like
Ancash Quechua, all of this is a big red herring that is getting in the way
of providing them with the language identifier they need. And that is true
for the majority of the 6000+ languages that don't yet have any identifier.

We need to work toward perfection, but if we insist on perfection before we
take a first step, we'll likely never make progress; and in the meantime,
lots of users will continue to go without the identifiers they need:
identifiers that often are in no way affected by the issues for which we're
trying to find the perfect solution.

>Without such an approach, any new standard work will be plagued with
>exactly the kind of inconsistencies that make both ISO 639 and the
>Ethnologue of dubious merit for IT purposes.

I don't understand the assertion, often made by people without much
experience working with thousands of minority languages, that the Ethnologue
is of dubious merit for IT purposes, when the Ethnologue was created by
people who have been working with thousands of minority languages
specifically for their own IT purposes. SIL has considerable experience
using Ethnologue codes as language identifiers, and while we acknowledge
that the code set isn't perfect, it has served our IT purposes very well:
FAR better than ISO 639-x currently can. It is fallacious to look at IT
issues in the context of major languages (which are already covered by
other standards, and which have some special complications due to long
histories of literary tradition and sociolinguistic change and
diversification) and extend those conclusions to the context of minority
languages. And this is about the latter. It's not about replacing en-US
with Ethnologue's "eng", since that will never happen (and it is not what
we would propose). This is about having identifiers for languages like
Cuaiquer (KWI) and hundreds of others in South America rather than having
to use sai "South American Indian (other)" for all of them; or something
for Lahu Shi (KDS) and hundreds of other languages of SE Asia and China
rather than having to use sit "Sino-Tibetan (other)" for all of them; etc.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485

