RE: Questions re ISO-639-1,2,3

From: Peter Constable (petercon@microsoft.com)
Date: Mon Aug 22 2005 - 04:38:57 CDT

Next message: Alexej Kryukov: "Re: Historical Cyrillic in Unicode"

Previous message: Abhijit Dutta अभिजीत दत्ता: "FYI: 28th IUC paper - Tamil Unicode New"
Maybe in reply to: Donald Z. Osborn: "RE: Questions re ISO-639-1,2,3"
Next in thread: Philippe Verdy: "Re: Questions re ISO-639-1,2,3"
Reply: Philippe Verdy: "Re: Questions re ISO-639-1,2,3"

> From: Donald Z. Osborn [mailto:dzo@bisharat.net]

> 1) It is seen as convenient to have a one-stop site for various
> information relevant to localization. (For my part, when assembling
> information on a language-by-language basis for African language
> localizers, I thought it useful to put relevant ISO-639 codes on the
> various pages

I have no problem with citing ISO 639 IDs for particular languages: that
is something that we expect to be stable. It's quite another thing,
however, if we're talking about a general listing of language
identifiers. For the latter, I feel that people should refer to
definitive sources: the official source or an approved mirror.

> 2) The presentation on the sites, notably the LOC one, is utilitarian
> but not very dynamic (not that I have room to criticise on this point,
> but just an observation). To SIL's credit, their presentation offers
> some different ways to present the data, and some downloads (which
> coincidentally might encourage setting up static lists of ISO-639 codes
> on other pages), but there are gaps and there is no search feature for
> the codes.

Could you give me an example of what you're referring to by "gaps"?

A UI for searching could certainly be considered. I suspect that the RA
for 639-3 will be open to considering additional UI features to meet
user needs.

> 3) ISO-639 data fed from the official sites could facilitate devising a
> kind of relational database linking it to alternate names for languages
> and perhaps groupings of languages.

It would seem to me that such a relational database will rely either on
an internal data table, in which case a downloadable file is what is
wanted, or on URIs pointing to records for particular languages that are
available on the Internet. If that's what you're referring to, then the
ISO 639-3 RA will provide that. But that does not imply a need for other
sites to present mirrors or duplicates of the 639 code tables.

> 3a) Say you were looking for the code for Pulaar...

I think the idea of a search facility that can use alternate names such
as those listed in Ethnologue would be a great idea. Of course, if the
list of alternate names is incomplete, the user won't find what they're
looking for, and it's unlikely a list could ever be complete.
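
As a rough illustration of such a search, here is a minimal sketch that
keeps a small in-memory table of names and looks up identifiers by any
of them, ignoring case. The table layout, the function names
(build_table, find_codes) and the sample rows are assumptions made for
illustration only; the one real association used is Pulaar with "fuc",
as discussed in this thread. A real table would be loaded from whatever
downloadable file the RA provides.

```python
import sqlite3


def build_table(rows):
    """Create an in-memory name index from (identifier, name) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE name_index (id TEXT NOT NULL, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO name_index (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def find_codes(conn, name):
    """Return the identifiers whose listed names match, ignoring case."""
    cur = conn.execute(
        "SELECT DISTINCT id FROM name_index "
        "WHERE name = ? COLLATE NOCASE ORDER BY id",
        (name.strip(),),
    )
    return [row[0] for row in cur.fetchall()]


# Sample rows: only the Pulaar/"fuc" pairing comes from the thread.
# The second row is a placeholder, not a real alternate name.
SAMPLE_ROWS = [
    ("fuc", "Pulaar"),
    ("fuc", "Placeholder alternate name"),
]

if __name__ == "__main__":
    conn = build_table(SAMPLE_ROWS)
    print(find_codes(conn, "pulaar"))
```

A name that is missing from the table simply returns an empty list,
which is the incompleteness problem described above.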

> they would come up with an ISO-639-3 code for Pulaar but still be
> ignorant of the ISO-639-1&2 codes for "Fulah/Peul" that might actually
> serve the purpose intended by the user.

If you're referring to the online version of Ethnologue, I suspect that
once the 639-3 site is launched, the language descriptions on the
Ethnologue site will provide hotlinks to information for that language
on the ISO 639-3 site (and that would include association with
macrolanguage categories).

> or better yet provide accurate raw data feed

So, what do you consider a "raw data feed"? A URL of the form

http://www.sil.org/iso639%2D3/documentation.asp?id=aaa

will return data pertaining to the identifier "aaa". That particular URL
will return data in HTML format; I'm guessing by "raw data" you want
something other than HTML. You want plain text? Some kind of XML record?
Such things certainly can be considered, though I can't speak for the
639-3 RA regarding their openness to doing that.
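
To make the URL form concrete, here is a minimal sketch that builds
such a URL for a given identifier and recovers the identifier from it
again. The base address is copied from the example above; the function
names (record_url, identifier_from_url) and the validation rule (three
lowercase ASCII letters) are my own assumptions, and nothing here
implies that the site offers any format other than HTML.

```python
from urllib.parse import parse_qs, urlsplit

# Base address copied from the URL quoted above; it may not remain valid.
BASE_URL = "http://www.sil.org/iso639%2D3/documentation.asp"


def record_url(identifier: str) -> str:
    """Return the documentation URL for a three-letter identifier."""
    ok = (
        len(identifier) == 3
        and identifier.isascii()
        and identifier.isalpha()
        and identifier.islower()
    )
    if not ok:
        raise ValueError(f"not a three-letter lowercase identifier: {identifier!r}")
    return f"{BASE_URL}?id={identifier}"


def identifier_from_url(url: str) -> str:
    """Recover the identifier from a URL built by record_url."""
    return parse_qs(urlsplit(url).query)["id"][0]


if __name__ == "__main__":
    print(record_url("aaa"))
```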

> maybe the ISO-639 lists such as they are will need some sort of
> revisions at some point with respect to what languages (dialects) are
> represented at the "language" and macrolanguage levels, and what the
> relationship among them is. The example of Fula/Peul and its variant
> forms that I mentioned above is an interesting case in point - the
> fundamental unity and evident diversity of the language(s) are such
> that one could imagine the utility of tagging Pulaar as ff-fuc - that
> is Fula-Pulaar, using ISO-639-1 (always the preference over ISO-639-2
> where there are both, as I understand it from the W3C site) and
> ISO/DIS-639-3, though such nesting of ISO-639-3 I understand not to be
> intended. Further specification by country code would be helpful since
> the orthography in Senegal varies slightly from that in neighboring
> Mali and perhaps Mauritania.

It's not clear to me what you feel is lacking here. The 639-3 site will
tell you that the category "ff"/"ful" is a macrolanguage and will list
the individual languages it encompasses, which will include the
category "fuc". The record for "fuc" will document its
properties as defined in ISO 639-3 and will also include links to
external resources such as Ethnologue that will document its denotation
more fully. If there is more you think is required, please clarify.
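
For illustration, the relationship just described can be sketched as a
small lookup that goes from an individual language to the macrolanguage
that encompasses it, which is the Pulaar-to-"ff" case raised earlier.
The record layout and the function name (broader_codes) are
assumptions, not the RA's actual data format, and the member list is
deliberately partial: only "fuc" is listed because it is the only
member named in this thread.

```python
# Hypothetical record layout keyed by ISO 639-3 identifier.
RECORDS = {
    "ful": {
        "scope": "macrolanguage",
        "part1": "ff",          # two-letter ISO 639-1 identifier
        "members": ["fuc"],     # partial list, for illustration only
    },
    "fuc": {
        "scope": "individual",
        "part1": None,          # no two-letter identifier of its own
        "members": [],
    },
}


def broader_codes(code):
    """Return (639-3, 639-1) pairs for macrolanguages that encompass code."""
    return sorted(
        (macro_id, record["part1"])
        for macro_id, record in RECORDS.items()
        if record["scope"] == "macrolanguage" and code in record["members"]
    )


if __name__ == "__main__":
    print(broader_codes("fuc"))
```

A user who starts from "fuc" is thereby led to "ful" and "ff" without
needing a separate copy of the 639-1 and 639-2 tables.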

> Anyway, these are clearly not easy decisions and I know that in the
> interests of "stability" one can't go about undoing and renaming
> existing codes. But these are matters that will likely prompt
> (provoke?) more discussion as various users, webmasters, and localizers
> come into contact with and attempt to use the standard (lang tagging
> web content; localization) for languages currently less-represented in
> computing and cyberspace.

!!! How did we suddenly go from providing a raw data feed to questions
of choices of IDs for particular languages?

> Thanks for any feedback. (One logical suggestion is that this go to the
> ISO-639 list - perhaps someone could forward it there and I guess I'll
> have to subscribe.)

We need to watch that this doesn't go too far off topic for the lists to
which this is addressed.

Peter Constable

This archive was generated by hypermail 2.1.5 : Mon Aug 22 2005 - 04:39:44 CDT