From: Peter Constable (email@example.com)
Date: Mon Aug 22 2005 - 04:38:57 CDT
> From: Donald Z. Osborn [mailto:firstname.lastname@example.org]
> 1) It is seen as convenient to have a one-stop site for various
> relevant to localization. (For my part, when assembling information on
> language-by-language basis for African language localizers, I thought
> to put relevant ISO-639 codes on the various pages
I have no problem with citing ISO 639 IDs for particular languages: that
is something that we expect to be stable. It's quite another thing,
however, if we're talking about a general listing of language
identifiers. For the latter, I feel that people should refer to
definitive sources: the official source or an approved mirror.
> 2) The presentation on the sites, notably the LOC one, is utilitarian
> very dynamic (not that I have room to criticise on this point, but
> observation). To SIL's credit, their presentation offers some
> ways to present the data, and some downloads (which coincidentally might
> setting up static lists of ISO-639 codes on other pages), but there
> and there is no search feature for the codes.
Could you give me an example of what you're referring to by "gaps"?
A UI for searching could certainly be considered. I suspect that the RA
for 639-3 will be open to considering additional UI features to meet
> 3) ISO-639 data fed from the official sites could facilitate devising
> kind of relational database linking it to alternate names for languages and
> groupings of languages.
It would seem to me that such a relational database will rely either on
an internal data table, in which case a downloadable file is what is
wanted, or on URIs pointing to records for particular languages that are
available on the Internet. If that's what you're referring to, then the
ISO 639-3 RA will provide that. But that does not imply a need for other
sites to present mirrors or duplicates of the 639 code tables.
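
To make the "internal data table" option concrete, here is a minimal
sketch in Python using the standard sqlite3 module. The table layout, the
function names and the sample rows are my own assumptions for
illustration; they are not the format of any file the 639-3 RA provides,
and the alternate names shown are examples rather than an authoritative
list.

```python
import sqlite3

# Illustrative rows only. A real table would be loaded from a file
# downloaded from the official source, not typed in by hand.
LANGUAGES = [("fuc", "Pulaar"), ("fuf", "Pular")]
ALT_NAMES = [("fuc", "Peul"), ("fuc", "Haalpulaar")]  # hypothetical sample


def build_db(languages, alt_names):
    """Create an in-memory database linking codes to names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE language (code TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE alt_name ("
        "code TEXT NOT NULL REFERENCES language(code), "
        "name TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO language VALUES (?, ?)", languages)
    conn.executemany("INSERT INTO alt_name VALUES (?, ?)", alt_names)
    return conn


def find_codes(conn, name):
    """Return codes whose reference or alternate name matches, ignoring case."""
    rows = conn.execute(
        "SELECT code FROM language WHERE name = ? COLLATE NOCASE "
        "UNION "
        "SELECT code FROM alt_name WHERE name = ? COLLATE NOCASE "
        "ORDER BY code",
        (name, name),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db(LANGUAGES, ALT_NAMES)
print(find_codes(conn, "haalpulaar"))
print(find_codes(conn, "no such name"))
```

A name that is missing from both tables simply returns an empty list, so
a database like this is only as useful as its list of alternate names is
complete.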
> 3a) Say you were looking for the code for Pulaar...
I think a search facility that can use alternate names such as those
listed in Ethnologue would be a great idea. Of course, if the
list of alternate names is incomplete, the user won't find what they're
looking for, and it's unlikely a list could ever be complete.
> they would come up with an ISO-639-3 code for Pulaar but still be
> the ISO-639-1&2 codes for "Fulah/Peul" that might actually serve the
> intended by the user.
If you're referring to the online version of Ethnologue, I suspect that
once the 639-3 is launched the language descriptions on the Ethnologue
site will provide hotlinks to information for that language on the ISO
639-3 site (and that would include association with macrolanguage
> or better yet provide accurate raw data feed
So, what do you consider a "raw data feed"? A URL of the form
will return data pertaining to the identifier "aaa". That particular URL
will return data in HTML format; I'm guessing by "raw data" you want
something other than HTML. You want plain text? Some kind of XML record?
Such things certainly can be considered, though I can't speak for the
639-3 RA regarding their openness to doing that.
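
To illustrate the difference between the two non-HTML options, here is a
rough Python sketch that renders one record as a tab-separated text line
and as a small XML element. The field names ("id", "scope", "name") and
the element layout are hypothetical, chosen only for the example; nothing
here describes a format the 639-3 RA has defined or agreed to serve.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are assumptions for illustration.
record = {"id": "fuc", "scope": "individual", "name": "Pulaar"}


def as_plain_text(rec):
    """Render the record as one tab-separated line."""
    return "\t".join([rec["id"], rec["scope"], rec["name"]])


def as_xml(rec):
    """Render the record as a small XML element, returned as a string."""
    elem = ET.Element("language", id=rec["id"], scope=rec["scope"])
    ET.SubElement(elem, "name").text = rec["name"]
    return ET.tostring(elem, encoding="unicode")


print(as_plain_text(record))
print(as_xml(record))
```

Either form would be easier for a program to consume than an HTML page;
the XML form has the advantage that new fields can be added without
breaking consumers that read fields by name rather than by position.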
> maybe the ISO-639 lists such as they are will need some sort of
> some point with respect to what languages (dialects) are represented
> "language" and macrolanguage levels, and what the relationship among
> The example of Fula/Peul and its variant forms that I mentioned above
> interesting case in point - the fundamental unity and evident
> language(s) are such that one could imagine the utility of tagging
> ff-fuc - that is Fula-Pulaar, using ISO-639-1 (always the preference
> ISO-639-2 where there are both, as I understand it from the W3C site)
> ISO/DIS-639-3, though such nesting of ISO-639-3 I understand not to be
> intended. Further specification by country code would be helpful since
> orthography in Senegal varies slightly from that in neighboring Mali
> perhaps Mauritania.
It's not clear to me what you feel is lacking here. The 639-3 site will
tell you that the category "ff"/"ful" is a macrolanguage and will list
the individual languages it encompasses, including the category "fuc".
The record for "fuc" will document its
properties as defined in ISO 639-3 and will also include links to
external resources such as Ethnologue that will document its denotation
more fully. If there is more you think is required, please clarify.
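
As a sketch of the relationship just described, the following Python
fragment models a macrolanguage record and the lookups in both
directions. The dictionary layout and function names are assumptions
made for this example, and the member list is a deliberately incomplete
subset; the 639-3 tables are the definitive source for the full list.

```python
# Illustrative subset only; "ful" encompasses more individual languages
# than the two shown here.
MACROLANGUAGES = {
    "ful": {"part1": "ff", "members": ["fuc", "fuf"]},
}


def members_of(code):
    """Return member codes for a macrolanguage given its 639-1 or 639-2/3 ID."""
    for part3, info in MACROLANGUAGES.items():
        if code in (part3, info["part1"]):
            return list(info["members"])
    return []


def macrolanguage_of(code):
    """Return the macrolanguage ID that encompasses an individual language."""
    for part3, info in MACROLANGUAGES.items():
        if code in info["members"]:
            return part3
    return None


print(members_of("ff"))
print(macrolanguage_of("fuc"))
```

With a mapping like this, a user who arrives at "fuc" can be pointed to
"ff"/"ful" as well, which addresses the concern about missing the
broader identifier.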
> Anyway, these are clearly not easy decisions and I know that in the
> interests of "stability" one can't go about undoing and renaming existing codes.
> are matters that will likely prompt (provoke?) more discussion as
> users, webmasters, and localizers come into contact with and attempt
> use the standard (lang tagging web content; localization) for languages
> less-represented in computing and cyberspace.
!!! How did we suddenly go from providing a raw data feed to questions
of choices of IDs for particular languages?
> Thanks for any feedback. (One logical suggestion is that this go to
> list - perhaps someone could forward it there and I guess I'll have to
We need to watch that this doesn't go too far off topic for the lists to
which this is addressed.
This archive was generated by hypermail 2.1.5 : Mon Aug 22 2005 - 04:39:44 CDT