RE: Questions re ISO-639-1,2,3

From: Peter Constable (petercon@microsoft.com)
Date: Mon Aug 22 2005 - 04:38:57 CDT


    > From: Donald Z. Osborn [mailto:dzo@bisharat.net]

    > 1) It is seen as convenient to have a one-stop site for various
    > information relevant to localization. (For my part, when assembling
    > information on a language-by-language basis for African language
    > localizers, I thought it useful to put relevant ISO-639 codes on the
    > various pages

    I have no problem with citing ISO 639 IDs for particular languages: that
    is something that we expect to be stable. It's quite another thing,
    however, if we're talking about a general listing of language
    identifiers. For the latter, I feel that people should refer to
    definitive sources: the official source or an approved mirror.

    > 2) The presentation on the sites, notably the LOC one, is utilitarian
    > but not very dynamic (not that I have room to criticise on this point,
    > but just an observation). To SIL's credit, their presentation offers
    > some different ways to present the data, and some downloads (which
    > coincidentally might encourage setting up static lists of ISO-639
    > codes on other pages), but there are gaps and there is no search
    > feature for the codes.

    Could you give me an example of what you're referring to by "gaps"?

    A UI for searching could certainly be considered. I suspect that the RA
    for 639-3 will be open to considering additional UI features to meet
    user needs.

    > 3) ISO-639 data fed from the official sites could facilitate devising
    > a kind of relational database linking it to alternate names for
    > languages and perhaps groupings of languages.

    It seems to me that such a relational database will rely either on an
    internal data table, in which case a downloadable file is what is
    wanted, or on URIs pointing to records for particular languages that
    are available on the Internet. If that's what you're referring to, then the
    ISO 639-3 RA will provide that. But that does not imply a need for other
    sites to present mirrors or duplicates of the 639 code tables.
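
    To illustrate the "internal data table" option, here is a minimal
    sketch in Python. The tab-separated layout, the column names (id3, id1,
    name) and the sample rows are assumptions made for illustration only;
    they are not the actual format of any file published by the 639-3 RA.

```python
import csv
import io

# Hypothetical excerpt of a downloaded code table. The column names and the
# tab-separated layout are assumptions, not the RA's actual file format.
SAMPLE_TABLE = "id3\tid1\tname\nful\tff\tFulah\nfuc\t\tPulaar\n"


def load_code_table(text):
    """Parse the tab-separated text into a dict keyed by three-letter ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["id3"]: row for row in reader}


table = load_code_table(SAMPLE_TABLE)
print(table["fuc"]["name"])
```

    A site built this way holds only a local copy for lookups and can be
    refreshed from the definitive source, rather than publishing its own
    duplicate of the code tables.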

    > 3a) Say you were looking for the code for Pulaar...

    I think a search facility that can use alternate names such as those
    listed in Ethnologue would be a great idea. Of course, if the
    list of alternate names is incomplete, the user won't find what they're
    looking for, and it's unlikely a list could ever be complete.
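
    A rough sketch of such a lookup, in Python, might look like the
    following. The ALTERNATE_NAMES index and the find_codes function are
    hypothetical, and the index holds only names mentioned in this thread;
    a real one would be drawn from a source such as Ethnologue.

```python
# Hypothetical alternate-name index, limited to names used in this thread.
ALTERNATE_NAMES = {
    "fuc": ["Pulaar"],
    "ful": ["Fulah", "Peul", "Fula"],
}


def find_codes(name, index=ALTERNATE_NAMES):
    """Return the identifiers whose known names match, ignoring case."""
    wanted = name.casefold()
    return sorted(
        code
        for code, names in index.items()
        if any(known.casefold() == wanted for known in names)
    )


print(find_codes("pulaar"))
print(find_codes("Fulfulde"))
```

    The second query returns an empty list because that name is absent
    from the sample index, which is exactly the incompleteness problem
    noted above.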

    > they would come up with an ISO-639-3 code for Pulaar but still be
    > ignorant of the ISO-639-1&2 codes for "Fulah/Peul" that might actually
    > serve the purpose intended by the user.

    If you're referring to the online version of Ethnologue, I suspect that
    once 639-3 is launched the language descriptions on the Ethnologue
    site will provide hotlinks to information for that language on the ISO
    639-3 site (and that would include association with macrolanguage
    categories).

    > or better yet provide accurate raw data feed

    So, what do you consider a "raw data feed"? A URL of the form

    http://www.sil.org/iso639%2D3/documentation.asp?id=aaa

    will return data pertaining to the identifier "aaa". That particular URL
    will return data in HTML format; I'm guessing by "raw data" you want
    something other than HTML. You want plain text? Some kind of XML record?
    Such things certainly can be considered, though I can't speak for the
    639-3 RA regarding their openness to doing that.
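
    For what it's worth, building such a per-identifier URL takes only a
    few lines. The sketch below, in Python, reuses the URL shown above; the
    record_url helper and its validation rule (three lowercase ASCII
    letters) are assumptions for illustration. It makes no network request
    and does not assume the service offers any format other than HTML.

```python
from urllib.parse import urlencode

# Base of the URL quoted above, kept exactly as written there.
BASE_URL = "http://www.sil.org/iso639%2D3/documentation.asp"


def record_url(identifier):
    """Build the documentation URL for one three-letter identifier."""
    is_valid = (
        len(identifier) == 3
        and identifier.isascii()
        and identifier.isalpha()
        and identifier.islower()
    )
    if not is_valid:
        raise ValueError("expected three lowercase ASCII letters")
    return BASE_URL + "?" + urlencode({"id": identifier})


print(record_url("aaa"))
```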

    > maybe the ISO-639 lists such as they are will need some sort of
    > revisions at some point with respect to what languages (dialects) are
    > represented at the "language" and macrolanguage levels, and what the
    > relationship among them is. The example of Fula/Peul and its variant
    > forms that I mentioned above is an interesting case in point - the
    > fundamental unity and evident diversity of the language(s) are such
    > that one could imagine the utility of tagging Pulaar as ff-fuc - that
    > is Fula-Pulaar, using ISO-639-1 (always the preference over ISO-639-2
    > where there are both, as I understand it from the W3C site) and
    > ISO/DIS-639-3, though such nesting of ISO-639-3 I understand not to be
    > intended. Further specification by country code would be helpful since
    > the orthography in Senegal varies slightly from that in neighboring
    > Mali and perhaps Mauritania.

    It's not clear to me what you feel is lacking here. The 639-3 site will
    tell you that the category "ff"/"ful" is a macrolanguage and will list
    the individual languages it encompasses, including the category "fuc".
    The record for "fuc" will document its
    properties as defined in ISO 639-3 and will also include links to
    external resources such as Ethnologue that will document its denotation
    more fully. If there is more you think is required, please clarify.
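
    As a rough illustration of that relationship, the macrolanguage
    mapping could be represented as below (Python). The MACROLANGUAGES
    structure and both helper functions are hypothetical, and the member
    list is deliberately truncated to the one individual language
    discussed in this thread.

```python
# Hypothetical in-memory form of the macrolanguage mapping. The member list
# is truncated to the single individual language discussed in this thread.
MACROLANGUAGES = {
    "ful": {"id1": "ff", "name": "Fulah", "members": ["fuc"]},
}


def macrolanguage_of(code):
    """Return the macrolanguage ID that encompasses `code`, or None."""
    for macro_id, record in MACROLANGUAGES.items():
        if code in record["members"]:
            return macro_id
    return None


def broader_codes(code):
    """Return the (two-letter, three-letter) macrolanguage IDs, or None."""
    macro_id = macrolanguage_of(code)
    if macro_id is None:
        return None
    return (MACROLANGUAGES[macro_id]["id1"], macro_id)


print(broader_codes("fuc"))
```

    With a mapping like this, someone who starts from "fuc" can still be
    shown the "ff"/"ful" identifiers that may serve their purpose.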

    > Anyway, these are clearly not easy decisions and I know that in the
    > interests of "stability" one can't go about undoing and renaming
    > existing codes. But these are matters that will likely prompt
    > (provoke?) more discussion as various users, webmasters, and
    > localizers come into contact with and attempt to use the standard
    > (lang tagging web content; localization) for languages currently
    > less-represented in computing and cyberspace.

    !!! How did we suddenly go from providing a raw data feed to questions
    of choices of IDs for particular languages?

    > Thanks for any feedback. (One logical suggestion is that this go to
    > the ISO-639 list - perhaps someone could forward it there and I guess
    > I'll have to subscribe.)

    We need to watch that this doesn't go too far off topic for the lists to
    which this is addressed.

    Peter Constable


