RE: Questions re ISO-639-1,2,3

From: Donald Z. Osborn (dzo@bisharat.net)
Date: Mon Aug 22 2005 - 02:00:59 CDT


    Hi Peter, In answer to your question, I can't speak for the person (and group)
    that is interested in presenting the codes but I imagine that:

    1) It is seen as convenient to have a one-stop site for various information
    relevant to localization. (For my part, when assembling information on a
    language-by-language basis for African language localizers, I thought it useful
    to put relevant ISO-639 codes on the various pages - here I tried to "block and
    copy" to minimize the potential for typos and gave the reference sites. This is
    a little different, and a "raw data feed" probably wouldn't be helpful in this
    instance, but I mention it as an example of a situation where one would want
    codes on one's own site rather than simply a pointer to another site. In this
    case, I think that aggregating codes in this particular way may also raise some
    productive questions, but that's another matter that I'll broach after
    responding to your question.)

    2) The presentation on the sites, notably the LOC one, is utilitarian but not
    very dynamic (not that I have room to criticise on this point, but just an
    observation). To SIL's credit, their presentation offers some different ways to
    present the data, and some downloads (which coincidentally might encourage
    setting up static lists of ISO-639 codes on other pages), but there are gaps
    and there is no search feature for the codes.

    3) ISO-639 data fed from the official sites could facilitate devising a kind of
    relational database linking it to alternate names for languages and perhaps
    groupings of languages.
    3a) Say you were looking for the code for Pulaar. You would have to Ctrl-F
    search the term but would find nothing in ISO-639-1&2. True, a knowledgeable
    user would try synonyms, but that puts the burden on the user. Next, let's say
    that s/he's pulled up the entire list at the SIL site and searches there - fine,
    they would come up with an ISO-639-3 code for Pulaar but still be ignorant of
    the ISO-639-1&2 codes for "Fulah/Peul" that might actually serve the purpose
    intended by the user. SIL's site does have a presentation by "macrolanguages,"
    but you have to know to look for it. (One might add more to the macrolanguage
    list - or better yet provide an accurate raw data feed that would facilitate
    presenting other configurations/combinations.)
    3b) In any event, there must be a lot of examples, but a database set-up
    (facilitated by a feed) could provide synonyms and more relevant info. Either
    we put upon LOC &/or SIL to set up more databases, or let motivated user
    communities do it - the latter is bound to happen to some degree anyway, so why
    not devise a way to make sure that what they're using are not copies of static
    lists with possible errors and inevitable datedness?
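
    To make the idea in 3a/3b a bit more concrete, here is a minimal sketch in
    Python of the kind of lookup I have in mind. Everything about its shape is
    an assumption on my part: the Language record, the ALT_NAMES table and the
    lookup() function are hypothetical names, and the two sample rows are
    hard-coded here only for illustration, where a real set-up would load them
    from whatever feed the official sites might provide.

```python
from typing import Dict, List, NamedTuple, Optional


class Language(NamedTuple):
    """One row of a hypothetical language table."""
    name: str
    iso639_1: Optional[str]
    iso639_2: Optional[str]
    iso639_3: Optional[str]
    macrolanguage: Optional[str] = None  # name of the parent entry, if any


# Sample rows only; a real system would fill this from an official feed.
LANGUAGES: List[Language] = [
    Language("Fulah", "ff", "ful", "ful"),
    Language("Pulaar", None, None, "fuc", macrolanguage="Fulah"),
]

# Alternate names mapped to the canonical name used in LANGUAGES.
ALT_NAMES: Dict[str, str] = {"Fula": "Fulah", "Peul": "Fulah"}

_BY_NAME = {lang.name.lower(): lang for lang in LANGUAGES}
_ALIASES = {alt.lower(): name.lower() for alt, name in ALT_NAMES.items()}


def lookup(term: str) -> Optional[Dict[str, Optional[str]]]:
    """Find a language by name or synonym, case-insensitively.

    When the entry has no ISO-639-1/2 code of its own, the codes of its
    macrolanguage are returned instead, so the user sees both levels.
    """
    key = term.strip().lower()
    key = _ALIASES.get(key, key)
    lang = _BY_NAME.get(key)
    if lang is None:
        return None
    parent = _BY_NAME.get(lang.macrolanguage.lower()) if lang.macrolanguage else None
    return {
        "name": lang.name,
        "iso639_1": lang.iso639_1 or (parent.iso639_1 if parent else None),
        "iso639_2": lang.iso639_2 or (parent.iso639_2 if parent else None),
        "iso639_3": lang.iso639_3,
        "macrolanguage": lang.macrolanguage,
    }
```

    With something like this, a search for "Pulaar" would return its
    ISO-639-3 code together with the ISO-639-1&2 codes of Fulah, and a search
    for "Peul" would land on the same macrolanguage entry, so the burden of
    knowing the synonyms no longer falls on the user.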

    I realize that a lot of this is hypothetical and that I've gone a ways out on a
    limb with some remarks. So I guess I should go the distance to suggest what
    others have probably observed well before (and may already be working on), that
    maybe the ISO-639 lists such as they are will need some sort of revisions at
    some point with respect to what languages (dialects) are represented at the
    "language" and macrolanguage levels, and what the relationship among them is.
    The example of Fula/Peul and its variant forms that I mentioned above is an
    interesting case in point - the fundamental unity and evident diversity of the
    language(s) are such that one could imagine the utility of tagging Pulaar as
    ff-fuc - that is Fula-Pulaar, using ISO-639-1 (always the preference over
    ISO-639-2 where there are both, as I understand it from the W3C site) and
    ISO/DIS-639-3, though such nesting of ISO-639-3 I understand not to be
    intended. Further specification by country code would be helpful since the
    orthography in Senegal varies slightly from that in neighboring Mali and
    perhaps Mauritania.
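
    Purely as an illustration of the tagging idea above, here is a small
    Python sketch that assembles such a tag from its parts. The make_tag()
    function and its parameter names are hypothetical, and the nested
    "ff-fuc" form is my own suggestion, not something the standards define.
    The country codes SN, ML and MR are the ISO 3166-1 two-letter codes for
    Senegal, Mali and Mauritania.

```python
from typing import Optional


def make_tag(language: str,
             variety: Optional[str] = None,
             country: Optional[str] = None) -> str:
    """Join a language code, an optional variety code and an optional
    country code with hyphens (language parts lowercase, country uppercase).

    This only formats the string; it does not check the codes against
    any official list.
    """
    parts = [language.strip().lower()]
    if variety:
        parts.append(variety.strip().lower())
    if country:
        parts.append(country.strip().upper())
    return "-".join(parts)


# Hypothetical tags for Pulaar as written in each country.
PULAAR_TAGS = [make_tag("ff", "fuc", c) for c in ("SN", "ML", "MR")]
```

    Here PULAAR_TAGS would hold "ff-fuc-SN", "ff-fuc-ML" and "ff-fuc-MR", one
    tag per orthographic convention.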

    Anyway, these are clearly not easy decisions and I know that in the interests of
    "stability" one can't go about undoing and renaming existing codes. But these
    are matters that will likely prompt (provoke?) more discussion as various
    users, webmasters, and localizers come into contact with and attempt to use the
    standard (lang tagging web content; localization) for languages currently
    less-represented in computing and cyberspace. I could go on but time is limited
    and this is already steaming off-topic I think.

    Thanks for any feedback. (One logical suggestion is that this go to the ISO-639
    list - perhaps someone could forward it there and I guess I'll have to
    subscribe.)

    Don

    Don Osborn, Ph.D. dzo@bisharat.net
    *Bisharat! A language, technology & development initiative
    *Bisharat! Initiative langues - technologie - développement
    http://www.bisharat.net

    Quoting Peter Constable <petercon@microsoft.com>:

    > > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > > Behalf Of Donald Z. Osborn
    >
    >
    > > A follow up question is whether it would be possible for these agencies to
    > > provide something like a "raw database feed" (assuming such a complex
    > > syndication is possible) that would permit other organizations to
    > > incorporate accurate (and automatically updated, though that would not be
    > > often) information on their sites with the look and feel of their site.
    > >
    > > This question arises because someone is looking to post the lists of
    > > ISO-639 codes on a new site for localization developers, and I don't think
    > > the alternative of just providing a pointer to the LOC site is attractive.
    >
    > Could you explain why a pointer would not be attractive?
    >
    >
    >
    > > > <rant>
    > > > Several sites have published lists of ISO 639 language identifiers,
    > > > rather than simply providing a link to the official site. While this is
    > > > thought to be helpful, it is extremely unhelpful in that errors get
    > > > introduced or the information gets out of date. Anyone that has done
    > > > this is strongly advised to delete their private list and replace it
    > > > with a pointer to the official site:
    > > > http://www.loc.gov/standards/iso639-2/iso639jac.html
    > > > </rant>
    >
    >
    > Peter Constable
    >
    >
    >


