Re: ISO 639-3 database special entries (was: Questions re ISO-639-1,2,3)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Aug 26 2005 - 08:19:02 CDT


    From: "Peter Constable" <petercon@microsoft.com>
    >From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
    >> I also note that the file contains no entries for other reserved
    >> codes:
    >> * Scope=R (Reserved),
    >
    > Again, a completely inappropriate use of Scope. Also, I don't see why
    > the data file should include entries for identifiers that have all of
    > their properties defined in the standard itself.
    >
    >
    >> Conversely, I see that the ISO 639-3 database keeps entries for
    >> special codes (which seems to contradict the ISO 639-3 policy
    >> of not encoding collective languages, i.e. Scope="C" used for
    >> language families):
    >> * Scope=S, for example [mul] and [und] in ISO 639-2 and ISO 639-3;
    >
    > Here, a special value for Scope would be appropriate. (Thanks for
    > bringing this to my attention.)

    Please note that I did not invent the special "S" and "R" values used for
    the Scope field. They are shown on SIL.org's web pages, even though they are
    absent from the two downloadable tab-separated text files:
    * One is the list of individual languages and macrolanguages, each with
    its unique ISO 639-3 code, name, scope and living status, and optionally
    (for information only) its ISO 639-2/T 3-letter code or ISO 639-1 2-letter
    code if they exist; it contains only individual languages (scope="I") and
    macrolanguages (scope="M"), and does not list known aliases or regional
    dialects.
    * The other contains a one-to-many relation table that maps macrolanguages
    to their individual languages.
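
    To make the layout of these two tables concrete, here is a minimal sketch
    in Python of how they could be read and joined. The column names (Id,
    Part2T, Part1, Scope, Status, Name, M_Id, I_Id) are assumptions made for
    the illustration, not the headers of the actual files, and the sample rows
    are illustrative rather than copied from the downloads.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column layout: the real files' headers may differ.
# Sample rows are illustrative only, not copied from the actual downloads.
LANGUAGES_TSV = (
    "Id\tPart2T\tPart1\tScope\tStatus\tName\n"
    "ara\tara\tar\tM\tL\tArabic\n"
    "arb\t\t\tI\tL\tStandard Arabic\n"
    "arz\t\t\tI\tL\tEgyptian Arabic\n"
    "fra\tfra\tfr\tI\tL\tFrench\n"
)

# Hypothetical one-to-many table: one row per (macrolanguage, member) pair.
MACRO_TSV = (
    "M_Id\tI_Id\n"
    "ara\tarb\n"
    "ara\tarz\n"
)


def load_languages(text):
    """Index the language table by its 3-letter code."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Id"]: row for row in reader}


def load_macro_members(text):
    """Group the individual-language codes under their macrolanguage code."""
    members = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        members[row["M_Id"]].append(row["I_Id"])
    return dict(members)


def scopes_present(languages):
    """Return the set of Scope values that actually occur in the table."""
    return {row["Scope"] for row in languages.values()}


languages = load_languages(LANGUAGES_TSV)
members = load_macro_members(MACRO_TSV)

print(sorted(scopes_present(languages)))  # only "I" and "M" in this sample
print(members.get("ara", []))             # members mapped under Arabic
print(members.get("fra", []))             # French has no members here
```

    With a table laid out this way, checking whether "S" or "R" ever occurs in
    the Scope column is a single call to scopes_present().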

    The difference in encoding and mapping is not very clear between:
    (1) a macrolanguage and its individual languages;
    (2) an individual language and its dialects.
    The definition is quite fuzzy. I first wrote about the missing regional
    dialects of French: some have been encoded as individual languages, while
    others are considered dialects of standard French and are not encoded. In
    any case, Louisiana French, Cajun and Acadian are really dialects of the
    same American French language, which also includes Canadian French (in
    Quebec and Ontario, and which also has its own variants and creoles with
    Native American languages). This suggests that American French and
    Canadian French should be encoded, and that "French" (alone) should be
    considered a macrolanguage (at least) or even a collection.

    The distinction between these cases seems to come from the legacy use of
    the ISO 639-1 [fr] code in locale identifiers, meaning that ISO 639-3 [fra]
    (ISO 639-2/T [fra], ISO 639-2/B [fre] and ISO 639-1 [fr]) could not be
    considered a collection. But Arabic, for example, should be treated the
    same way, yet it was encoded as a macrolanguage (Scope=M), with its
    variants also encoded as individual languages (Scope=I). So why isn't
    French, with its ISO 639-1 code, mapped as a macrolanguage?


