ISO 639-3 database special entries (was: Questions re ISO-639-1,2,3)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Aug 24 2005 - 12:43:49 CDT


    Another idea:
    ISO 639-2/B codes are still part of a standard for bibliographic use, which
    defines additional 3-letter codes. However, given that ISO 639-3 is probably
    expected to deprecate ISO 639-2/B codes, I can understand that they have
    been left out of the downloadable database.

    But as these codes are still assigned in ISO 639-2, ISO 639-3 should reserve
    them.
    This can be done by mapping them with a new "B" Scope id.

    So the tab-separated file should contain additional rows such as:
        (ID="bod", Part2="tib", Part1="bo", Scope="B", Type="L", Name="Tibetan")
        (ID="ces", Part2="cze", Part1="cs", Scope="B", Type="L", Name="Czech")
        (ID="cym", Part2="wel", Part1="cy", Scope="B", Type="L", Name="Welsh")
        (ID="deu", Part2="ger", Part1="de", Scope="B", Type="L", Name="German")
        (ID="eus", Part2="baq", Part1="eu", Scope="B", Type="L", Name="Basque")
        (ID="ell", Part2="gre", Part1="el", Scope="B", Type="L", Name="Greek, Modern (1453-)")
        (ID="fra", Part2="fre", Part1="fr", Scope="B", Type="L", Name="French")
        (ID="hrv", Part2="scr", Part1="hr", Scope="B", Type="L", Name="Croatian")
        (ID="hye", Part2="arm", Part1="hy", Scope="B", Type="L", Name="Armenian")
        (ID="isl", Part2="ice", Part1="is", Scope="B", Type="L", Name="Icelandic")
        (ID="kat", Part2="geo", Part1="ka", Scope="B", Type="L", Name="Georgian")
        (ID="mri", Part2="mao", Part1="mi", Scope="B", Type="L", Name="Maori")
        (ID="mkd", Part2="mac", Part1="mk", Scope="B", Type="L", Name="Macedonian")
        (ID="msa", Part2="may", Part1="ms", Scope="B", Type="L", Name="Malay")
        (ID="mya", Part2="bur", Part1="my", Scope="B", Type="L", Name="Burmese")
        (ID="nld", Part2="dut", Part1="nl", Scope="B", Type="L", Name="Dutch")
        (ID="fas", Part2="per", Part1="fa", Scope="B", Type="L", Name="Persian")
        (ID="slk", Part2="slo", Part1="sk", Scope="B", Type="L", Name="Slovak")
        (ID="sqi", Part2="alb", Part1="sq", Scope="B", Type="L", Name="Albanian")
        (ID="srp", Part2="scc", Part1="sr", Scope="B", Type="L", Name="Serbian")
        (ID="zho", Part2="chi", Part1="zh", Scope="B", Type="L", Name="Chinese")
    where the Part2 field refers to the recommended technical Part1 ID
    (identical to Part2 code for now), when the scope is B (legacy bibliographic
    code); the benefit is that ID remains unique, and the table effectively
    keeps the reserved registration for the bibliographic codes.
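
    As a rough illustration only, here is how an application might read such
    rows and resolve a bibliographic code. The column names follow the fields
    written out above; the sample data, the SAMPLE constant and the
    bibliographic_index function are hypothetical, and the real downloadable
    file may well be laid out differently.

```python
import csv
import io

# Hypothetical sample using the field names from the message above;
# the actual ISO 639-3 download may use different column names.
SAMPLE = (
    "ID\tPart2\tPart1\tScope\tType\tName\n"
    "bod\ttib\tbo\tB\tL\tTibetan\n"
    "deu\tger\tde\tB\tL\tGerman\n"
    "zho\tchi\tzh\tB\tL\tChinese\n"
)


def bibliographic_index(text):
    """Map the Part2 code of every Scope="B" row to that row's ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Part2"]: row["ID"] for row in reader if row["Scope"] == "B"}


index = bibliographic_index(SAMPLE)
print(index["ger"])  # looks up the ID recorded for the bibliographic code "ger"
```

    Rows with any other Scope value are simply skipped by this index, so it
    could be built from a file that mixes the proposed rows with existing ones.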

    I also note that the file contains no entries for other reserved codes:
    * Scope=R (Reserved), for example code=[qaa] in ISO 639-2;
    this is probably in the hope that it will list only entries that have true
    assigned ISO 639-2 codes (individual languages and macrolanguages).

    On the other hand, I see that the ISO 639-3 database keeps entries for
    special codes (which seems to contradict the ISO 639-3 policy of not encoding
    collective languages, i.e. Scope="C" used for language families):
    * Scope=S, for example [mul] and [und] in ISO 639-2 and ISO 639-3;

    So these codes (where Scope="R|C|B") could go in an additional (separate)
    file for completeness (this would not break applications that already use
    the existing ISO 639-3 database). Applications would then decide for
    themselves whether to ignore this second table, merge the two tables, or
    load them into separate database tables. What is the ISO 639-3 policy
    regarding the stability of these preexisting bibliographic, reserved and
    special codes (that are also used in informative Unicode database files,
    and in the CLDR)?
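
    A minimal sketch of the "ignore or merge" choice, assuming two
    tab-separated tables keyed by a unique ID column. The table contents and
    the names MAIN, EXTRA, read_table and load_codes are made up for the
    example and are not taken from the actual database.

```python
import csv
import io

# Hypothetical stand-ins for the existing table and the proposed second file.
MAIN = "ID\tScope\tName\nfra\tI\tFrench\nmul\tS\tMultiple languages\n"
EXTRA = "ID\tScope\tName\nqaa\tR\tReserved for local use\n"


def read_table(text):
    """Read a tab-separated table into a dict keyed by the ID column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID"]: row for row in reader}


def load_codes(main_text, extra_text=None):
    """Load the main table; merge the supplementary one only if it is given."""
    codes = read_table(main_text)
    if extra_text is not None:
        for code, row in read_table(extra_text).items():
            # Entries already in the main table take precedence.
            codes.setdefault(code, row)
    return codes


main_only = load_codes(MAIN)      # application ignores the second table
merged = load_codes(MAIN, EXTRA)  # application merges both tables
print(sorted(main_only), sorted(merged))
```

    An application that prefers separate database tables would just call
    read_table on each file and keep the two results apart.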


