RE: transforms and language identifiers (was Re: Dozenal chars in music)

From: Peter Constable (petercon@microsoft.com)
Date: Tue May 26 2009 - 10:15:37 CDT


    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Julian Bradfield

    > I admit that I'd never heard of BCPs
    > - I'm sure more people know about Ethnologue than BCP 47, by some
    > orders of magnitude,

    Maybe; but even Ethnologue isn't a household word. I'm not sure the difference is that significant.

    > so I view best current practice as following
    > Ethnologue.

    Best practice _for what_? (The statement "best current practice is following Ethnologue" can't be evaluated without additional context.)

    If the context is one of determining the denotation of some language coding scheme, then one must refer first to where the coding scheme is defined. If someone uses a code like "en-ipa", that doesn't implicitly tell us that they mean English at all: they might mean the En language of Vietnam, for all we know. Of course, using "en" to represent "English" is a very common practice, but very often the intended denotation is not documented further than that. This means we can guess that "English" is meant, but we don't know which varieties of English were meant to be encompassed. What was intended, however, is still determined by the creator of that usage, not by some external fiat.

    A very widely used language coding scheme is BCP 47. It indicates that 2- and 3-letter primary language subtags (e.g. "en") "were defined in the IANA registry [the collection of all the subtags used by this scheme] according to the assignments found in the standard ISO 639...". The current version of BCP 47 references ISO 639-1 and ISO 639-2; it is about to be updated to reference also ISO 639-3 and ISO 639-5. What's interesting in all of this is that both the IANA registry and ISO 639-1/-2 document the denotation of "en" simply as "English". ISO 639-3 does add further informative data, however, and will reference other sources to indicate the denotation. For modern languages, it refers to Ethnologue.
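    To make the "en" point concrete, here is a minimal sketch, not any official tool, that looks up a tag's primary language subtag in records written in the IANA Language Subtag Registry's format ("%%"-separated records of "Field: value" lines). REGISTRY_EXCERPT is a hand-abbreviated, hypothetical excerpt; real records carry more fields (e.g. Added). The helper names parse_records and primary_language_descriptions are made up for illustration. All the lookup can ever return for "en" is the registry's Description, "English", and the rest of the tag ("ipa" here) plays no part in it.

```python
# Minimal sketch (assumption: a hand-abbreviated excerpt in the IANA Language
# Subtag Registry's record format, not the real registry file).
REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: en
Description: English
%%
Type: language
Subtag: fr
Description: French
"""


def parse_records(text):
    """Split '%%'-separated records into dicts mapping field -> list of values."""
    records, current, last_field = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            # A '%%' line closes the previous record (if any).
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field:
            # Folded continuation line: append to the previous field's last value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            field, value = line.split(":", 1)
            last_field = field.strip()
            # Fields such as Description may repeat, so keep a list of values.
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def primary_language_descriptions(tag, records):
    """Return the Description values recorded for the tag's primary language subtag."""
    primary = tag.split("-")[0].lower()
    for rec in records:
        if rec.get("Type") == ["language"] and rec.get("Subtag", [""])[0].lower() == primary:
            return rec.get("Description", [])
    return []


records = parse_records(REGISTRY_EXCERPT)
print(primary_language_descriptions("en-ipa", records))  # ['English']
```

    Note what the sketch does not give you: nothing in the record says which varieties of English "en" is meant to encompass. That information, where it exists at all, has to come from the defining scheme or from sources it references.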

    Peter



    This archive was generated by hypermail 2.1.5 : Tue May 26 2009 - 10:17:59 CDT