Re: Questions re ISO-639-1,2,3

From: JFC (Jefsey) Morfin (jefsey@jefsey.com)
Date: Tue Aug 23 2005 - 11:32:45 CDT

Next message: Neelesh Bodas: "Representing 'Halant R' in Marathi"

Previous message: J Andrew Lipscomb: "Re: unicode Digest V5 #208"
In reply to: Philippe Verdy: "Re: Questions re ISO-639-1,2,3"
Next in thread: Philippe Verdy: "Re: Questions re ISO-639-1,2,3"
Reply: Philippe Verdy: "Re: Questions re ISO-639-1,2,3"

Philippe,
The problem with using alpha-3 codes is that they are three letters
long. An IETF Draft, supported by Doug and Peter, proposes a strict
variation of the RFC 3066 ABNF (structured format) in which subtags
are identified partly by their size and partly by their relative
position. I say "variation" because, although it includes some
additions (which result in changes to the RFC 3066 ABNF), it is not
meant to be an evolution that would permit other much-needed changes
(IMHO) and support innovation, for reasons I will not discuss here.
The use of alpha-3 codes in that ABNF could at some stage be confused
with other information, all the more so because in Internet protocols
case must not be taken into account.
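
As an illustration of how size and position identify subtags, here is
a minimal Python sketch. It is loosely modelled on the length rules of
the RFC 3066 successor ABNF and is a simplification, not the Draft's
grammar: extensions, private-use and grandfathered tags are ignored,
and the function name classify_subtags is hypothetical.

```python
def classify_subtags(tag):
    """Label each subtag of a language tag by its length and position.

    Simplified sketch: no extensions, private-use or grandfathered tags.
    """
    subtags = tag.lower().split("-")  # case is not significant
    labels = [(subtags[0], "language")]
    # stage 0: extlang still allowed, 1: script allowed,
    # stage 2: region allowed, 3: variants only
    stage = 0
    for sub in subtags[1:]:
        alpha = sub.isascii() and sub.isalpha()
        digits = sub.isascii() and sub.isdigit()
        alnum = sub.isascii() and sub.isalnum()
        if stage == 0 and len(sub) == 3 and alpha:
            labels.append((sub, "extlang"))
        elif stage <= 1 and len(sub) == 4 and alpha:
            labels.append((sub, "script"))
            stage = 2
        elif stage <= 2 and ((len(sub) == 2 and alpha) or (len(sub) == 3 and digits)):
            labels.append((sub, "region"))
            stage = 3
        elif alnum and (5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit())):
            labels.append((sub, "variant"))
            stage = 3
        else:
            labels.append((sub, "unknown"))
    return labels


print(classify_subtags("zh-Hant-TW"))
print(classify_subtags("en-USA"))
```

In this sketch a three-letter subtag placed right after the language
is read as an extended language subtag, so an alpha-3 country code
such as "USA" in "en-USA" would not be recognised as a region. That is
the kind of confusion the paragraph above refers to.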

This calls for several considerations:

- this Draft wants to make this format the sole format to be used in
the IANA registry. Worryingly, this leaves only two possibilities if
you are not satisfied with that particular format: to defeat the
Draft, or to build an open alternative to the IANA registry (I was
also engaged in supporting the Draft ABNF as one of the deprecating
propositions, and in working on the necessary distribution and
extension of the IANA system).

- the format lacks several important pieces of information, such as
the referent of the language (is it English, Basic English, by which
publisher, using which dictionary, etc.), the context of the exchange
(style, special words, etc.) and the date of the standard of reference
(which may not be the date of the document, which is often ignored anyway).

- the format is supposed to be multimodal, but only limited script
information is supported (fonts are not documented) and no space is
reserved for voice, sign or icon attributes.

- but most of all, this proposition does not consider the designated
content from the perspective of relational exchanges over a network.
This is a very important point when designating a language. Languages
were never made to be identified but to be used. They were made to
permit face-to-face relations. They were extended (in distance,
audience and time) through scripts. Today they are being broadly
extended by an evolution far more complex than the one from voice to
script. Scripts introduced memory and communication. Communication
has totally changed today, as has memory. Scripts have become much
more complex and have changed. The introduction of relational services
changes the nature of the exchanges. Languages themselves change in
nature as multilingualism extends the capability for language
negotiation and adaptation, from language to language and therefore
within what was understood as a single language. The number of terms
to be used or known is drastically extended too, and as a result leads
to various views (and not versions) of a language.

Languages are brain-to-brain interintelligibility protocols. Wanting
to describe the evolution of language and culture, which tries to
support the increase in exchanges (number, density, complexity), with
designations from the preceding language era (script) is awkward. It
would be like trying to describe the Internet by using a postal
paradigm (I use this because it is, to date, unfortunately the main
problem of the end-to-end interoperability layer). Like every
protocol, languages have parameters. These parameters can include the
country codes: the interest of a numeric code of some size is its
stability, its multilingualism and its script independence.

Another problem we face in trying to build information databases
rather than object databases (I suggest you consider the ISO 11179
effort, not the result but the area of concern in JTC 1/SC 32) is the
versatility of the content. We still live with the idea that we use
"texts". We actually use "architexts" (what is going to produce the
vision/version of the text we use, and more and more the interaction
of our rendering tools). If you say you do not want to consider
computer languages, as the IETF Draft does, you deprive yourself of
the very HTML, XML etc. you want to document: it is an architext and
uses computer [ASCII] language (bravo bisharat!). The same architext
may include successive information related to several countries,
regions, ethnolinguistic zones, etc., and languages. They will have
to be decoded by an OPES (open pluggable edge service) reader. The
IETF Charter adequately cites the relation with the locale, but the
locale itself is subject to a possibly complex, versatile and
adaptive negotiation and to interrelation with the other systems
the computer is related to.

Trying to manage this information with script/text related concepts,
even by overloading them with a lot of information, would be like
wanting to ride on a highway with a bicycle.

ISO 639-1, -2 and -3 are not appropriate to support this. They are,
however, all that we have as long as ISO 639-6 is not available. ISO
3166 is not appropriate either; it is, however, a localisation tool of
interest, being the most used ISO standard. But others like ISO 3166-2,
E.164, X.121, geographical coordinates, etc. are of use. What the IETF
Draft should have provided was an ISO 3166 equivalent adapted to the
Multilingual Internet. This work is still to be done: it has
unfortunately been delayed (I started working on a Draft addressing the
need 13 months ago), but at the same time the (sometimes hot) debate
over the IETF Draft was not a complete waste, as it gave some good experience.

But we now have to leave the bicycle in peace and to look for some
good Ferrari/Renault.

jfc

At 10:32 23/08/2005, Philippe Verdy wrote:
>From: "Doug Ewell" <dewell@adelphia.net>
>>ISO 3166-1 alpha-2 and alpha-3 code elements are almost identical in
>>their stability (or lack thereof). I can find no instances in the
>>31-year history of ISO 3166 where an alpha-3 code element was changed
>>while the corresponding alpha-2 code was left unchanged. (If you can
>>find one, please accept my apologies.)
>
>Yes, alpha-3 codes can change for a country, but in fact alpha-3
>codes have still not been reassigned to different countries, unlike
>alpha-2 codes. So a change of alpha-3 code just turns the old
>official code into an alias.
>
>For example ROM changed to ROU, but ROM was not reassigned to another country.
>
>The reassignment of alpha-2 codes to different countries is the
>main problem for use in locale codes that require longer stability
>than dated statistics.
>
>What this means is that the alpha-2 codes need to be dated to be
>disambiguated.
>
>>The numeric code elements (henceforth "codes"), which are really UN
>>codes rather than ISO codes
>
>That's what I said (UNSD means United Nations Statistics Division,
>if this was not clear).
>
>>are usually considered more stable, but it
>>depends on what kind of stability you are looking for. ISO alpha codes
>>change when the name of a country changes (or whenever the country feels
>>like changing it; see Romania). UN numeric codes change when the
>>territory covered by the code changes. Normally the latter event is
>>less frequent than the former, but the reverse can also happen; in 1993,
>>the numeric code for Ethiopia changed from 230 to 231 (because of the
>>loss of territory to Eritrea) while the alpha codes remained ET and ETH.
>
>OK, but 230 has *still* not been reassigned (it easily could be, given
>the much smaller encoding space for numeric codes, which are
>geographically structured), so it has become an alias for Ethiopia.
>Such an alias would remain valid for references in documents speaking
>about the country before the split, or composed with localization
>meta-data. Of course, documents speaking about the country after the
>split should use the new code, to avoid the ambiguity with Eritrea,
>but this would not invalidate the past references. This would be
>true for any country code, including the IOC 3-letter country codes,
>or other standards.
>
>My opinion is that the UNSD wants to keep the possibility of making
>historical searches in its data, without mixing in the same result
>list the statistics of unrelated countries or territories. This is
>however less of a problem for the UN, given that statistics are
>necessarily dated (this is not the case for many documents needing
>locale code markup or meta-data).
>
>
>
>
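
Two points from the quoted text, that a reassigned alpha-2 code needs
a date to be unambiguous and that retired alpha-3 and numeric codes
survive as aliases, can be sketched in Python as follows. This is an
illustration only. The table and function names are hypothetical, and
the CS entry with its approximate years is an assumption added for the
example, not something taken from the thread; ROM/ROU and 230/231 are
the cases mentioned in the messages above.

```python
from datetime import date

# Hypothetical history table: one alpha-2 code, several holders over time.
# Years are approximate and illustrative; None means "still assigned".
ALPHA2_HISTORY = {
    "CS": [
        (date(1974, 1, 1), date(1993, 1, 1), "Czechoslovakia"),
        (date(2003, 1, 1), None, "Serbia and Montenegro"),
    ],
}

# Retired codes that were never reassigned, so they can act as aliases.
RETIRED_ALIASES = {
    "ROM": "ROU",  # alpha-3 for Romania
    "230": "231",  # numeric code for Ethiopia
}


def resolve_alpha2(code, on):
    """Return the entity an alpha-2 code designated on a given date."""
    for start, end, entity in ALPHA2_HISTORY.get(code.upper(), []):
        if start <= on and (end is None or on < end):
            return entity
    return None  # unknown code, or code not assigned on that date


def current_code(code):
    """Map a retired alias to its replacement; other codes pass through."""
    code = code.upper()
    return RETIRED_ALIASES.get(code, code)


print(resolve_alpha2("CS", date(1990, 6, 1)))
print(resolve_alpha2("CS", date(2005, 8, 23)))
print(current_code("ROM"), current_code("230"))
```

The alias table needs no dates because a retired code that is never
reassigned has exactly one meaning, whereas the alpha-2 lookup cannot
answer without one.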


This archive was generated by hypermail 2.1.5 : Tue Aug 23 2005 - 21:49:30 CDT