Re: Error on Language Codes page.

From: Doug Ewell (doug@ewellic.org)
Date: Tue Feb 03 2009 - 22:34:51 CST

Next message: John (Eljay) Love-Jensen: "uppercase and lowercase numbers"

Previous message: Jeroen Ruigrok van der Werven: "Re: Braille, CJK and unicode"
In reply to: verdy_p: "Re: Error on Language Codes page."

"verdy_p" <verdy underscore p at wanadoo dot fr> wrote:

>> Codes that are withdrawn from a standard in the ISO 639 family are
>> no longer present in the standard. See the official text file
>> provided by ISO 639-2/RA at:
>>
>> http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
>
> We are speaking of old ISO 639-<<<*** 1 ***>>> codes. Your reference
> to ISO 639-<<<*** 2 ***>>>/RA is completely off topic here. ISO
> 639-2 was designed with a single code kept, ignoring the legacy codes
> that were still present in Part 1 of the standard.

Well, if you have access to a copy of ISO 639-1 (originally 639) then
you are ahead of me. I don't have a paper copy and the InfoTerm RA site
is, as you point out, no longer active. I was under the impression that
when ISO 639-2/RA said a code element was withdrawn, they meant it was
withdrawn from 639-1 as well.

> It's true that there were codes that have been deprecated, but in fact
> none of them have really been deleted, because at that time, there
> still existed widely deployed applications that referenced the old
> codes. And I think that these applications still exist today.

I'm sure the applications do exist today. That doesn't mean it is
unthinkable that the code elements could have been removed from the
standard.

People draw too many analogies between ISO 639 and ISO 3166 as it is,
but here I go anyway:

ISO 3166 withdraws code elements. They said so in Newsletter V-12: "In
the case of a withdrawal of ISO 3166-1 code elements these formerly
allocated code elements are reserved in accordance with the principles
for the maintenance of the codes." The code element "CS" is no longer
part of ISO 3166-1. Yet it goes without saying that there were
applications at the time of its withdrawal (2006-09-26) that still
referenced it. If ISO 3166 can remove a code element, it is not
impossible that ISO 639 might do the same.

> The same was true when Part 2 of the standard was published (for
> 3-letter codes): there still existed widely deployed codes used by
> librarians for bibliographic interchange, and they were sometimes
> different from the codes used by application and OS builders or in
> communication standards (like MIME in Email and other related
> "RFC"/"BCP" standards published by the IETF, and SGML initially
> created by content publishers like newspapers and advertising
> agencies, a set of standards partly adapted to the Web as HTML and
> published by CERN and later used by other organisations that joined to
> create the W3 Consortium). So there also exist duplicate codes, but
> the ISO 639-2 standard clearly says that the bibliographic codes are
> needed for compatibility with existing best practices long adopted
> and deployed by librarians. Here also there's an asterisk in
> the published lists of codes.

The ISO 639-2/B and 639-2/T code elements are part of the same standard,
and neither is inherently preferred over the other. Perhaps I'm missing
something by looking in the online code lists, but I don't see any
asterisks; just a (B) and (T) indicating the two codes, in the 20 cases
where they differ.

I don't want to turn this into a children's finger-pointing game
("You're off topic!" "No, YOU'RE off topic!"), but I don't see what the
existence of the equal-status 639-2/B and 639-2/T code sets has to do
with the existence of both "he" and "iw" in 639-1, the second of which,
if not withdrawn, is at least marked as deprecated.

> The asterisk itself is not part of the standard or part of the code.
> It is simply a reference to a note. However, this note was not formally
> described in Part 1, but it was clearer in Part 2 as it was directly
> indicating which of the two codes is the bibliographic code, the other
> one being the technical code recommended for all applications EXCEPT
> bibliographic codes that have NOT been deprecated.

Perhaps there is a misunderstanding. If there are any asterisks in the
formal 639-1 or 639-2 standards, I'm not talking about them. I'm
talking about the codes marked with an asterisk *on the page at the
Unicode site*, which are either deprecated or withdrawn, but which the
page still recommends over the newer (i.e. post-1989) codes.

> Anyway, even if you look at any part of the ISO 639 standards suite,
> there has always remained a severe ambiguity about which code to use
> when several distinct parts of the standard had to be used
> simultaneously (because of their incompleteness). Only BCP 47 has
> solved these ambiguities by defining effective recommendations for
> best practice, and then allowing the other codes as aliases

Actually, that's not what we did. The code (subtag) for French is "fr"
and only "fr". You cannot use "fre" (B) or "fra" (T) in a BCP
47-conformant application. This rule goes back to RFC 3066, before it
was called BCP 47.
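
As a rough illustration of that rule, here is a minimal Python sketch.
The table is a tiny hand-picked subset, and the names ALPHA3_TO_ALPHA2,
is_valid_primary_subtag and primary_subtag are hypothetical, not taken
from any library. A real application would consult the Language Subtag
Registry rather than a hard-coded table.

```python
# Assumption: a tiny illustrative subset of ISO 639-2 codes (both B and T
# forms) for languages that also have an ISO 639-1 two-letter code.
ALPHA3_TO_ALPHA2 = {
    "fre": "fr",  # French, 639-2/B
    "fra": "fr",  # French, 639-2/T
    "ger": "de",  # German, 639-2/B
    "deu": "de",  # German, 639-2/T
    "heb": "he",  # Hebrew, B and T are identical
}


def is_valid_primary_subtag(code: str) -> bool:
    """Reject a three-letter code when a two-letter code exists for it.

    Only knows about the codes in the sample table above.
    """
    return code.lower() not in ALPHA3_TO_ALPHA2


def primary_subtag(code: str) -> str:
    """Map a known three-letter code to the two-letter subtag to use."""
    lowered = code.lower()
    return ALPHA3_TO_ALPHA2.get(lowered, lowered)
```

With this table, primary_subtag("fre") and primary_subtag("fra") both
give "fr", and is_valid_primary_subtag("fra") is False: the three-letter
forms are not aliases, they are simply not used.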

> (the most significant change in BCP 47 has been to abandon the
> exclusive meaning of code for language families or collections, and
> this decision was agreed in ISO 639 Part 5, but is still not applied in
> older parts 1 and 2 and has no consequence in Part 3)

That won't take effect until draft-4646bis is approved and published as
an RFC, which should take place well before the next visit of Halley's
Comet.

>> You can certainly find older lists, provided by third parties, that
>> differ from the official standard. These lists are available at
>> places like:
>>
>> http://ftp.ics.uci.edu/pub/ietf/http/related/iso639.txt
>
> Unmaintained lists are also not good references. Why do you need to
> cite them? There are TONS of unmaintained copies on the web: they are
> just there to display which subset of the ISO 639 standard is
> supported by these sites (or applications that they describe). As long
> as these lists are not changed there, you can just assume that these
> applications do not support the newer codes, or have not deprecated
> the older codes.

Go back and read what I wrote again, in context. I was trying to point
out exactly what you said, that these lists are unofficial and
inaccurate and error-prone, and should not be used when official lists
are available.

> The cost of changing codes (notably those used in locale identifiers)
> is really tremendous (and probably much higher than the change that
> occurred for the national currencies to Euro if it can affect all
> existing codes without notice). That's why you need stability (and
> stability means that in fact, whatever the ISO 639 standard says, it
> cannot really "delete" a code from a standard, we know that this has
> only the effect of deprecating codes

It's their standard. They can delete codes if they want to.

> Did you even know that Java running in a Hebrew version of Windows
> will not load the Hebrew localized resources if they use the
> recommended "he" code ("iw" still had to be used at least in Java 5,
> I've not checked in Java 6 if this is still the case), but the current
> model for localized resources in Java is very simplistic and can't be
> changed significantly without creating compatibility problems.

Fine. Then the page on the Unicode site can recommend the use of "iw"
**FOR JAVA** or for applications that depend on the Java engine. It
does not need to issue a blanket recommendation "for legacy applications
that cannot manage correctly the new standard code or for classes of
applications for which you are not certain that they can use the new
standard." People who read that will not bother, or may not be able, to
verify that their application is not "legacy" or that their protocol was
written after 1995.

>>> anyway, there will probably be no more alpha-2 code assigned in any
>>> part of ISO 639,
>>
>> While probably true, this has little or no relevance to the rest of
>> the thread.
>
> No, this is on topic. This thread started with the use of alpha-2
> codes in an (old) page maintained by Unicode. Either this page should
> be deleted, or notes should be added in it to specify its status and
> remove the ambiguities, saying that none of these codes are a
> recommendation made by the UTC.

Whether new codes will be added to ISO 639-1 has nothing to do with
whether people should be told to use ISO 639-1 codes that were
deprecated 20 years ago.

> No. 20 years is definitely not old: applications and documents written
> 20 years ago will survive and will maintain compatibility. What has
> aged is just their "freshness" or suitability for the current market:
> they have become insufficient, but certainly not old. Almost every
> technical standard created in computing has survived; the technologies
> have been widely reused and integrated in others that can't live now
> without the old ones on which they were built. We still find
> programmers for COBOL, FORTRAN, C, BASIC, or users of ASCII only, even
> though all these were defined in the 1960's. The same is true about
> most data compression algorithms. There are technical standards that
> have survived centuries (think weights and measures: even if the
> imperial measures are no longer an official international standard,
> they are still mandatory for some domains like maritime navigation and
> aeronautics).

Gosh, you'd almost think I hadn't been in the software world for the
last 20 years.

Of course there is existing data that uses deprecated codes, and of
course that data doesn't magically disappear or get recoded. Software
should continue to recognize the old codes, and interpret them the same
as the new codes, when possible. (That's why we have "deprecated"
subtags in the BCP 47 Language Subtag Registry.) But that is ENTIRELY
different from telling people to GENERATE the old codes in preference to
the new codes.
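
To make the distinction concrete, here is a minimal Python sketch of
"accept the old code, emit the new one." The function names are
hypothetical and the table covers only three deprecated ISO 639-1 code
elements; it is an illustration under those assumptions, not a complete
implementation of registry canonicalization.

```python
# Deprecated ISO 639-1 code elements and their current replacements.
# Assumption: a small sample only; the Language Subtag Registry records
# such replacements in its "Preferred-Value" fields.
DEPRECATED_TO_PREFERRED = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def canonical_language(code: str) -> str:
    """Interpret an incoming code, mapping a deprecated one to its successor."""
    lowered = code.strip().lower()
    return DEPRECATED_TO_PREFERRED.get(lowered, lowered)


def same_language(a: str, b: str) -> bool:
    """Treat old and new codes as equivalent when comparing existing data."""
    return canonical_language(a) == canonical_language(b)


def code_to_generate(code: str) -> str:
    """When writing new data, always emit the current code, never the old one."""
    return canonical_language(code)
```

Here same_language("iw", "he") is True, so old data keeps working, while
code_to_generate("iw") returns "he", so nothing new is written with the
deprecated code.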

> The case of Hangul is not really a problem: there was not even a
> single approved technical standard for use in Korea at that time. Even
> if Unicode was starting, there was no clear agreement about the
> approach to use. In fact, even Unicode was not fully in agreement with
> ISO 10646 at that time...

I remember that too. I chose my analogy on purpose.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

