About names again...

From: Alain LaBonté  (alb@sct.gouv.qc.ca)
Date: Tue Jan 06 1998 - 08:57:24 EST

Re: Re: short Unicode names?

An interesting (and, I believe, relevant to the current UNICODErs
discussion) contribution on stable and culturally neutral identifiers
follows. The same should apply to character codes, and we indeed have
stable and normative identifiers in ISO/IEC 10646, which are numeric... Did
anybody ever complain about using phone numbers or catalog numbers to order
goods? Why should it be otherwise when only machines are involved? For
humans, it is always possible to have a user interface translating codes to
names *in the user's language*, and in many cases just a glyph would do,
as Murray Sargent said last Sunday. But, "de grâce" (!), don't use names
for machine interchange and conversion. It is inefficient, a source of
confusion, and I would say it should even be considered "ultra vires".

Alain LaBonté
From: "Jake V. Knoppers" <jk0@istar.ca>
To: "Alain LaBonté  " <alb@sct.gouv.qc.ca>,
        "'V.S. Umamaheswaran'" <dorai@VNET.IBM.COM>,
        "Winkler, Arnold F" <Arnold.Winkler@unisys.com>
Cc: "Infoman Inc" <mpereira@istar.ca>, "Nelida Chan" <nchan@yorku.ca>,
        "Edna Hussman" <hussmae@gov.on.ca>, <Christine.Leonhardt@tpsgc.gc.ca>,
        "Diane Michaud" <Diane.Michaud@pwgsc.gc.ca>,
        "Michael Everson" <everson@indigo.ie>
Subject: Re: FW: Culture
Date: Mon, 5 Jan 1998 16:52:32 -0500
X-MSMail-Priority: Normal

This is a reply from Jake Knoppers.

If two-letter mnemonics are not sufficient as language codes, and I
think that this is so, then I suggest that we go to four (4) digit numeric
codes. There are a number of reasons for this. They are presented in no
particular order.

1. The three base standards for global electronic commerce, global
electronic administration, and any data element or document based
interchange are
> language codes [LAC] ISO 639
> country codes [COC] ISO 3166-1
> currency codes [CUC] ISO 4217
i.e. as noted in the CAW N021 contribution.

In CAW N021, I state the case for using the three-digit numeric as the
common interchange value for country codes among application interfaces
since (1) it is the most stable, i.e. it does not change unless the
physical entity referenced changes, unlike the two- and three-alpha codes;
and (2) neither LAC nor CUC use three-digit numerics as their primary code.

Further, I argued that the three-alpha code be reserved for CUC since it is
the primary and unique reference for codes representing currencies and
funds.

Finally, I also argued that in this context the two-letter alpha be
utilized as the primary identification schema for LAC.

The rationale here was two-fold, namely unambiguity in the interface and
efficiency/effectiveness in the interface among heterogeneous applications.

2. Given the response to CAW N021, I suggest developing the following
strategy, based on (1) an IT enablement perspective as well as that of
unambiguous identification and referencing in global electronic
communications, commerce, administration, etc.; (2) the perspective that
individuals around the world will form conventions in using the Internet
to communicate in various languages, whether "officially" recognized or
not, e.g. pig-latin, Klingon, etc.; and (3) the fact that currency codes
are linked to three-alpha mnemonics:

(a) to have the CAW pass a resolution recommending that

(i) that ISO 639 convert to a four-digit numeric as the primary and
unambiguous identification for language codes

(ii) that ISO 10646 serve as the source of the repertoire of any combination
of characters/symbols, etc., i.e. as a set for referencing ISO 639 language

(iii) that the 0000-1999 ISO 639 language code series be reserved for
those languages which represent those character sets/symbols/notations,
i.e. as an officially approved user profile of ISO 10646, officially
recognized for use by countries within their physical boundaries as
represented by/linked to the three-digit numeric country codes as found in
ISO 3166 [COC], as well as needed characters/symbols for the associated
applicable currencies as found in ISO 4217 [CUC]

(iv) that the 2000-3999 ISO language code series block be reserved for
"languages" to be registered by linguists (via their professional
associations) for languages and associated character sets which either are
(1) considered "dialects" from a 0000-1999 perspective; and/or (2) no
longer in use but once, in the past, formed part of languages/dialects
which, while not meeting the criteria of the 0000-1999 series, are written
languages the mapping of which can be supported through ISO/IEC 10646

(v) that the 4000-5999 ISO language code series block be reserved
for representing mapping/user profiles of ISO/IEC 10646 in support of
formal scientific and technical languages, i.e. special languages according
to TC37, via their (internationally recognized) professional associations.

(vi) that the 6000-6999 ISO language code series block be reserved for
use by other "formally" internationally recognized associations

(vii) that the 9000-9999 ISO language code series block be reserved for user
extensions such as pig-latin, Klingon, Esperanto, etc.

(b) to establish a registration authority for the assignment of unique ISO
639 ICDs (international code designators), along with the criteria and
process for registration. Here the EDIRA model, i.e. the EDI Registration
Authority for the unambiguous and unique registration of organizations
world-wide, might be useful. EDIRA serves as a front-end filter to ISO 6523.
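
To make the block allocation in (iii) to (vii) concrete, here is a minimal
Python sketch of how an application might map a four-digit code to its
proposed series. It is only an illustration: the names PROPOSED_BLOCKS and
classify_language_code are hypothetical, the labels paraphrase the text
above, and the 7000-8999 range is treated as unassigned because the
proposal does not allocate it.

```python
# Hypothetical sketch of the proposed four-digit language code blocks.
# Ranges come from items (iii)-(vii); labels are paraphrases, not
# normative wording. 7000-8999 is not allocated in the proposal.
PROPOSED_BLOCKS = [
    (0, 1999, "officially recognized national languages"),
    (2000, 3999, "dialects and historical written languages"),
    (4000, 5999, "scientific and technical special languages"),
    (6000, 6999, "other internationally recognized associations"),
    (9000, 9999, "user extensions"),
]


def classify_language_code(code: str) -> str:
    """Return the proposed series label for a four-digit code string."""
    # Codes are kept as strings so that leading zeros ("0042") survive.
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected exactly four ASCII digits, got {code!r}")
    value = int(code)
    for low, high, label in PROPOSED_BLOCKS:
        if low <= value <= high:
            return label
    return "unassigned in the proposal"


if __name__ == "__main__":
    for sample in ("0042", "2500", "7500", "9001"):
        print(sample, "->", classify_language_code(sample))
```

Keeping the code as a fixed-width digit string rather than an integer is a
design choice of this sketch, so that a value such as 0042 is not confused
with a shorter number in interchange.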

3. This approach will ensure that in global interchange one will not
confuse country codes with currency codes and/or with language codes, as is
currently the case. This approach also supports the following:

> with respect to language codes, ability to support "official", i.e. de
jure recognized, character sets of countries, i.e. "nation-states"

> ability to support special languages, i.e. scientific and technical,
many of which are also recognized de jure and professionally world-wide

> ability to support written languages from a historical/research
perspective

> ability to support languages currently in vogue (e.g. pig-latin,
Esperanto, Klingon, etc.)

In concluding this note, one recognizes that
(1) there are sunk investments associated with the present ISO 639 and the
Library of Congress language codes
(2) there is confusion between two-letter language codes [LAC] and
two-letter country codes [COC]
(3) the present ISO 639 is inadequate, needs a major overhaul, and was
developed long before ISO/IEC 10646 related needs were identified and IT
(4) moving to a three-letter ISO 639 code would not only cause linguistic
problems (e.g. PET) but also confusion with currency codes [CUC]
(5) moving to four-digit language codes will cause some migration
problems but is the best solution from both a user needs and an
IT-enablement requirements perspective.

Enough said for now. I look forward to your comments. Based on those
received, I am willing to prepare an additional contribution to CAW on
this. What do you say? What do you think?

Jake Knoppers - e-mail: <mpereira@istar.ca>

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:38 EDT