On 2000-09-12 at 17:43 UCT, Peter Constable wrote:
> ISO 639 codes were primarily intended for bibliography purposes.
> Gary and I point out in our paper that the needs of that sector do
> not necessarily correspond to the general needs of IT, particularly
> for language-specific processing. [...] For example, if all you know
> about the language of some information object is that it is an Athapascan
> language, you can't spell-check that information. The intro to ISO 639
> claims that the standard is intending to serve the needs of a variety of
> sectors, but in its current state it is failing to adequately serve some.
> Furthermore, we would contend that the categories enumerated in the
> Ethnologue by-and-large *are* the categories that need to be identified for
> general IT purposes. In the majority of cases, the distinctions made are
> those that would be needed to successfully spell-check, for example. (We
> acknowledge that that is not true in all cases; for example, Chinese
> spelling would cross multiple languages; and alternate English spellings
> are needed for what would generally be considered one language. But these
> are the exceptions, not the norm.)
For many language-specific IT processes involving written language,
such as spell-checking, hyphenating, transliterating (e. g. to Braille),
or audible rendering, it is not enough to know which language you are
dealing with: you also need information about the orthography used.
Orthography is subject to change over time; sometimes several orthographies
for the same language co-exist, e. g. in transition time-spans or in
- German orthography was reformed in 1996; currently, two orthographies
are legal (e. g. accepted in school assignments): the old
one, established in 1902, until 2005-07-31, and the new one, effective
since 1998-08-01; cf. (in German)
<http://www.ids-mannheim.de/reform/zeitafel.html> (time schedule),
and <http://www.ids-mannheim.de/grammis/reform/inhalt.html> (rules);
- France had an orthographic reform for French, in 1991;
- the Dutch spelling reform of 1934 was enacted in 1943 in Belgium,
and in 1947 in the Netherlands; Dutch spelling was again (marginally)
reformed in 1995, effective since 1996-08-01;
- Norwegian spelling was reformed in 1907, 1917, and 1938;
- Danish in 1948;
- Spanish in 1910, and again in 1852/55;
- Greek in 1982;
to name just a few. The co-existence of en_US and en_UK has already been
mentioned in this thread.
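
To illustrate why the language alone does not determine the orthography, here is a minimal Python sketch built only on the German dates quoted above. The labels "de-1902" and "de-1996" and the function name are hypothetical, chosen for this example; the start of the 1902 orthography is left open because only the year is given.

```python
from datetime import date

# Validity windows for German, taken from the dates quoted in this message.
# The labels are hypothetical names for illustration, not registered tags.
# None means "bound not modelled here".
GERMAN_ORTHOGRAPHIES = {
    "de-1902": (None, date(2005, 7, 31)),   # old orthography, accepted until this day
    "de-1996": (date(1998, 8, 1), None),    # new orthography, effective since this day
}


def accepted_orthographies(day):
    """Return the sorted labels of the orthographies accepted on the given day."""
    result = []
    for label, (start, end) in GERMAN_ORTHOGRAPHIES.items():
        after_start = start is None or start <= day
        before_end = end is None or day <= end
        if after_start and before_end:
            result.append(label)
    return sorted(result)


if __name__ == "__main__":
    # During the transition period both orthographies are accepted.
    print(accepted_orthographies(date(2000, 9, 12)))
```

On any day inside the transition period the function returns both labels, so a spell-checker given only "de" could not tell which rule set to apply.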
Hence, I plead for a tagging system that can represent these differences.
Currently, all of my WWW pages contain the line:
<HTML LANG=de><!--neue Rechtschreibung-->
I would prefer to incorporate the comment into the tag, as in
and likewise for other languages, and other applications.
Note that this issue is orthogonal to the country code of RFC 1766.
E. g., de-AT, de-CH, and de-DE could each be spelled either the 1902
or the 1996 way. Hence, the spelling subtag and the country subtag
should be optional and independent of each other.
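
As a rough sketch of what "optional and independent" could mean for a parser, the Python below accepts a tag with or without a country subtag and with or without a spelling subtag, in either order. The concrete syntax is an assumption made for this example only (a two-letter subtag is read as a country, a four-digit subtag as the year naming a spelling); it is not a form defined by RFC 1766 or ISO 639, and the names `LangTag` and `parse_tag` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    country: Optional[str]    # e.g. "AT", "CH", "DE"; None if absent
    spelling: Optional[str]   # e.g. "1902", "1996"; None if absent


def parse_tag(tag):
    """Split a tag such as 'de-AT-1996' into language, country, and spelling.

    Assumed syntax (hypothetical): two letters = country subtag,
    four digits = spelling subtag. Each may appear at most once.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    country = None
    spelling = None
    for sub in parts[1:]:
        if country is None and re.fullmatch(r"[A-Za-z]{2}", sub):
            country = sub.upper()
        elif spelling is None and re.fullmatch(r"[0-9]{4}", sub):
            spelling = sub
        else:
            raise ValueError("unrecognised or repeated subtag: " + sub)
    return LangTag(language, country, spelling)


if __name__ == "__main__":
    for example in ("de", "de-CH", "de-1996", "de-AT-1902"):
        print(example, "->", parse_tag(example))
```

Because neither subtag depends on the other, "de-1996" and "de-CH" are both valid on their own, and a consumer that does not care about spelling can simply ignore the third field.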
I think the Ethnologue lacks information about variant orthographies.
(I last looked at it a couple of months ago.) Both RFC 1766 and
ISO 639 ignore the issue of variant orthographies.