Re: Tagging orthographic systems (was: (iso639.186) the Ethnologue)

Date: Wed Sep 13 2000 - 17:51:27 EDT

On 09/13/2000 09:09:12 AM Otto Stolz wrote:

>For many language-specific IT processes involving written language,
>such as spell-checking, hyphenating, transliterating (e. g. to Braille),
>or audible rendering, it is not enough to know which language you are
>dealing with: you also need information about the orthography used.

I *entirely* agree. But let us understand two points:

1. Orthography is not the only paralinguistic notion that IT processes
depend upon.

2. Except in a small number of cases, every category in a list of languages
will map to one or more categories in a list of writing systems (excluding
unwritten languages). In other words, the list of writing systems is a
finer enumeration than the list of languages. What that means is that, in
order to arrive at a comprehensive list of writing systems, you're going to
need a comprehensive list of languages anyway.
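
To make point 2 concrete, here is a minimal sketch in Python. The table is
made-up sample data, and the names (WRITING_SYSTEMS, the "x-unwritten"
placeholder code, the descriptive labels) are assumptions for illustration,
not entries from any real registry. It only shows that enumerating writing
systems means walking the list of languages first:

```python
# Hypothetical sample data: language code -> writing systems used for it.
# None of these labels come from a real registry; they are illustrative only.
WRITING_SYSTEMS = {
    "de": ["Latin script, 1902 spelling", "Latin script, 1996 spelling"],
    "sr": ["Cyrillic script", "Latin script"],
    "x-unwritten": [],  # placeholder for a language with no writing system
}


def enumerate_writing_systems(table):
    """Return every (language, writing system) pair, ordered by language code.

    The writing-system list is produced by iterating over the languages,
    so it can only be as complete as the language list it starts from.
    """
    return [
        (language, system)
        for language, systems in sorted(table.items())
        for system in systems
    ]


pairs = enumerate_writing_systems(WRITING_SYSTEMS)
for language, system in pairs:
    print(f"{language}: {system}")
```

The unwritten language contributes no pairs, and each written language
contributes at least one, so the resulting list is the finer of the two.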

>Note that this issue is orthogonal to the country code of RFC 1766.
>E. g., de-AT, de-CH and de-DE could each be spelled either the 1902
>or the 1996 way. Hence, the spelling subtag and the country subtag
>should be optional, independent of each other.

I would agree.
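
As a rough sketch of what "optional and independent" could mean in practice,
the Python below parses a tag in which the country subtag and a spelling
subtag may each be present or absent. The tag shape used here (for example
"de-CH-1902") is purely hypothetical; RFC 1766 defines no spelling subtag,
and the Tag type and parse_tag function are names invented for this example:

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    language: str
    country: Optional[str]   # e.g. "AT", "CH", "DE"; None if absent
    spelling: Optional[str]  # hypothetical spelling subtag, e.g. "1902"


def parse_tag(tag: str) -> Tag:
    """Parse a hypothetical language[-country][-spelling] tag.

    Assumption for this sketch: a two-letter subtag is a country code and
    an all-digit subtag names a spelling convention. Either may be omitted.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    country = None
    spelling = None
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha() and country is None:
            country = part.upper()
        elif part.isdigit() and spelling is None:
            spelling = part
        else:
            raise ValueError(f"unrecognized subtag {part!r} in {tag!r}")
    return Tag(language, country, spelling)


for example in ("de", "de-AT", "de-1996", "de-CH-1902"):
    print(example, "->", parse_tag(example))
```

All four combinations parse, which is the point: knowing the country tells
you nothing about the spelling, and vice versa.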

>I think the Ethnologue lacks information about variant orthographies.
>(I last looked at it a couple of months ago.) Both RFC 1766 and
>ISO 639 ignore the issue of variant orthographies.


- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
