Re: ASCII.die.die.die

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Sat Sep 16 1995 - 18:38:11 EDT


    Richard> I note that Ethnologue lists an amazing number of
    Richard> languages; more than 5 000 living languages if I recall
    Richard> correctly. I further note that what counts officially as
    Richard> a living language changes: apparently Scots English was
    Richard> recently registered by the E.C. as a "minority language"
    Richard> distinct from British English. I note further that a
    Richard> language may change in relevant ways throughout its
    Richard> history, so as to warrant being considered as a different
    Richard> language: Anglo-Saxon, Middle English, and Modern English
    Richard> are not to be mistaken, and Old French is obviously
    Richard> different from Modern French even to someone with very
    Richard> poor French. Not only does this enlarge the set of
    Richard> language tags considerably, different authors might wish
    Richard> to draw the lines differently.

Once a language is identified, changes in nomenclature (e.g. language
name) and status (e.g. living, minority, dead) should be easier to deal
with than re-identifying or un-identifying (as it were) a language.

In short, once a language has an identifier, the identifier should
remain the same, changes in the language name or status
notwithstanding.

An example might be a language X that diverges over time into two
distinct languages X[1] and X[2]. In that case two new language
identifiers would come into being, and the original would be preserved
to allow, if for no other reason, description of lineage.
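
To make the idea concrete, here is a minimal sketch in Python of a
registry where identifiers are permanent and a divergence creates new
identifiers instead of replacing the old one. The class and method names
(Registry, split, lineage) and the use of small integers as identifiers
are my own assumptions for illustration, not part of any standard.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Language:
    ident: int                    # permanent; never reused or reassigned
    name: str                     # may change over time
    status: str                   # e.g. "living", "minority", "dead"
    parent: Optional[int] = None  # identifier of the language it diverged from


class Registry:
    """Hypothetical registry: identifiers are stable, names and status are not."""

    def __init__(self):
        self._next = count(1)
        self.by_id = {}

    def register(self, name, status="living", parent=None):
        lang = Language(next(self._next), name, status, parent)
        self.by_id[lang.ident] = lang
        return lang.ident

    def rename(self, ident, name):
        self.by_id[ident].name = name      # identifier is untouched

    def set_status(self, ident, status):
        self.by_id[ident].status = status  # identifier is untouched

    def split(self, ident, names):
        """Record a divergence: new identifiers are created, the original stays."""
        return [self.register(n, parent=ident) for n in names]

    def lineage(self, ident):
        """Walk parent links back to the oldest recorded ancestor."""
        chain = []
        while ident is not None:
            chain.append(ident)
            ident = self.by_id[ident].parent
        return chain


reg = Registry()
x = reg.register("X")
x1, x2 = reg.split(x, ["X[1]", "X[2]"])
reg.set_status(x, "dead")
reg.rename(x, "Old X")
```

After the rename and the status change, x still refers to the same
entry, and reg.lineage(x2) leads back to it.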

There may be historical situations that preclude maintaining a single
identifier for a language, but these concerns are usually part of the
deliberations of a standardization committee. If these situations are
found, some appropriate change mechanism can be adopted into the
standard.

With current computer capabilities, the language identification
approach can be arranged so that a very large space is available. The
question then becomes how to utilize that space to maintain
flexibility for specialized or unanticipated needs (e.g. redrawing the
lines as you mentioned).
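
One possible way to use such a space, sketched in Python under assumed
numbers: take a 32-bit identifier and reserve the top bit for locally
defined ("private") identifiers, leaving the rest for a central registry.
The 32-bit width, the flag bit, and the function names are assumptions
chosen for illustration only.

```python
ID_BITS = 32                       # assumed width, not taken from any standard
PRIVATE_FLAG = 1 << (ID_BITS - 1)  # top bit marks locally defined identifiers


def is_private(ident: int) -> bool:
    """True if the identifier lies in the locally defined half of the space."""
    return bool(ident & PRIVATE_FLAG)


def make_private(local_number: int) -> int:
    """Map a local number into the private half of the space."""
    if not 0 <= local_number < PRIVATE_FLAG:
        raise ValueError("local number out of range")
    return PRIVATE_FLAG | local_number


# Identifiers left for the central registry: 2**31 of them.
registered_capacity = PRIVATE_FLAG
```

Even with half the space set aside, the registered half is vastly larger
than the 5 000 or so living languages mentioned above, so an author who
wants to draw the lines differently can do so in the private half without
colliding with registered identifiers.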

    Richard> Markup information does not have to be in the character
    Richard> stream. You _can_ store a document as a character stream
    Richard> and a parallel markup tree, and in fact doing it that way
    Richard> makes it possible to have several incompatible markup
    Richard> devices for the same base character sequence. There have
    Richard> been word processors based on this idea.

Markup and text can be kept in parallel, but doing so complicates
interchange. A related case, though not strictly a separation of text
and markup: a missing DTD for an SGML document can cause a certain
amount of difficulty.

An implication of text and markup maintained in parallel in an
interchange context is that the lowest common denominator is text sans
markup.
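
A small Python sketch of the parallel arrangement, assuming markup is
stored as (start, end, tag) spans over character offsets. The layer
names, the tags, and the two helper functions are hypothetical.

```python
text = "Old French differs from Modern French"

# Each layer is a list of (start, end, tag) spans over the same character
# offsets; the text itself contains no markup.
layers = {
    "language": [(0, 10, "topic"), (24, 37, "topic")],
    "layout": [(0, 37, "paragraph")],
}


def render(text, spans):
    """Inline one layer as <tag>...</tag>; spans are assumed not to overlap."""
    out, pos = [], 0
    for start, end, tag in sorted(spans):
        out.append(text[pos:start])
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def interchange(text, layers, understood):
    """Keep only the layers the receiver understands; the text always survives."""
    return text, {name: spans for name, spans in layers.items() if name in understood}
```

If the receiver understands none of the layers, interchange(text,
layers, set()) returns the bare text and an empty set of layers, which
is the lowest common denominator described above.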
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher                    "The trick is not gaining the knowledge,
Computing Research Lab           but surviving the lessons."
New Mexico State University        -- "Svaha," Charles de Lint
Box 30001, Dept. 3CRL
Las Cruces, NM 88003


