Richard> I note that Ethnologue lists an amazing number of
Richard> languages; more than 5 000 living languages if I recall
Richard> correctly. I further note that what counts officially as
Richard> a living language changes: apparently Scots English was
Richard> recently registered by the E.C. as a "minority language"
Richard> distinct from British English. I note further that a
Richard> language may change in relevant ways throughout its
Richard> history, so as to warrant being considered as a different
Richard> language: Anglo-Saxon, Middle English, and Modern English
Richard> are not to be mistaken, and Old French is obviously
Richard> different from Modern French even to someone with very
Richard> poor French. Not only does this enlarge the set of
Richard> language tags considerably, different authors might wish
Richard> to draw the lines differently.

Once a language is identified, nomenclature (e.g. language name) and
status changes (e.g. living, minority, dead) should be easier to deal
with than re-identifying or unidentifying (as it were) a language.
In short, once a language has an identifier, the identifier should
remain the same regardless of changes in the language name or status.

An example might be a language X that diverges over time into two
distinct languages X1 and X2. In that case two new language
identifiers would come into being, and the original would be preserved
to allow, if for no other reason, description of lineage.

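A rough sketch of that bookkeeping follows. Everything in it is
hypothetical (the Registry class, the "L000001" identifier format, the
method names); none of it comes from any existing standard. It only
shows that names and status can change while identifiers stay fixed,
and that a divergence adds identifiers without retiring the original.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LanguageRecord:
    ident: str                      # assigned once, never changed
    name: str                       # may change over time
    status: str                     # may change: "living", "minority", "dead"
    parent: Optional[str] = None    # identifier this language diverged from
    children: List[str] = field(default_factory=list)


class Registry:
    """Hypothetical in-memory registry of language identifiers."""

    def __init__(self) -> None:
        self._records: Dict[str, LanguageRecord] = {}
        self._counter = 0

    def register(self, name: str, status: str = "living",
                 parent: Optional[str] = None) -> str:
        if parent is not None and parent not in self._records:
            raise KeyError(parent)
        self._counter += 1
        ident = f"L{self._counter:06d}"   # made-up identifier format
        self._records[ident] = LanguageRecord(ident, name, status, parent)
        if parent is not None:
            self._records[parent].children.append(ident)
        return ident

    def get(self, ident: str) -> LanguageRecord:
        return self._records[ident]

    def rename(self, ident: str, name: str) -> None:
        # Nomenclature changes leave the identifier untouched.
        self._records[ident].name = name

    def set_status(self, ident: str, status: str) -> None:
        # Status changes leave the identifier untouched as well.
        self._records[ident].status = status

    def diverge(self, ident: str, names: List[str]) -> List[str]:
        # New identifiers come into being; the original record is kept.
        return [self.register(n, parent=ident) for n in names]

    def lineage(self, ident: str) -> List[str]:
        # Walk parent links from the given language back to its root.
        chain: List[str] = []
        current: Optional[str] = ident
        while current is not None:
            chain.append(current)
            current = self._records[current].parent
        return chain


if __name__ == "__main__":
    reg = Registry()
    x = reg.register("X")
    x1, x2 = reg.diverge(x, ["X1", "X2"])
    reg.set_status(x, "dead")
    print(reg.lineage(x1))
```

Keeping the parent record around, rather than reusing its identifier
for one of the descendants, is what makes the lineage query possible.
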
There may be historical situations that preclude maintaining a single
identifier for a language, but these concerns are usually part of the
deliberations of a standardization committee. If these situations are
found, some appropriate change mechanism can be adopted into the
standard.

With current computer capabilities, the language identification
approach can be arranged so that a very large space is available. The
question then becomes how to utilize that space to maintain
flexibility for specialized or unanticipated needs (e.g. redrawing the
lines as you mentioned).

Richard> Markup information does not have to be in the character
Richard> stream. You _can_ store a document as a character stream
Richard> and a parallel markup tree, and in fact doing it that way
Richard> makes it possible to have several incompatible markup
Richard> devices for the same base character sequence. There have
Richard> been word processors based on this idea.

Markup and text can be kept in parallel, but doing so complicates
interchange. Though not a separation of text and markup, a missing
DTD for an SGML document can cause a certain amount of difficulty.

An implication of text and markup maintained in parallel in an
interchange context is that the lowest common denominator is text sans
markup.

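To make the parallel arrangement concrete, here is a minimal sketch,
assuming markup is stored as character-offset spans in named layers
beside the text. The Document and Span names and the layer scheme are
made up for illustration and are not taken from any particular word
processor or standard. Two layers can overlap in ways a single tree
could not express, and the only thing every receiver is guaranteed to
understand is the plain text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset into the text
    end: int     # exclusive character offset into the text
    tag: str


class Document:
    """Text plus any number of independent markup layers kept beside it."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.layers: Dict[str, List[Span]] = {}

    def annotate(self, layer: str, start: int, end: int, tag: str) -> None:
        if not (0 <= start <= end <= len(self.text)):
            raise ValueError("span lies outside the text")
        self.layers.setdefault(layer, []).append(Span(start, end, tag))

    def tagged(self, layer: str, tag: str) -> List[str]:
        # Resolve the spans of one layer against the shared text.
        return [self.text[s.start:s.end]
                for s in self.layers.get(layer, [])
                if s.tag == tag]

    def plain_text(self) -> str:
        # What survives interchange when the receiver knows no layer.
        return self.text


if __name__ == "__main__":
    doc = Document("the quick brown fox")
    doc.annotate("phrases", 0, 9, "phrase")    # covers "the quick"
    doc.annotate("emphasis", 4, 15, "emph")    # covers "quick brown"
    print(doc.tagged("phrases", "phrase"))
    print(doc.tagged("emphasis", "emph"))
    print(doc.plain_text())
```

The two spans overlap without nesting, which is fine here because the
layers never have to be merged into one stream.
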
Mark Leisher                   "The trick is not gaining the knowledge,
Computing Research Lab          but surviving the lessons."
New Mexico State University        -- "Svaha," Charles de Lint
Box 30001, Dept. 3CRL
Las Cruces, NM 88003