On 08/15/2000 12:16:52 AM Doug Ewell wrote:
>Peter Constable <Peter_Constable@sil.org> wrote:
>> - Some other infrastructure pieces are needed to make the picture
>> complete, specifically tagging runs of text to indicate language
>> and/or writing system, and the app passing that info on to the font/
>> rendering system.
>I assume this is an endorsement of Plane 14 language tags.
It wasn't intended to be.
>"some other infrastructure pieces" is just a euphemism for "out-of-band
>markup," and the whole Unicode ideal of requiring only plain-text
>mechanisms in order to encode plain text begins to slip away.
No, it just means that plain text doesn't necessarily contain *everything*
needed to make the text *appear* the way an author might have wanted it to.
That's true of language-particular renderings, such as Cyrillic italic
forms; it's also true of user-selected renderings, such as discretionary
ligatures, or even choice of typeface. Plane 14 tag characters may be used
to encode the language in which the content is expressed, or some other
mechanism can be used. But the value of plain text isn't undermined without it.
Plain text is simply limited in what it can represent. Strictly speaking,
it isn't even necessarily enough to ensure that a human can read the text
and know what it's supposed to mean. (E.g. what does "chat" mean? The
answer depends upon what language is assumed.) But even if the intended
language is obvious ("Khaw phuut mai chat"), plain text can only encode
what we might call propositional/lexical aspects of meaning. It can't
record some aspects of meaning, such as emphasis, beyond what the limited
punctuation characters of a writing system allow. And it
certainly can't record aesthetic elements a typographer might want to add to
a text, such as discretionary ligatures and swashes. Given these
limitations, it shouldn't be surprising if language-specific details of
rendering aren't represented. A plain text file containing Serbian text
without any language identification will be understandable (to someone who
recognises it as Serbian and can read Serbian) even though the
language-specific rendering rules aren't applied; that's a no-brainer since
the plain text file doesn't have any italics, and the language-specific
rendering applies only to italics. Similarly for Turkish (and note that the
plain text file won't have ligatures).
Language identification is necessary for a variety of software processes to
work. Plane 14 tag characters are one way to handle this. But I wasn't
intending to give an endorsement of that mechanism.
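To make the Plane 14 option concrete, here is a minimal Python sketch of how a language identifier can ride along inside plain text. It relies only on the published layout of the tag block: U+E0001 LANGUAGE TAG introduces the tag, and the tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E. The function names are hypothetical, made up for illustration, and the sketch handles only a tag at the very start of the text.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def make_language_tag(lang: str) -> str:
    """Encode an ASCII language identifier as a Plane 14 tag sequence."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language identifier must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def leading_language(text: str):
    """Return the language named by a tag at the start of text, or None."""
    if not text.startswith(LANGUAGE_TAG):
        return None
    chars = []
    for c in text[1:]:
        cp = ord(c)
        if 0xE0020 <= cp <= 0xE007E:
            chars.append(chr(cp - TAG_OFFSET))
        else:
            break  # first non-tag character ends the tag
    return "".join(chars) or None


def strip_tags(text: str) -> str:
    """Drop every character in the tag block, leaving the visible text."""
    return "".join(c for c in text if not (0xE0000 <= ord(c) <= 0xE007F))


# "chat" is ambiguous on its own; the tag says which language is meant.
tagged = make_language_tag("th") + "chat"
print(leading_language(tagged))   # th
print(strip_tags(tagged))         # chat
```

An app that doesn't understand the tags can strip or ignore them and still has readable text; one that does can hand the language on to the font/rendering system. The same information could just as well be carried by some other mechanism outside the text.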
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485