Mark> Subject: RE>>Embedded language ID proposal Time: 5:42 PM
    Mark> Date: 9/8/95
    Mark> I am still unconvinced of the need to have language
    Mark> information in plain text; there are legitimate needs for
    Mark> that information, but there are needs for other particular
    Mark> attributes that go along with rich text, and it is hard to
    Mark> see why this one should be singled out.
For the most part, I personally agree that language identifiers would
most logically belong in markup.
But from a multilingual natural language processing perspective (and
perhaps others), a single codeset with the capability to embed
language identifiers would provide an attractive reference
representation for text.
Had the proposal offered no utility outside our own area, I doubt we
would have bothered to present it as anything more than an
idiosyncrasy of our particular Unicode support implementation.
    Mark> In terms of commenting on these particular suggested private
    Mark> use implementations, the string scheme (LANG_ID_START text
    Mark> LANG_ID_END) has the very considerable drawback of
    Mark> introducing fr_FRgarbageen_US into data streams that don't
    Mark> recognize LANG_ID_START, LANG_ID_END. Using independent
    Mark> private use characters exclusively at least allows other
    Mark> implementations to filter them out without knowing
    Mark> bracketing semantics.
Telling point.  I hadn't thought of that.
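To make the drawback concrete, here is a minimal sketch in Python. The
code points are hypothetical (U+E000/U+E001 as LANG_ID_START and
LANG_ID_END, U+E010/U+E011 as single-character fr_FR and en_US tags);
neither scheme in the proposal assigns these values. The filter knows
nothing about bracketing; it just drops every private-use character
(Unicode general category "Co"), which is all a non-aware
implementation could reasonably do.

```python
import unicodedata

# Hypothetical private-use code points, chosen only for illustration.
LANG_ID_START = "\uE000"
LANG_ID_END = "\uE001"
TAG_FR_FR = "\uE010"  # hypothetical single-character tag for fr_FR
TAG_EN_US = "\uE011"  # hypothetical single-character tag for en_US


def strip_private_use(text: str) -> str:
    """Drop every private-use character (general category 'Co').

    This needs no knowledge of any bracketing semantics.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# String scheme: the language name is ordinary text between the brackets.
bracketed = (
    LANG_ID_START + "fr_FR" + LANG_ID_END + "bonjour "
    + LANG_ID_START + "en_US" + LANG_ID_END + "hello"
)

# Independent scheme: each language tag is one private-use character.
independent = TAG_FR_FR + "bonjour " + TAG_EN_US + "hello"

print(strip_private_use(bracketed))    # fr_FRbonjour en_UShello
print(strip_private_use(independent))  # bonjour hello
```

With the string scheme, the tag names leak into the filtered text as
garbage; with independent characters, the filter recovers the plain
text cleanly.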
    Mark> As far as terminology goes, these are not combining
    Mark> characters: they are not positioned relative to a preceding
    Mark> base character; they are not positioned at all! They are
    Mark> more akin to the formatting characters such as RLM or ZWJ.
Our initial conclusion as well.
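The distinction can be checked against the properties Unicode already
assigns. The sketch below uses Python's standard `unicodedata` module to
compare RLM and ZWJ (format characters, category "Cf"), a true combining
mark (U+0301, category "Mn", nonzero combining class), and a private-use
character (category "Co"). The `describe` helper is hypothetical, written
only for this illustration; a private-use language tag would simply carry
"Co" with no properties from the standard.

```python
import unicodedata


def describe(ch: str) -> tuple[str, int]:
    """Return (general category, canonical combining class) for ch."""
    return unicodedata.category(ch), unicodedata.combining(ch)


samples = {
    "RLM (U+200F)": "\u200F",
    "ZWJ (U+200D)": "\u200D",
    "COMBINING ACUTE ACCENT (U+0301)": "\u0301",
    "private use (U+E000)": "\uE000",
}

for name, ch in samples.items():
    category, ccc = describe(ch)
    print(f"{name}: category={category}, combining class={ccc}")
```

Only the combining accent has a nonzero combining class; the format
characters, like the proposed language tags, are not positioned
relative to anything.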
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher                         "The trick is not gaining the knowledge,
Computing Research Lab                    but surviving the lessons."
New Mexico State University                  -- "Svaha," Charles de Lint
Box 30001, Dept. 3CRL
Las Cruces, NM  88003