Re: VS vs. P14 (was Re: Indic Devanagari Query)

From: Jim Allan (
Date: Wed Feb 05 2003 - 16:47:20 EST

    James Kass posted:

    > The advantages of using P14 tags (...equals lang IDs mark-up) is
    > that runs of text could be tagged *in a standard fashion* and
    > preserved in plain-text.
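
    The tagging James describes can be sketched roughly as follows. This
    is only an illustration, assuming the Plane 14 scheme in which
    U+E0001 LANGUAGE TAG is followed by tag characters that mirror ASCII
    at an offset of U+E0000, and U+E007F CANCEL TAG ends the run. The
    function names are hypothetical.

```python
# Sketch of Plane 14 language tagging in plain text (assumed scheme, see lead-in).
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def to_tag_chars(lang_id: str) -> str:
    """Map a printable-ASCII language ID such as 'de' to Plane 14 tag characters."""
    for ch in lang_id:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in lang_id)


def tag_run(text: str, lang_id: str) -> str:
    """Prefix a run of text with a language tag and cancel the tag afterwards."""
    opening = LANGUAGE_TAG + to_tag_chars(lang_id)
    closing = LANGUAGE_TAG + CANCEL_TAG
    return opening + text + closing


def strip_tags(text: str) -> str:
    """Recover the untagged text by dropping every character in the tag block."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


tagged = tag_run("Guten Tag", "de")
```

    Note that the tag carries only a language ID. Nothing in it says
    whether the German run is meant to appear in Roman or Fraktur.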

    But this still would not necessarily handle orthographic variations.

    See Peter Constable's discussion of language classification and
    orthographic classification at

    Currently, standard language tagging or orthographic tagging is
    logically no more than a kludge once it tries to go beyond obviously
    different languages that are unintelligible to users of other languages.

    Which language tag protocol should Unicode adopt? Should it create its
    own? That last seems beyond the mandate of Unicode.

    There are often conflicting orthographic usages within a language.
    Language tagging alone does not indicate whether German text is to be
    rendered in Roman or Fraktur, whether Gaelic text is to be rendered in
    Roman or Uncial, and if Uncial, a modern Uncial or more traditional
    Uncial, whether English text is in Roman or Morse Code or Braille.

    Capital Eng is found in both pointed and rounded forms in Sami texts and
    printed names, so far as I have read.

    The pointed Eng is more common.

    Does that mean it is "preferred" or only that it happens to be the more
    common form in available fonts?

    Perhaps the rounded Eng is actually "preferred" by most.

    Perhaps most don't care at all, any more than they care whether the hook
    on a _J_ descends below the baseline, whether the descender on _g_ is
    open or closed, whether _a_ is rendered with an upper curl or not.

    Certainly language tagging shouldn't be used to distinguish between such
    forms, unless specifically requested by organizations that can show that
    their request is supported by a very large proportion of the users of
    the language.

    But even then, do not those who disagree have the right to dissent, to
    push their own desires in spelling or orthography?

    Language tagging and orthography tagging are not all that is needed.

    One sometimes *needs* to show emphasis. For example, in a database of
    books and articles one may need to catalogue titles like "Comments on
    the _Tao_Te_Ching_" (see

    To be correct, the book title *must* be italicized, unless the article
    title appears in italicized text, in which case it should be non-italic
    to contrast.

    Titles of articles in mathematics or chemistry may contain superscript
    and subscript characters beyond those hard-coded in Unicode.

    These cannot be indexed in a database as plain text.

    Plain text is not adequate for *so much* normal use. But who ever
    claimed it was? Plain text is only the underlying text, which is
    sometimes, alone, sufficient.

    At the moment XML seems to be the mark-up protocol towards which most
    are moving, and there seems to be no point in duplicating its features
    in Unicode, unless Unicode can somehow do it better.
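
    As a rough illustration of that division of labour, the sketch below
    keeps language and emphasis in XML mark-up and recovers the
    underlying plain text when needed. The element names <title> and
    <em>, the sample record, and the helper functions are assumptions
    made for the example. Only the xml:lang attribute comes from XML
    itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical catalogue record: <title> and <em> are illustrative names.
RECORD = (
    '<title xml:lang="en">Comments on the '
    '<em xml:lang="zh">Tao Te Ching</em></title>'
)


def plain_text(elem: ET.Element) -> str:
    """Return the underlying text with all mark-up removed."""
    return "".join(elem.itertext())


def emphasized_runs(elem: ET.Element) -> list:
    """Return (language, text) pairs for each emphasized run."""
    return [(em.get(XML_LANG), "".join(em.itertext())) for em in elem.iter("em")]


title = ET.fromstring(RECORD)
print(plain_text(title))       # the text alone, suitable for plain-text indexing
print(emphasized_runs(title))  # the runs a renderer would italicize
```

    The plain text survives intact, while the emphasis and the language
    of the embedded book title live only in the mark-up.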

    Jim Allan

    This archive was generated by hypermail 2.1.5 : Wed Feb 05 2003 - 17:34:03 EST