Re: Defined Private Use was: SSP default ignorable characters

From: John Cowan (cowan@ccil.org)
Date: Wed Apr 28 2004 - 07:37:01 EDT

    Doug Ewell scripsit:

    > Second, you are correct that the Plane 14 tagging mechanism has met
    > with, to say the least, a lack of acceptance. I warned in 2002 that
    > deprecation or even "strong discouragement" of the Plane 14 language
    > tags would be equivalent to implicit deprecation of the entire Plane 14
    > tag concept. As it turns out, a lot of people think that's a good idea.

    It's a hack and a fuggly hack, but not as obnoxious as the hack it
    replaced -- abusing UTF-8 to carry language-tag information in invalid
    code sequences. The intended domain, human-readable error messages
    in low-level protocols, is in any case very small.

    > No, the "d" stands for Decimal. This category is deliberately limited
    > to characters that can be concatenated to form numbers in a base-10
    > positional number system. It's a fact of life that base-12 and base-16
    > digits are relegated to category No.

    +1
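
To illustrate the Nd/No distinction quoted above, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the sample characters are my own choices for illustration; the Tengwar digits are not in the standard database, so encoded characters stand in for them.

```python
import unicodedata

def describe(ch):
    """Return (general category, decimal value or None, numeric value or None)."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),   # only Nd characters have a decimal value
        unicodedata.numeric(ch, None),   # No and Nl characters may still have a numeric value
    )

# DIGIT SEVEN, ARABIC-INDIC DIGIT SEVEN, CIRCLED NUMBER TEN, ROMAN NUMERAL TEN
for ch in ("7", "\u0667", "\u2469", "\u2169"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))
```

The first two characters report category Nd and carry a decimal value; CIRCLED NUMBER TEN is No and ROMAN NUMERAL TEN is Nl, and both have a numeric value of ten but no decimal value, which is the position a duodecimal digit ten would be in.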

    The serious issue with Tengwar numbers is how they should be represented
    internally for either decimal or duodecimal digit-strings. Tengwar is
    LTR, but writes the least significant digit first and leftmost. Should
    this be followed internally, or should an attempt be made to represent
    the most significant digit first (as in all other known scripts) by
    creative abuse of the bidi properties?
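
The first option (store digits in written order, least significant first) can be sketched as follows. This is only an illustration of the trade-off, not a proposal for how any implementation does it; the function names are hypothetical, and digits are shown as plain integers rather than as code points.

```python
def to_digits_lsd_first(n, base=12):
    """Split a non-negative integer into digit values, least significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    return digits

def from_digits_lsd_first(digits, base=12):
    """Rebuild the integer from digit values stored least significant first."""
    value = 0
    for d in reversed(digits):   # walk from the most significant digit down
        value = value * base + d
    return value

stored = to_digits_lsd_first(142)           # 142 = 11*12 + 10
print(stored, from_digits_lsd_first(stored))
```

With this storage order the rendering is trivial (plain LTR, no reordering), and the cost moves to anything that parses or compares numbers, which has to reverse the string first. Storing the most significant digit first would flip that trade-off.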

    The other and unrelated problem with Tengwar is how to encode the
    vowel signs. The trouble here is that in some orthographies of
    some languages (notably English), the vowel signs represent
    vowels that phonetically *precede* the consonant they are written on.
    Like the digits, this is unprecedented. Should phonetic ordering
    be ignored in those cases? Or should the vowel signs not be
    treated as combining marks? Or do we need two sets of vowel signs?
    (The related script Sarati does have two visible sets of vowel signs
    distinguished by position.)

    > E06A;TENGWAR DUODECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;

    +1
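
For readers unfamiliar with the record format, the quoted line can be split into its fields as below. The field names follow the usual 15-field UnicodeData.txt layout as I understand it; the dictionary keys themselves are my own labels, not official identifiers.

```python
# Labels for the 15 semicolon-separated fields (key names are illustrative).
FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
]

def parse_record(line):
    """Split one UnicodeData-style line into a dict keyed by field label."""
    values = line.strip().split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

rec = parse_record("E06A;TENGWAR DUODECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;")
print(rec["general_category"], repr(rec["decimal"]), rec["digit"], rec["numeric"])
```

The empty decimal field alongside a numeric value of 10 is what goes with category No rather than Nd, and the code point E06A falls in the BMP Private Use Area (E000..F8FF).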

    -- 
    A: "Spiro conjectures Ex-Lax."                  John Cowan
    Q: "What does Pat Nixon frost her cakes with?"  jcowan@reutershealth.com
      --"Jeopardy" for generative semanticists      http://www.ccil.org/~cowan
    

