Re: Conflicting principles

From: John Cowan (cowan@mercury.ccil.org)
Date: Wed Aug 06 2003 - 22:05:49 EDT

    Kenneth Whistler scripsit:

    > Is a right-to-left script encoded in visual order in
    > the backing store or in phonetic (= logical) order?

    I've always thought this term "visual order" was productive of
    nothing but confusion. I realize that there's precedent in the
    8859-x RFCs for its use, but what it really means is:

         Is a right-to-left script encoded from left to right in
         the backing store or from right to left?

    So put, the question answers itself.
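
To make the restated question concrete, here is a minimal sketch using Hebrew as the right-to-left script. The sample word is an assumption chosen for illustration and does not come from this thread. In logical order the first letter read is the first code point stored. A "visual order" store, read left to right, would hold the same letters reversed.

```python
import unicodedata

# "shalom" in logical order: the first letter read is stored first.
word = "\u05E9\u05DC\u05D5\u05DD"  # shin, lamed, vav, final mem

# Character names in backing-store order.
names = [unicodedata.name(ch) for ch in word]

# Every letter carries the strong right-to-left bidi class "R",
# so a renderer can derive the display order without help from the store.
bidi = [unicodedata.bidirectional(ch) for ch in word]

# What a left-to-right "visual order" store would contain instead.
visual_ltr = word[::-1]

for name, cls in zip(names, bidi):
    print(name, cls)
```

Searching and sorting work on `word` as stored, while `visual_ltr` would have to be reversed again before any such processing.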

    > This should, IMO, be put up on a pedestal and have the spotlights
    > shined on it. This is the *fundamental* obligation of a character
    > encoding standard. If you cannot accomplish this, then you just
    > have a bunch of charts full of pretty pictures, and everyone is
    > on their own for trying to figure out how to communicate with
    > anybody else using them.

    And AFAIK no other character encoding standard satisfies that obligation.
    The rest of them are all charts with pictures, names, and numbers.

    > The reason why the UTC should tackle the encoding of Tengwar
    > is not so much because it would help in the publication of Elvish
    > poetry, but because confronting the architectural issues
    > it poses for encoding would make an excellent tutorial case
    > for how the two principles of combining mark order and
    > logical order impact the task of coming up with an appropriate
    > encoding for a complex script.

    Indeed.

    > And it would starkly illustrate
    > the fact that an appropriate character encoding does not
    > necessarily directly reflect the phonological structure of
    > a language as represented by that script.

    "Not necessarily" is the operative phrase. The question is whether that
    failure to reflect is tolerable. At present, three possibilities have
    been kicked about:

    1) Encode the vowel signs as combining characters, and therefore after
       the base characters over/under which they appear, logical order be
       damned.

    2) Encode the vowel signs as base characters explicitly ligatured with
       ZWJ to the characters over/under which they appear, in logical order.

    3) Encode the vowel signs as base characters implicitly ligatured by
       the font ligaturing table to the base characters over/under which they
       appear, in logical order. This alternative requires the use of
       distinct ligaturing tables (perhaps distinct fonts) for different
       modes of use.
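
The three possibilities can be sketched as code point sequences. This is only an illustration under stated assumptions: Tengwar has no standard code points, so `CONSONANT` and `VOWEL_SIGN` are hypothetical Private Use Area placeholders, and the `vowel_read_first` flag is a hypothetical stand-in for a mode in which the vowel sign is read before the letter it sits on. Only ZWJ (U+200D) is a real assignment.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER, a real Unicode character

# Hypothetical placeholders in the Private Use Area (assumptions,
# not real assignments for any Tengwar letter or vowel sign).
CONSONANT = "\uE000"
VOWEL_SIGN = "\uE040"


def encode_syllable(option, vowel_read_first):
    """Return the stored sequence for one consonant plus one vowel sign."""
    # Logical order follows reading order, which depends on the mode.
    if vowel_read_first:
        logical = [VOWEL_SIGN, CONSONANT]
    else:
        logical = [CONSONANT, VOWEL_SIGN]

    if option == 1:
        # Combining mark: always after its base, whatever the reading order.
        return CONSONANT + VOWEL_SIGN
    if option == 2:
        # Base characters in logical order, explicitly joined with ZWJ.
        return ZWJ.join(logical)
    if option == 3:
        # Base characters in logical order; the font's ligature table
        # is left to do the joining, so nothing extra is stored.
        return "".join(logical)
    raise ValueError("option must be 1, 2, or 3")


for option in (1, 2, 3):
    seq = encode_syllable(option, vowel_read_first=True)
    print(option, [f"U+{ord(ch):04X}" for ch in seq])
```

When the vowel is read after its consonant, options 1 and 3 store the same sequence. They diverge only in modes where the vowel is read first, which is where the conflict between combining mark order and logical order shows up.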

    -- 
    You are a child of the universe no less         John Cowan
    than the trees and all other acyclic            http://www.reutershealth.com
    graphs; you have a right to be here.            http://www.ccil.org/~cowan
      --DeXiderata by Sean McGrath                  jcowan@reutershealth.com
    

