Re: Proofreading fonts

From: Gregg Reynolds
Date: Mon Jul 11 2005 - 20:01:22 CDT


    Asmus Freytag wrote:
    > At 03:26 PM 7/11/2005, Peter Kirk wrote:
    >> In fact I think Gregg started this thread with a bad example. The two
    >> encodings for a with circumflex are canonically equivalent and so
    >> different encodings of the same data. The cases Gregg really needs to
    >> deal with are when the alternatives are not canonically equivalent but
    >> semantically distinct.

    It was a great example! I just didn't make myself clear. ;) I meant
    it as a graphic design problem, not as a practical problem to be solved.

    > I'm still waiting for an actual (or correctly contrived) example.

    Ok, you asked for it. Here's an example taken from my own little
    speculative semantic encoding design for Arabic. Soon to be inflicted
    on an innocent world.

    The letterform waw U+0648 has at least four distinct functions in
    written Arabic.

    1. waw-rad. latin-1 translit: W; phono: consonant /w/; semantics:
    radical; e.g. Wjd وجد /wajada/; shows up in the dictionary under the
    letter waw.

    2. waw-nonrad. latin-1 translit: w; phono: consonant /w/; semantics:
    non-radical; e.g. bwâdr بوادر /bawâdir/; shows up under b-d-r, the waw
    is ignored for (first-level) lexical lookup.

    3. sister of damma. latin-1 translit: û; phono: short vowel /u/;
    semantics: non-lexical (it can change meanings within a lexical
    category, though, e.g. from active to passive voice, etc.); e.g. mktûb,
    مكتوب /maktoob/; like damma, does not affect lexical ordering (except
    within subentries under the root k-t-b); mnemonic: called sister of
    damma because it always comes after damma (which may not be written
    explicitly) and denotes a lengthening of the vowel /u/.

    4. lazy waw. latin-1 translit: o; phono: null; semantics: null; e.g. bo's
    بؤس /bu's/ where ' is hamza; purely graphotactic; mnemonic: too lazy to
    bear the burden of phonological or lexical meaning; too lazy to grow the
    tail that would make it look like a real waw.

    Ok, so now we have four different encoding elements. BTW, they don't
    have to map to single codepoints. My scheme maps them to latin-1, for
    the transliteration. They could be mapped to PUA points, or to XML
    elements. In any case, they all have the same typographic denotation,
    namely waw U+0648. Since all four look the same on the page, a
    proofreader cannot see which one was actually encoded, and you probably
    would have a hard time writing software that could automatically check
    spelling/encoding. So you need a font with four almost but not quite
    identical waw glyphs. I think.

    For example, lazy waw might use a small subfixed ring or null sign.
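
    To make the problem concrete, here is a rough Python sketch of the
    idea, not the actual scheme. The four waw elements and their latin-1
    keys (W, w, û, o) come from the list above. Everything else is
    assumed for illustration: the names render and waw_roles, the small
    table covering only the other letters of the four sample words, and
    the choice to treat ' as combining hamza above (U+0654) so that
    lazy waw plus hamza normalizes to U+0624.

```python
import unicodedata

WAW = "\u0648"  # ARABIC LETTER WAW, the shared letterform

# The four semantic elements from the message, keyed by their latin-1
# transliteration. All of them display as the same waw.
WAW_ROLES = {
    "W": "waw-rad (radical consonant /w/)",
    "w": "waw-nonrad (non-radical consonant /w/)",
    "\u00fb": "sister of damma (lengthens /u/)",   # û
    "o": "lazy waw (purely graphotactic)",
}

# Hypothetical table: only the other letters used in the sample words.
OTHER_LETTERS = {
    "b": "\u0628", "j": "\u062c", "d": "\u062f", "r": "\u0631",
    "m": "\u0645", "k": "\u0643", "t": "\u062a", "s": "\u0633",
    "\u00e2": "\u0627",  # â -> alif
    "'": "\u0654",       # assumption: hamza as combining HAMZA ABOVE
}


def render(translit):
    """Collapse a semantic transliteration to ordinary Arabic text."""
    chars = [WAW if c in WAW_ROLES else OTHER_LETTERS[c] for c in translit]
    # NFC composes waw + hamza above into the single character U+0624.
    return unicodedata.normalize("NFC", "".join(chars))


def waw_roles(translit):
    """List (position, role) for every waw element in the transliteration."""
    return [(i, WAW_ROLES[c]) for i, c in enumerate(translit) if c in WAW_ROLES]


if __name__ == "__main__":
    for word in ["Wjd", "bw\u00e2dr", "mkt\u00fbb", "bo's"]:
        print(word, render(word), waw_roles(word))
```

    Note that render("Wjd") and render("wjd") produce the same string,
    so a mis-keyed radical is invisible in an ordinary font. That is
    the gap a proofreading font with distinct waw glyphs would fill.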

