Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]

From: Gregg Reynolds (unicode@arabink.com)
Date: Tue May 24 2005 - 03:01:52 CDT

    Dean Snyder wrote:

    > In fact, that's why I said that transliteration is almost tautologically
    > a loss of glyphic information.
    >
    > Both you and Gregg are completely missing my point. The whole purpose of
    > transliteration is to render characters of one script in another, which

    Encodings are not scripts; they're mathematical objects.

    > almost by definition, or tautologically, means that there is a loss of
    > glyphic information when one transliterates. In fact, that is arguably
    > the main reason one transliterates - to substitute the glyphic
    > information in the source script with different glyphic information in
    > the destination script.

    Arguably maybe; but I don't think so. I think it's about identity, not
    "glyphic information".

    > I gave several examples where glyphic
    > information, in ancient texts, for example, is important information
    > that is not conveyed when those texts are transliterated. Hence the
    > utility of encoding those scripts.

    Well I wouldn't argue against the utility of such an encoding; but
    unfortunately the "transliteration is lossy" argument works against you,
    for a very simple reason:

    *computational models of "characters" encode no "glyphic information"*

    None. Nada. Zipzilchzero. x0041 encodes Latin upper case A; it encodes
    an identity; it does not encode "glyphic information". Not even a set
    of glyphs. It's a theoretical impossibility. (btw Unicode has always
    been a bit confused about this.)
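
    A minimal sketch of what a code point does carry, assuming Python's
    standard unicodedata module as the lookup (the variable names are only
    for illustration): an integer, a name, a category. Nothing in the
    record describes a shape.

```python
import unicodedata

ch = "\u0041"

# What the character database records for this code point:
code_point = ord(ch)                  # the integer itself
name = unicodedata.name(ch)           # the identity, as a name
category = unicodedata.category(ch)   # a general category (uppercase letter)

# No outline, stroke, or bitmap appears anywhere in this record;
# any shape comes later, from whatever font renders the text.
print(hex(code_point), name, category)
```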

    And it's fairly easy to see this. There is no rule you can find that
    will tell you, for any given image, if it is a member of the set of all
    Latin upper case A glyphs. Pretty much any blob of ink can be construed
    as "A" in the right context. It's also impossible to enumerate all "A"
    glyphs.

    (Idea for a contest: slap a blob of ink in a random pattern in an
    em-square; a sufficiently creative typeface designer will be able to
    design a Latin font in which the blob will be recognizably "A". Free
    beer for a week to the best design.)

    So even if you encode your ancient scripts, you are not protected
    against the kind of lossiness you want to avoid. There's always a font
    and a rendering logic involved. You're lost as soon as you lay finger
    to keyboard and your idea of a glyph is transl(iter)ated into an
    integer. To guarantee correct decoding of a message in the way you
    (seem to) want, you would have to transmit specific glyph images along
    with the encoded message; in which case there's not much point in
    designing an encoding.
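
    One way to see the parallel, as a toy sketch only: the three-letter
    Greek-to-Latin table below is hypothetical and stands in for any
    transliteration scheme. Encoding to integers and transliterating to
    another script are both mappings of identities. Each can round-trip,
    and neither carries a glyph along.

```python
# Hypothetical toy table, for illustration only; not a real scheme.
TO_LATIN = {"\u03b1": "a", "\u03b2": "b", "\u03b3": "g"}
TO_GREEK = {latin: greek for greek, latin in TO_LATIN.items()}

def transliterate(text, table):
    """Map each character through the table, one identity at a time."""
    return "".join(table[c] for c in text)

source = "\u03b2\u03b1\u03b3"  # beta, alpha, gamma

# "Encoding": identities become integers.
as_integers = [ord(c) for c in source]

# "Transliteration": identities become other identities.
as_latin = transliterate(source, TO_LATIN)

# Both mappings are reversible at the level of identity...
assert "".join(chr(n) for n in as_integers) == source
assert transliterate(as_latin, TO_GREEK) == source

# ...and neither result says anything about how the source was drawn.
print(as_integers, as_latin)
```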

    Take a look at Douglas Hofstadter's essays on Metafont in "Metamagical
    Themas" for some fascinating discussion of such stuff.

    -gregg


