Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]

From: Nick Nicholas (opoudjis@optushome.com.au)
Date: Tue May 24 2005 - 02:54:28 CDT


    from Dean Snyder:

    > I gave several examples where glyphic
    > information, in ancient texts, for example, is important information
    > that is not conveyed when those texts are transliterated. Hence the
    > utility of encoding those scripts.
    >
    >

    I'm sorry, but the applications you've been talking about --- glyph-
    based recognition, glyph-based restoration, caring that r and z in
    Arabic differ by a dot --- require a glyph inventory. Not only
    *cannot* Unicode provide you with that, it *must not*. Otherwise,
    consider Italic lowercase Latin a: it is easily confusable with o,
    whereas upright normal Latin a is not. This similarity is obviously
    important for palaeographers or graphologists or whatever. Which
    means --- what, that italic and upright a are to be disunified as
    codepoints? (Mutatis mutandis, exactly the same goes for Serbian vs.
    Russian italic Te. And no one bring up Latin Small Letter Alpha,
    that's not used outside phonetic transcription.) If Roman-script
    palaeographers can put up with two thousand years of ductus being
    mooshed together into 52 codepoints, then cuneiformists can put up
    with cuneiform being encoded on emic rather than etic principles
    too. (Which is what I always meant by "Don't Prolif, Translit": the
    emic repertoires are almost always cemented in transliteration, not
    in ahistorical normalisation of the historical script.)

    You've complained that transliteration is lossy. (Nice countering of
    slogans, btw.) But at the level of glyph identity, so is going from
    italics to upright. So is dropping language tagging of Serbian vs.
    Russian. At the level of how deep the stylus impressions in the
    tablet go, so are 2-D photographs, for that matter. The lossiness is
    a given in any change of medium, or normalisation of glyphs, or
    indeed any encoding at all. And specifically to what Unicode was
    designed for, it's why *plain*text is not richtext. That does not
    prove that the distinctions you may want to make are relevant to
    plaintext; in fact, the more you speak of glyphs, the more it proves
    the opposite. (Dots? There are no dots in a hex number.)
    Transliteration is lossy; so is the character-glyph model. And that's
    a *good* thing: I like being able to use "Find" on text, thank you.
    You're to be lauded for envisaging ways a computer-driven
    glyph-recognition system can revolutionise cuneiform studies. But
    that cannot be Unicode's concern: Unicode has to provide an emic
    repertoire of codepoints, whensoever possible.
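
    To make the point about dots concrete, here is a minimal Python
    sketch (not part of the original message; the helper names
    describe and find are hypothetical). Assuming only the standard
    unicodedata module, it shows that Arabic reh and zain are told
    apart by code point alone, with nothing about the dot stored, and
    that plain-text search compares code points rather than glyph
    shapes.

```python
import unicodedata

# Arabic reh and zain: the glyphs differ by a dot, but the encoding
# only records that they are two different characters.
REH, ZAIN = "\u0631", "\u0632"


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def find(haystack: str, needle: str) -> int:
    """Plain-text search: compares code points, never glyph shapes."""
    return haystack.find(needle)


print(describe(REH))
print(describe(ZAIN))
# The same search works whether the text is later rendered upright or italic,
# because the styling is not part of the string being searched.
print(find("palaeography", "a"))
```

    Whether a given "a" was displayed in italics is richtext
    information carried outside the string, which is why the search
    above cannot (and need not) see it.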

    To make myself clear: I don't oppose the Unicode encoding of
    cuneiform --- more power to you. But where plaintext use of a script
    is limited (and a lot of ancient script use is not obviously
    plaintext), encoding that script is a much less pressing need: that's
    a fact, and it's a fact because of the institutionalised preference
    for transliteration of historical scripts. And I do object to the
    risk of a script encoding ignoring the need to establish characters
    out of glyphs, and making Unicode an open-ended glyph storehouse.
    Where this can be avoided, it should. Where this means script
    proposals need to be held up in discussion by whatever scholars UTC/
    ISO brings over, it should. There's been some belittling of such
    scholars in similar debates on this list and
    elsewhere (the "spoilsport" reaction I refer to on my site); but for
    all that David Starner doesn't "find the concerns of the academics
    interesting" :-( , those academics have a crucial stake in preventing
    poor encodings of their subject area, especially with the door shut
    on canonical equivalences. (Yes, you can customise DUCET, but that's
    patching.) It continues to baffle me that this is even arguable.

    ===
      O Roeschen Roth! Der Mensch liegt in tiefster Noth! Der Mensch liegt in
      tiefster Pein! Je lieber moecht' ich im Himmel sein! --- _Urlicht_
             nickn@unimelb.edu.au http://www.opoudjis.net
    Dr Nick NICHOLAS, French Italian & Spanish, Univ. Melbourne, Australia


