Re: New contribution

From: C J Fynn (cfynn@gmx.net)
Date: Thu Apr 29 2004 - 05:18:45 EDT

  • Next message: D. Starner: "Re: New contribution"

    Dean Snyder <dean.snyder@jhu.edu> wrote:

    > 1) You find an Iron Age text in Israel that exhibits characteristics of
    > both Phoenician and Aramaic orthography.

    > 2) The text shows possible Hebrew and Phoenician linguistic features, so
    > you are not sure at all what language it represents.

    > 3) How will you encode it, given you have at your disposal Hebrew,
    > Phoenician, and Aramaic encodings?

    If the scripts are as structurally near-identical as is claimed, then it
    should be straightforward to create a simple utility to transpose between
    the Hebrew, Phoenician, and Aramaic block encodings, and/or a "smart" font
    which can display characters from one of these scripts with glyphs from
    another.
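    As a rough illustration, a minimal Python sketch of such a transposition
    utility might look like the following. It assumes the Phoenician letters
    sit at the proposed code points U+10900 onward in traditional letter order,
    and maps them one-for-one onto the 22 Hebrew base letters (Aramaic is left
    out, and Hebrew final forms are ignored for simplicity):

        # Sketch of a Phoenician <-> Hebrew transposition utility.
        # Assumes Phoenician letters at U+10900..U+10915 (the proposed range),
        # mapped in alphabet order onto the 22 Hebrew base letters.

        PHOENICIAN_FIRST = 0x10900

        # Hebrew base letters in alphabet order, final forms omitted.
        HEBREW_LETTERS = [
            0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
            0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
            0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
        ]

        PHOENICIAN_TO_HEBREW = {
            PHOENICIAN_FIRST + i: heb for i, heb in enumerate(HEBREW_LETTERS)
        }
        HEBREW_TO_PHOENICIAN = {h: p for p, h in PHOENICIAN_TO_HEBREW.items()}

        def to_hebrew(text):
            """Replace any Phoenician letters with the corresponding Hebrew ones."""
            return "".join(chr(PHOENICIAN_TO_HEBREW.get(ord(c), ord(c))) for c in text)

        def to_phoenician(text):
            """Replace Hebrew base letters with the corresponding Phoenician ones.
            A real utility would first fold final forms (e.g. U+05DD) to their
            non-final counterparts; this sketch leaves them untouched."""
            return "".join(chr(HEBREW_TO_PHOENICIAN.get(ord(c), ord(c))) for c in text)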

    (I notice Apple have a "Transliteration" feature for AAT fonts
    http://developer.apple.com/fonts/Registry/#Type23 to switch display of text
    between Hanja / Hangul, Hiragana / Katakana, Kana / Romanization,
    Romanization / Hiragana, and Romanization / Katakana. A feature like this
    could always be extended to allow users to toggle between Phoenician and
    Hebrew display.)

    It is always going to be harder to disunify data at a later date than to
    unify it, since with plain, un-tagged text there is no indication of which
    script the original text was written in unless it is encoded with a
    separate sub-set of Unicode characters.

    > 4) How will your possible mis-encoding affect future software results?

    Why would this be a "mis-encoding"? I'd look at text encoded using the
    characters for particular scripts as being "finer grained" than text where
    several scripts are encoded using the characters of a single script. You
    can always go from high resolution to low resolution, but not the other
    way round.
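    Purely as an illustration of that point, and reusing the hypothetical
    to_hebrew()/to_phoenician() sketch above: folding mixed data down to a
    single script is trivial, but the folded text no longer records which
    letters were originally Phoenician, so the operation cannot be reversed.

        # Phoenician 'alf bet' followed by Hebrew 'alef bet'
        mixed = "\U00010900\U00010901" + "\u05D0\u05D1"

        folded = to_hebrew(mixed)               # all four letters are now Hebrew
        assert to_phoenician(folded) != mixed   # the original distinction is lost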

    > As the situation stands right now, one simply encodes it in Hebrew or
    > Latin transliteration, effectively deferring further analysis to other
    > processes. This has its benefits.

    Having Phoenician characters in Unicode does not prevent anyone from
    continuing to use Hebrew or Latin transliteration, but it does provide the
    option of using Phoenician.

    There is a somewhat similar dilemma with encoding Pali texts (e.g. the
    Theravada Buddhist Canon) - the same Pali Canon is written in Sinhalese,
    Burmese, Devanagari, Thai, Latin transliteration and several other scripts.

    Sanskrit manuscripts can also be found in most of the Indic scripts -
    though since Sanskrit is now predominantly written in Devanagari, many
    choose to encode Sanskrit texts in that script (or Latin transliteration)
    no matter what script the manuscript uses.

    - Chris



    This archive was generated by hypermail 2.1.5 : Thu Apr 29 2004 - 06:26:53 EDT