Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

From: John Hudson (tiro@tiro.com)
Date: Mon May 24 2004 - 18:46:29 CDT

    Peter Constable wrote:

    > I was not involved in those discussions so cannot comment on them. I
    > just wish to point out that the MCW representation of Hebrew most
    > certainly *is* supported in Unicode: MCW uses ASCII Latin letters and
    > punctuation characters to stand for Hebrew letters, vowel points and
    > accents, and those exact same ASCII characters are encoded in Unicode.

    This was an 8-bit hack. The point that Elaine and other Biblical Hebrew scholars make is
    that MCW explicitly encodes distinctions between some marks, based on positioning, that
    the Unicode Hebrew block unifies. This means that while MCW text can easily be converted
    to Unicode Hebrew, the conversion cannot be round-tripped in the way that Unicode
    provides for pre-existing 8-bit standard character sets: once two positionally distinct
    MCW marks are mapped to the same Unicode character, the original distinction cannot be
    recovered from the Unicode text. One unfortunate consequence is that the ASCII-hack MCW
    encoding will likely remain the source encoding for many electronic Biblical Hebrew texts
    for some time to come, even if published texts are re-encoded as Unicode Hebrew, since
    MCW permits simple and unambiguous plain-text encoding of distinctions that are important
    to textual analysis. For example, although my clients at Libronic use Unicode encoding
    for their electronic BHS edition (because it provides greater interchangeability), they
    maintain an MCW-encoded text as their master source. So much for the 'universal'
    character set...
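
    To make the round-trip problem concrete, here is a minimal sketch in Python. The source
    tokens (LETTER, MARK_LEFT, MARK_RIGHT) are hypothetical placeholders, not real MCW codes,
    and the choice of U+05BD (meteg) as the unified mark is only an assumption for
    illustration. It shows that when two source tokens map to one Unicode character, two
    different source texts become identical after conversion, so no reverse mapping can tell
    them apart.

```python
from collections import defaultdict

# Hypothetical source tokens: placeholder names, NOT real MCW codes.
# Two tokens that differ only in mark position share one Unicode code point.
SOURCE_TO_UNICODE = {
    "LETTER": "\u05D0",      # HEBREW LETTER ALEF
    "MARK_LEFT": "\u05BD",   # HEBREW POINT METEG (illustrative choice)
    "MARK_RIGHT": "\u05BD",  # same code point: the positional distinction is unified
}


def to_unicode(tokens):
    """Convert a sequence of source tokens to a Unicode string."""
    return "".join(SOURCE_TO_UNICODE[t] for t in tokens)


def ambiguous_targets(table):
    """Return each Unicode character that has more than one source token."""
    sources = defaultdict(list)
    for token, char in table.items():
        sources[char].append(token)
    return {char: toks for char, toks in sources.items() if len(toks) > 1}


# Two distinct source texts...
a = to_unicode(["LETTER", "MARK_LEFT"])
b = to_unicode(["LETTER", "MARK_RIGHT"])

# ...collapse to the same Unicode string, so the conversion is not invertible.
print(a == b)
print(ambiguous_targets(SOURCE_TO_UNICODE))
```

    Any character reported by ambiguous_targets marks a spot where the conversion loses
    information, which is why a source text in the richer encoding has to be kept as the
    master.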

    John Hudson

    -- 
    Tiro Typeworks        www.tiro.com
    Vancouver, BC        tiro@tiro.com
    Currently reading:
    Typespaces, by Peter Burnhill
    White Mughals, by William Dalrymple
    Hebrew manuscripts of the Middle Ages, by Colette Sirat
    

