Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

From: Philippe Verdy (
Date: Mon May 24 2004 - 17:28:10 CDT


    Peter Constable wrote:
    > E. Keown wrote:
    > > Leading computational Hebraists in the late 1980s tried to
    > > persuade Unicode planners to include a non-public but very
    > > widely used academic Biblical Hebrew code, Michigan-
    > > Claremont-Westminster, in Unicode....They were rebuffed
    > > (or, if you will, perceived themselves to be rebuffed).
    > I was not involved in those discussions so cannot comment on
    > them. I just wish to point out that the MCW representation of
    > Hebrew most certainly *is* supported in Unicode: MCW uses ASCII
    > Latin letters and punctuation characters to stand for Hebrew
    > letters, vowel points and accents, and those exact same ASCII
    > characters are encoded in Unicode. In fact, any existing
    > MCW/ASCII-encoded file of Hebrew text is also
    > MCW/Unicode-encoded since the representation of Basic Latin
    > characters at the character encoding form and character
    > encoding scheme levels is exactly the same for ASCII as it is
    > for Unicode:
    > Hebrew    MCS/ASCII             MCS/Unicode
    >           literal   code unit   literal   UTF-8
    > ------------------------------------------------
    > alef      )         0x29        )         0x29
    > bet       B         0x42        B         0x42
    > gimel     G         0x47        G         0x47
    > ...
    > To encode anything different from this in Unicode to support MCW
    > texts would have been fairly bad news for the people that use
    > it.

    Is it a joke? UTF-8 designates Unicode code points referring to
    Unicode abstract characters with all their semantics (including
    the character names and properties).
    This table looks like a tweak. Or it is not correctly explained here:
    what is MCS and MCW above?
    You can't say that the table above is either ASCII or Unicode.
    It's only a separate legacy 7-bit encoding, which is probably
    not widely interoperable because it is unimplemented or not documented
    in the same common places where ASCII and Unicode are defined.
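
    Both halves of this exchange can be checked mechanically. The sketch
    below is only an illustration, not part of MCW or of any standard:
    MCW_SAMPLE and describe() are hypothetical names, and the mapping
    holds just the three rows quoted in the table above. It shows that
    the ASCII and UTF-8 byte sequences are identical for those
    characters, while the character names Unicode assigns to them are
    Latin and punctuation names, not Hebrew ones.

```python
import unicodedata

# Hypothetical excerpt: only the three rows quoted in the table above,
# paired with the corresponding code points from the Unicode Hebrew block.
MCW_SAMPLE = {
    ")": "\u05D0",  # alef
    "B": "\u05D1",  # bet
    "G": "\u05D2",  # gimel
}


def describe(mcw_char):
    """Compare the byte-level and character-level views of one MCW symbol."""
    ascii_bytes = mcw_char.encode("ascii")
    utf8_bytes = mcw_char.encode("utf-8")
    return {
        # The serialized bytes are the same in ASCII and in UTF-8.
        "same_bytes": ascii_bytes == utf8_bytes,
        "code_unit": hex(ascii_bytes[0]),
        # The Unicode character name of the symbol actually stored in the file.
        "mcw_name": unicodedata.name(mcw_char),
        # The Unicode character name of the Hebrew letter it stands for.
        "hebrew_name": unicodedata.name(MCW_SAMPLE[mcw_char]),
    }


for ch in MCW_SAMPLE:
    print(ch, describe(ch))
```

    Identical bytes say nothing about identity of meaning: a process
    that knows only Unicode sees a right parenthesis, and it takes the
    MCW convention layered on top to read it as alef.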
