Re: Sporadic Unicode revisited

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Oct 03 2002 - 11:58:47 EDT


    Kenneth Whistler <kenw at sybase dot com> wrote:

    > Attempting to extend the system to Greek, Cyrillic, Hebrew, and Arabic
    > just (in my opinion) results in mnemonics that are harder to remember
    > than the character names, even. What is the real advantage of "s*",
    > "s=", "S+" and "s+" over "sigma", "es", "samekh" and "seen" for
    > occasional usage? You end up having to look up all those "mnemonics"
    > in a table anyway, if you actually want to use them.

    I can see the advantage if you have extended text (not just an isolated
    letter). "p=r=i=v=e=t=" or even "&p=&r=&i=&v=&e=&t=" is quite a bit
    easier to read than a sequence of vocalized letter names.
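
    To make the "&p=&r=..." form concrete, here is a minimal sketch of a
    decoder for it. The six table entries are my reading of the Cyrillic
    rows of RFC 1345 (letter plus "="), and the names CYRILLIC and
    decode_mnemonics are hypothetical, not part of any existing converter.

```python
# Assumed subset of the RFC 1345 Cyrillic mnemonics (Latin letter + "=").
CYRILLIC = {
    "p=": 0x043F,  # CYRILLIC SMALL LETTER PE
    "r=": 0x0440,  # CYRILLIC SMALL LETTER ER
    "i=": 0x0438,  # CYRILLIC SMALL LETTER I
    "v=": 0x0432,  # CYRILLIC SMALL LETTER VE
    "e=": 0x0435,  # CYRILLIC SMALL LETTER IE
    "t=": 0x0442,  # CYRILLIC SMALL LETTER TE
}


def decode_mnemonics(text, table=CYRILLIC):
    """Replace each '&' + two-character mnemonic found in the table.

    Anything else, including an '&' that does not start a known
    mnemonic, is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i + 1:i + 3]
        if text[i] == "&" and pair in table:
            out.append(chr(table[pair]))
            i += 3  # skip the '&' and both mnemonic characters
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(decode_mnemonics("&p=&r=&i=&v=&e=&t="))
```

    Under those assumptions the call decodes the escaped string to the
    six Cyrillic letters U+043F U+0440 U+0438 U+0432 U+0435 U+0442.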

    My problem with RFC 1345, and one reason I never implemented a converter
    even though I was tempted, involves the escape character &.
    U+0026, the real ampersand, is encoded as simply "&", but that conflicts
    with its use as an escape character. So the sequence "B&O" (including
    the double quotation marks) is ambiguous; it could mean

        U+0022 U+0042 U+0026 U+004F U+0022

    or

        U+0022 U+0042 U+0150
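
    The two readings can be reproduced mechanically. The following is only
    a sketch under the assumptions stated in this message: a bare "&" may
    be either a literal ampersand or the escape character, and the table
    holds just the one mnemonic involved here. The names MNEMONICS,
    all_readings and fmt are hypothetical.

```python
# Hypothetical one-entry subset of the RFC 1345 mnemonic table.
MNEMONICS = {'O"': 0x0150}  # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE


def all_readings(text):
    """Return every code point sequence that text could decode to."""
    if not text:
        return [[]]
    head, rest = text[0], text[1:]
    readings = []
    # Reading 1: the character stands for itself (a literal "&" included).
    for tail in all_readings(rest):
        readings.append([ord(head)] + tail)
    # Reading 2: "&" introduces a two-character mnemonic.
    if head == "&" and rest[:2] in MNEMONICS:
        for tail in all_readings(rest[2:]):
            readings.append([MNEMONICS[rest[:2]]] + tail)
    return readings


def fmt(code_points):
    """Format a code point sequence in U+XXXX notation."""
    return " ".join(f"U+{cp:04X}" for cp in code_points)


for reading in all_readings('"B&O"'):
    print(fmt(reading))
```

    With this one-entry table the loop prints exactly the two sequences
    listed above, which is the ambiguity a real converter would have to
    resolve by some rule outside the data itself.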

    Another problem is that the system is frozen in time as of June 1992.
    There is no provision to extend the repertoire of RFC 1345 symbols to
    match the growing repertoire of Unicode. Even U+20AC EURO SIGN cannot
    be represented! At the same time, though, there are several
    "additional" symbols, mapped to the Private Use Area (U+E000 through
    U+E028), for characters assigned in ISO 6937 and other standards, some
    of which were subsequently added to Unicode or were already there (e.g.
    "DUTCH GUILDER SIGN," a.k.a. U+0192 LATIN SMALL LETTER F WITH HOOK).

    -Doug Ewell
     Fullerton, California


