Re: Sporadic Unicode revisited

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Thu Oct 03 2002 - 13:50:16 EDT

    On Thu, Oct 03, 2002 at 08:58:47AM -0700, Doug Ewell wrote:
    > Kenneth Whistler <kenw at sybase dot com> wrote:
    >
    > > Attempting to extend the system to Greek, Cyrillic, Hebrew, and Arabic
    > > just (in my opinion) results in mnemonics that are harder to remember
    > > than the character names, even. What is the real advantage of "s*",
    > > "s=", "S+" and "s+" over "sigma", "es", "samekh" and "seen" for
    > > occasional usage? You end up having to look up all those "mnemonics"
    > > in a table anyway, if you actually want to use them.
    >
    > I can see the advantage if you have extended text (not just an isolated
    > letter). "p=r=i=v=e=t=" or even "&p=&r=&i=&v=&e=&t=" is quite a bit
    > easier to read than a sequence of vocalized letter names.
    >
    > My problem with RFC 1345, one reason I never implemented a converter
    > even though it was a temptation, involves the escape character &.
    > U+0026, the real ampersand, is encoded as simply "&", but that conflicts
    > with its use as an escape character. So the sequence "B&O" (including
    > the double quotation marks) is ambiguous; it could mean
    >
    > U+0022 U+0042 U+0026 U+004F U+0022
    >
    > or
    >
    > U+0022 U+0042 U+0150

    Well, you double the introducer & to represent itself, so the second
    example is the correct interpretation.
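
    A minimal sketch of that doubling rule, assuming only two-character
    mnemonics and a tiny illustrative table (the names MNEMONICS and decode
    are hypothetical, and the RFC's longer mnemonic forms are ignored):

```python
# Illustrative subset only; a real converter would load the full RFC 1345 table.
MNEMONICS = {
    'O"': "\u0150",  # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
    "p=": "\u043f",  # CYRILLIC SMALL LETTER PE
    "r=": "\u0440",  # CYRILLIC SMALL LETTER ER
    "i=": "\u0438",  # CYRILLIC SMALL LETTER I
    "v=": "\u0432",  # CYRILLIC SMALL LETTER VE
    "e=": "\u0435",  # CYRILLIC SMALL LETTER IE
    "t=": "\u0442",  # CYRILLIC SMALL LETTER TE
}

INTRO = "&"  # the introducer (escape) character


def decode(text):
    """Decode introducer-prefixed two-character mnemonics; '&&' is a literal '&'."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != INTRO:
            # Ordinary character: copy through unchanged.
            out.append(ch)
            i += 1
        elif text[i + 1:i + 2] == INTRO:
            # Doubled introducer stands for one real ampersand.
            out.append(INTRO)
            i += 2
        else:
            # Single introducer: the next two characters must be a known mnemonic.
            key = text[i + 1:i + 3]
            if key not in MNEMONICS:
                raise ValueError(f"unknown mnemonic after introducer: {key!r}")
            out.append(MNEMONICS[key])
            i += 3
    return "".join(out)
```

    With this sketch, decode('"B&O"') yields U+0022 U+0042 U+0150, while
    the literal ampersand has to be written as "B&&O".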

    > Another problem is that the system is frozen in time in June 1992.
    > There is no provision to extend the repertoire of RFC 1345 symbols to
    > match the growing repertoire of Unicode. Even U+20AC EURO SIGN cannot
    > be represented! At the same time, though, there are several
    > "additional" symbols, mapped to the Private Use Area (U+E000 through
    > U+E028), for characters assigned in ISO 6937 and other standards, some
    > of which were subsequently added to Unicode or were already there (e.g.
    > "DUTCH GUILDER SIGN," a.k.a. U+0192 LATIN SMALL LETTER F WITH HOOK).

    The system, but not the RFC, has been extended, e.g. by ISO/IEC TR 14652.
    You can always use Uxxxx or Uxxxxxxxx identifiers for 10646 characters.
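
    As a rough sketch of how such identifiers could be resolved, assuming a
    bare "U" followed by exactly four or eight hexadecimal digits (the
    function name is hypothetical, and any surrounding delimiters used in a
    particular file format are left out):

```python
import re

# "U" followed by exactly 4 or exactly 8 hexadecimal digits.
_UCS_ID = re.compile(r"U([0-9A-Fa-f]{4}|[0-9A-Fa-f]{8})")


def ucs_identifier_to_char(ident):
    """Map an identifier such as 'U20AC' to the character it names."""
    match = _UCS_ID.fullmatch(ident)
    if match is None:
        raise ValueError(f"not a Uxxxx/Uxxxxxxxx identifier: {ident!r}")
    # chr() itself raises ValueError for values above U+10FFFF.
    return chr(int(match.group(1), 16))
```

    Under these assumptions, ucs_identifier_to_char("U20AC") returns the
    euro sign even though it has no RFC 1345 mnemonic.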

    Best regards
    keld


