Re: Sporadic Unicode revisited

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Thu Oct 03 2002 - 13:50:16 EDT

Next message: Stefan Persson: "Re: unallocated Unicode character and VB"
Previous message: Michael (michka) Kaplan: "Re: unallocated Unicode character and VB"
In reply to: Doug Ewell: "Re: Sporadic Unicode revisited"
Next in thread: Doug Ewell: "Re: Sporadic Unicode revisited"
Reply: Doug Ewell: "Re: Sporadic Unicode revisited"

On Thu, Oct 03, 2002 at 08:58:47AM -0700, Doug Ewell wrote:
> Kenneth Whistler <kenw at sybase dot com> wrote:
>
> > Attempting to extend the system to Greek, Cyrillic, Hebrew, and Arabic
> > just (in my opinion) results in mnemonics that are harder to remember
> > than the character names, even. What is the real advantage of "s*",
> > "s=", "S+" and "s+" over "sigma", "es", "samekh" and "seen" for
> > occasional usage? You end up having to look up all those "mnemonics"
> > in a table anyway, if you actually want to use them.
>
> I can see the advantage if you have extended text (not just an isolated
> letter). "p=r=i=v=e=t=" or even "&p=&r=&i=&v=&e=&t=" is quite a bit
> easier to read than a sequence of vocalized letter names.
>
> My problem with RFC 1345, one reason I never implemented a converter
> even though it was a temptation, involves the escape character &.
> U+0026, the real ampersand, is encoded as simply "&", but that conflicts
> with its use as an escape character. So the sequence "B&O" (including
> the double quotation marks) is ambiguous; it could mean
>
> U+0022 U+0042 U+0026 U+004F U+0022
>
> or
>
> U+0022 U+0042 U+0150

Well, you double the introducer & to represent itself, so the second
example is the correct interpretation.
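
A minimal sketch of that doubling rule, in Python. The one-entry table
and the names decode/encode are only illustrative assumptions; the
O" -> U+0150 mapping is the one from the example above, and longer
mnemonic forms are ignored here.

```python
# Hypothetical one-entry excerpt of a two-character mnemonic table.
# Only O" -> U+0150 is taken from the example in this thread.
MNEMONICS = {'O"': "\u0150"}

INTRO = "&"  # the introducer; written twice to stand for itself


def decode(text, table=MNEMONICS):
    """Decode '&' + two-character mnemonics; '&&' yields a literal '&'."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != INTRO:
            # Ordinary character, copied through unchanged.
            out.append(ch)
            i += 1
        elif text[i + 1:i + 2] == INTRO:
            # Doubled introducer: a literal ampersand.
            out.append(INTRO)
            i += 2
        else:
            # Single introducer: the next two characters are a mnemonic.
            key = text[i + 1:i + 3]
            if key not in table:
                raise ValueError(f"unknown mnemonic {key!r} at offset {i}")
            out.append(table[key])
            i += 3
    return "".join(out)


def encode(text, table=MNEMONICS):
    """Inverse of decode for the characters covered by the table."""
    reverse = {char: key for key, char in table.items()}
    out = []
    for ch in text:
        if ch == INTRO:
            out.append(INTRO * 2)
        elif ch in reverse:
            out.append(INTRO + reverse[ch])
        else:
            out.append(ch)
    return "".join(out)
```

With this, decode('"B&O"') gives U+0022 U+0042 U+0150, and the company
name has to be written '"B&&O"' to come out with a real ampersand.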

> Another problem is that the system is frozen in time in June 1992.
> There is no provision to extend the repertoire of RFC 1345 symbols to
> match the growing repertoire of Unicode. Even U+20AC EURO SIGN cannot
> be represented! At the same time, though, there are several
> "additional" symbols, mapped to the Private Use Area (U+E000 through
> U+E028), for characters assigned in ISO 6937 and other standards, some
> of which were subsequently added to Unicode or were already there (e.g.
> "DUTCH GUILDER SIGN," a.k.a. U+0192 LATIN SMALL LETTER F WITH HOOK).

The system, but not the RFC, has been extended, e.g. by ISO/IEC TR 14652.
You can always use Uxxxx or Uxxxxxxxx identifiers for ISO/IEC 10646
characters.
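
As a rough illustration of those identifiers, here is a small Python
sketch. It assumes the bare form "U" plus exactly four or eight hex
digits; any surrounding syntax (delimiters, escapes) is left out, and
the function names are made up for the example.

```python
import re

# Assumed shape: "U" followed by exactly 4 or exactly 8 hex digits.
_UCS_ID = re.compile(r"U([0-9A-Fa-f]{4}|[0-9A-Fa-f]{8})")


def ucs_id_to_char(identifier):
    """Turn an identifier such as 'U20AC' or 'U00010000' into a character."""
    match = _UCS_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a Uxxxx/Uxxxxxxxx identifier: {identifier!r}")
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:
        # Python's chr() only accepts values up to U+10FFFF.
        raise ValueError(f"code point out of range: {identifier!r}")
    return chr(code_point)


def char_to_ucs_id(ch):
    """Four hex digits inside the BMP, eight digits above it."""
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        return f"U{code_point:04X}"
    return f"U{code_point:08X}"
```

So the euro sign, which has no two-character mnemonic in the 1992 table,
can still be named as U20AC.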

Best regards
keld

This archive was generated by hypermail 2.1.5 : Thu Oct 03 2002 - 14:43:53 EDT