Sync/Seek-robust UTF-7

From: Shlomi Tal (shlompi@hotmail.com)
Date: Tue Jun 18 2002 - 06:21:20 EDT


If you think 7-bit issues are totally obsolete, then sorry for bothering...

UTF-7 is both stateful and fragile. Stateful it has to be, because any
attempt to encode a large charset within 7 bits AND maintain compatibility
with ASCII has to be stateful. However, it is also fragile in that there is
no self-sync or seek coherence (that's the advantage of UTF-8, as we all know).

Borrowing from the idea of ISO-2022-JP extended into EUC, but the other way
round, I had the following "Gedankenexperiment":

00..A0 stay the same
FF not used
C2..FE leadbytes (1 leadbyte)
A1..C1 trailbytes (2 trailbytes)

allowing 61 x 33 x 33 = 66429 codepoints - a little more than 65536.
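
A minimal Python sketch of the 8-bit layout above, to check the arithmetic and
a round trip. The post fixes only the byte ranges; how codepoints map onto
lead/trail triples is not specified, so the linear mapping starting at U+00A1
(FIRST_MULTI) and the function names encode_cp/decode are assumptions made
purely for illustration.

```python
LEAD_LO, LEAD_HI = 0xC2, 0xFE      # lead byte range from the post
TRAIL_LO, TRAIL_HI = 0xA1, 0xC1    # trail byte range from the post
N_LEAD = LEAD_HI - LEAD_LO + 1     # 61
N_TRAIL = TRAIL_HI - TRAIL_LO + 1  # 33
CAPACITY = N_LEAD * N_TRAIL * N_TRAIL
FIRST_MULTI = 0xA1                 # ASSUMPTION: first codepoint needing 3 bytes


def encode_cp(cp):
    """Encode one codepoint as 1 byte (00..A0) or lead + 2 trail bytes."""
    if 0 <= cp <= 0xA0:
        return bytes([cp])
    idx = cp - FIRST_MULTI  # ASSUMPTION: simple linear mapping
    if not 0 <= idx < CAPACITY:
        raise ValueError("codepoint out of range for this scheme")
    lead, rest = divmod(idx, N_TRAIL * N_TRAIL)
    t1, t2 = divmod(rest, N_TRAIL)
    return bytes([LEAD_LO + lead, TRAIL_LO + t1, TRAIL_LO + t2])


def decode(data):
    """Decode a byte string into a list of codepoints."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b <= 0xA0:
            out.append(b)
            i += 1
        elif LEAD_LO <= b <= LEAD_HI:
            if i + 2 >= len(data):
                raise ValueError("truncated sequence")
            t1, t2 = data[i + 1], data[i + 2]
            if not (TRAIL_LO <= t1 <= TRAIL_HI and TRAIL_LO <= t2 <= TRAIL_HI):
                raise ValueError("bad trail byte")
            idx = ((b - LEAD_LO) * N_TRAIL + (t1 - TRAIL_LO)) * N_TRAIL + (t2 - TRAIL_LO)
            out.append(FIRST_MULTI + idx)
            i += 3
        else:
            raise ValueError("stray trail byte or unused FF")
    return out
```

Under this assumed mapping the 66429 slots cover U+00A1 through U+1041D, so
every BMP codepoint fits.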

And now, with an ISO-2022 sequence for state, reduce to 7-bit:

42..7E leadbytes (1 leadbyte)
21..41 trailbytes (2 trailbytes)
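
Because lead and trail ranges are disjoint, a reader dropped at an arbitrary
offset can find a character boundary by looking at single bytes. A small Python
sketch of that idea follows. It covers only a buffer that is already entirely
in the shifted 7-bit state; the ISO-2022 sequence itself is not given in the
post and is left out. Treating 00..20 and 7F as single-byte characters in that
state, and the helper names, are assumptions.

```python
def is_trail(b):
    return 0x21 <= b <= 0x41   # trail range from the post


def is_lead(b):
    return 0x42 <= b <= 0x7E   # lead range from the post


def next_boundary(buf, pos):
    """Seek forward from pos to the next offset where a character starts."""
    # Skip any trail bytes we landed on (at most two in well-formed data).
    while pos < len(buf) and is_trail(buf[pos]):
        pos += 1
    return pos


def char_start(buf, pos):
    """Seek backward from pos to the start of the character containing it."""
    # ASSUMPTION: buf is well formed, so trail bytes always follow a lead byte.
    while pos > 0 and is_trail(buf[pos]):
        pos -= 1
    return pos
```

For example, with two 3-byte characters followed by a space
(42 21 21 50 30 41 20), landing on offset 1 or 2 resyncs forward to offset 3,
and landing on offset 4 resolves backward to the lead byte at offset 3.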

Stateful, yes... fragile, no! Any relevance, or is this just an amusing
experiment to be kept among geeks privately?



