Shlomi Tal wrote:
> If you think 7-bit issues are totally obsolete, then sorry for bothering...
Personally, I think they are, but I do find encoding schemes entertaining :-)
> UTF-7 is both stateful and fragile. Stateful it has to be, because any
Fragile only if you assume a lossy transport instead of trusting the error correction of the lower layers.
> attempt to encode a large charset AND maintain compatibility to ASCII has
> to be stateful.
... if you also care to stay within 7 bits.
> However, it is also fragile in that there is no
> self-sync or seek coherence (that's the advantage of UTF-8, as we all
> Borrowing from the idea of ISO-2022-JP extended into EUC, but the other
> way round, I had the following "Gedankenexperiment":
> 00..A0 stay the same
> FF not used
> C2..FE leadbytes (1 leadbyte)
> A1..C1 trailbytes (2 trailbytes)
> allowing 61 x 33 x 33 codepoints - a little more than 65536.
What about the other 1M code points? Would this encode UTF-16 code units?
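
To make the arithmetic concrete, here is a minimal Python sketch of the 8-bit form. The proposal gives only the byte ranges, so the linear mapping from value to lead/trail bytes, the choice to feed it UTF-16 code units, and the names encode_unit and encode are all assumptions made for illustration.

```python
LEAD_FIRST, LEAD_LAST = 0xC2, 0xFE      # lead byte range from the proposal
TRAIL_FIRST, TRAIL_LAST = 0xA1, 0xC1    # trail byte range from the proposal

N_LEAD = LEAD_LAST - LEAD_FIRST + 1     # 61
N_TRAIL = TRAIL_LAST - TRAIL_FIRST + 1  # 33
CAPACITY = N_LEAD * N_TRAIL * N_TRAIL   # 61 * 33 * 33 = 66429


def encode_unit(unit):
    """Encode one 16-bit unit (hypothetical linear mapping)."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("expected a 16-bit unit")
    if unit <= 0xA0:
        return bytes([unit])            # 00..A0 stay the same
    index = unit - 0xA1                 # first three-byte value gets index 0
    lead, rest = divmod(index, N_TRAIL * N_TRAIL)
    trail1, trail2 = divmod(rest, N_TRAIL)
    return bytes([LEAD_FIRST + lead, TRAIL_FIRST + trail1, TRAIL_FIRST + trail2])


def encode(text):
    """Encode a string as a sequence of UTF-16 code units (an assumption)."""
    utf16 = text.encode("utf-16-be")
    units = (int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2))
    return b"".join(encode_unit(u) for u in units)


print(CAPACITY)                         # room for 65536 units, with 893 to spare
print(encode("A\u00e9").hex(" "))
```

Under that assumption a supplementary code point costs two surrogates of three bytes each, so six bytes in total.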
> And now, with an ISO-2022 sequence for state, reduce to 7-bit:
You seem to imply simply switching between "lower bytes" (00..7F) and "upper bytes" (80..FF), which you can do with just SI/SO, without the rest of the ISO 2022 apparatus.
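
A minimal Python sketch of that SI/SO switching, assuming SO (0E) selects the upper half and SI (0F) returns to the lower half. The function names to_7bit and from_7bit are made up for this example, and the sketch simply rejects input that already contains SO or SI, since those bytes would be ambiguous.

```python
SO, SI = 0x0E, 0x0F   # Shift Out / Shift In control codes


def to_7bit(data):
    """Fold an 8-bit byte string into 7 bits, using SO/SI to mark the half."""
    out = bytearray()
    upper = False
    for b in data:
        if b in (SO, SI):
            raise ValueError("SO/SI in the input would be ambiguous")
        want_upper = b >= 0x80
        if want_upper != upper:
            out.append(SO if want_upper else SI)
            upper = want_upper
        out.append(b & 0x7F)
    if upper:
        out.append(SI)            # end in the lower (ASCII) state
    return bytes(out)


def from_7bit(data):
    """Undo to_7bit: track the shift state and restore the high bit."""
    out = bytearray()
    upper = False
    for b in data:
        if b == SO:
            upper = True
        elif b == SI:
            upper = False
        else:
            out.append(b | 0x80 if upper else b)
    return bytes(out)


sample = bytes([0x41, 0xC2, 0xA1, 0xA1, 0x42])
print(to_7bit(sample).hex(" "))
```

A lead byte C2 comes out as 42 and a trail byte A1 as 21, which matches the 7-bit ranges quoted below.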
> 42..7E leadbytes (1 leadbyte)
> 21..41 trailbytes (2 trailbytes)
What about 80..9F, which would collide with the C0 control codes (00..1F) once reduced to 7 bits?
What about U+00A0, which would become 20 (space)? Mailers might remove or replace that in ways you would not expect for U+00A0.
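
The two collisions can be checked in a few lines of Python. This assumes the 7-bit reduction just clears the high bit of every upper-half byte; the helper name strip_high_bit is made up for this example.

```python
def strip_high_bit(byte):
    """Assumed 7-bit reduction: drop bit 7 of an upper-half byte."""
    return byte & 0x7F


# 80..9F (the C1 controls) land exactly on 00..1F (the C0 controls).
c1_images = [strip_high_bit(b) for b in range(0x80, 0xA0)]
# A0 (U+00A0, no-break space) lands on 20, the ordinary space.
nbsp_image = strip_high_bit(0xA0)

print(f"{c1_images[0]:02X}..{c1_images[-1]:02X}")
print(f"{nbsp_image:02X}")
```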
What about users' complaints about the high bytes-per-code-point ratio of Unicode encodings?
For everything but ASCII (U+0000..U+007F), UTF-7 uses 2.67 B/cp (16 bits spread over 6-bit base64 characters), while this uses 3 B/cp.
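
The ratio is plain arithmetic, sketched here in Python. The per-run figure assumes one "+" and one "-" delimiter around each base64 run in UTF-7; the function name utf7_bytes_per_unit is made up for this example, and the shift overhead of the proposed scheme is left out.

```python
from fractions import Fraction

# UTF-7 carries each 16-bit UTF-16 unit in base64 characters of 6 bits each.
UTF7_ASYMPTOTIC = Fraction(16, 6)     # 8/3, about 2.67 bytes per unit
PROPOSED = 3                          # one lead byte plus two trail bytes


def utf7_bytes_per_unit(run_length):
    """Average bytes per 16-bit unit for one base64 run, delimiters included."""
    base64_chars = -(-16 * run_length // 6)   # ceiling of 16n / 6
    return Fraction(base64_chars + 2, run_length)


print(round(float(UTF7_ASYMPTOTIC), 2))
for n in (1, 3, 30):
    print(n, round(float(utf7_bytes_per_unit(n)), 2))
```

For long runs UTF-7 approaches 2.67 B/cp, while for a single non-ASCII character between ASCII text the delimiters push it to 5 bytes.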
> Stateful, yes... fragile, no! Any relevance, or is this just an amusing
> experiment to be kept among geeks privately?
Time will tell. You could ask Doug to add it to his collection :-)
This archive was generated by hypermail 2.1.2 : Tue Jun 18 2002 - 10:30:47 EDT