Re: UTF-7,5

From: Markus G. Kuhn (kuhn@cs.purdue.edu)
Date: Mon Jul 14 1997 - 16:08:42 EDT

Next message: Tony Harminc: "Re: isLetter on Katakana marks"
Previous message: Martin J. Duerst: "Re: bidi in Taiwanese Mandarin (was: bidi support)"

"Unicode Discussion" wrote on 1997-07-14 08:45 UTC:
> UTF-8 may travel well through the transport protocol, but due to the
> presence of C1 control characters it can cause errors on displaying and/or
> printing the received message.

I absolutely do not care, and you shouldn't either. C1 characters will
only cause problems on those receiving systems that do not understand
UTF-8 at all. And on those systems, *ALL* non-ASCII characters are messed
up anyway, so some additional mess-up due to C1 characters doesn't
make things any worse. Broken is broken. I do not think it is good
practice to support a concept of slightly-less-broken-than-totally-broken,
as you would with the Latin-1 backwards compatibility of your UTF-7,5.
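
To make the C1 concern concrete: UTF-8 continuation bytes lie in 0x80-0xBF, so a receiver that treats the byte stream as Latin-1 will see some of them as C1 controls (0x80-0x9F). The following is only a minimal sketch of that overlap; the helper name c1_bytes and the sample characters are illustrative assumptions, not anything taken from the UTF-7,5 proposal.

```python
def c1_bytes(text):
    """Return the UTF-8 bytes of `text` that fall in the C1 range 0x80-0x9F.

    Hypothetical helper for illustration only.
    """
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# Sample characters chosen for illustration:
# U+00C4 (A with diaeresis), U+00E4 (a with diaeresis), U+20AC (Euro sign).
samples = ["\u00c4", "\u00e4", "\u20ac"]

for ch in samples:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    c1 = " ".join(f"{b:02X}" for b in c1_bytes(ch)) or "none"
    print(f"U+{ord(ch):04X}: UTF-8 = {hex_bytes}; bytes in C1 range: {c1}")
```

Whether a given character produces a C1 byte depends only on its code point, so a Latin-1-only receiver gets garbage either way; the C1 bytes merely change what kind of garbage it is.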

> That's the point of inventing UTF-7,5 which
> has the nice additional property of being >>almost human readable<< for
> latin-1 letters.

Hm, back to the fundamentals ...

Standardization is the elimination of unnecessary diversity in technical
specifications, and not the encouragement of technical diversity by
standardizing lots of different new alternative hacks.

Unfortunately, too few people involved in standardization have
understood what the *real* purpose of standardization is. This is
the reason why we see so many narrow-minded proposals that are
not upwards-compatible with the envisioned long-term solution (Latin-0 for
the people who think they might need the Euro symbol, UTF-7,5 for
people who think they might need a Unicode encoding backwards-compatible
with their ISO 2022/Latin-1 terminal, etc.).

UTF-7,5 is a nice hack, but I feel that introducing it on a large scale
or even standardizing it will do more harm than good. I have also
seen several other alternative proposals to UTF-8, which put all byte
sequences for Unicode characters into (!) the C1 range for full
Latin-1 compatibility. They all came from Germany, too.
Just as narrow-minded. :-)

By the way, with proper rounding it should be called UTF-7,6 as
log_2(191) is 7.5774288280357486893, and in the 192 characters
mentioned on <http://vzdmzi.zdv.uni-mainz.de/~knappen/jk009.html>,
you probably counted 0x7f = DEL as a graphic character.
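
The arithmetic can be checked with a short sketch. It assumes that the 191 refers to the graphic characters of ISO 8859-1, i.e. the code positions 0x20-0x7E and 0xA0-0xFF; that reading is an assumption made here for illustration, and the variable names are likewise illustrative.

```python
import math

# Assumption: "graphic characters" means the Latin-1 positions
# 0x20-0x7E (95 characters) and 0xA0-0xFF (96 characters).
graphic = [c for c in range(0x100) if 0x20 <= c <= 0x7E or 0xA0 <= c <= 0xFF]

n_graphic = len(graphic)       # count without DEL
n_with_del = n_graphic + 1     # count if 0x7F = DEL is included

bits = math.log2(n_graphic)    # information content per character, in bits
print(n_graphic, n_with_del)
print(bits, round(bits, 1))
```

Under that assumption the count comes out as 191 without DEL and 192 with it, which matches the two figures above.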

Markus

-- 
Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA -- email: kuhn@cs.purdue.edu

