Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Gunther Schadow (gunther@aurora.rg.iupui.edu)
Date: Wed Aug 19 1998 - 12:33:49 EDT


> I would like to call escape encoding using printable characters
> insane too.

I have to agree in part. If the escape character is printable, the
encoding is not really backwards compatible, because the escape
character must itself be escaped, or we have to introduce heuristics
which I consider insane. So, we can't use the tilde.
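
To make the compatibility problem concrete, here is a minimal Python
sketch. The format is purely an assumption for illustration (tilde
followed by four hex digits, and a doubled tilde for a literal tilde);
nobody in this thread has specified one, and the names encode_tilde
and decode_tilde are hypothetical.

```python
ESC = "~"  # assumed printable escape character, for illustration only


def encode_tilde(text: str) -> str:
    """Escape non-ASCII characters as ~XXXX and a literal tilde as ~~."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == ESC:
            out.append(ESC + ESC)          # the escape must escape itself
        elif cp < 0x80:
            out.append(ch)                 # other ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"{ESC}{cp:04X}")   # fixed four hex digits
        else:
            raise ValueError("sketch handles U+0000..U+FFFF only")
    return "".join(out)


def decode_tilde(data: str) -> str:
    """Inverse of encode_tilde for well-formed input."""
    out = []
    i = 0
    while i < len(data):
        if data[i] != ESC:
            out.append(data[i])
            i += 1
        elif data[i + 1:i + 2] == ESC:
            out.append(ESC)
            i += 2
        else:
            out.append(chr(int(data[i + 1:i + 5], 16)))
            i += 5
    return "".join(out)


legacy = "cd ~/mail"             # plain ASCII that old software produces
print(encode_tilde(legacy))      # differs from legacy: the tilde is doubled
```

Pure ASCII text that happens to contain a tilde no longer encodes to
itself, so old software and new software disagree about the same bytes.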

I would like to use a 7bit escape character, that's why I don't think
0xb8 (¸) instead of the tilde is an option. The reason why I want a
7bit character is to be general enough. I want one encoding that works
in all of the three environments the same way: old 7bit ASCII, 8bit
ISO Latin-1, and Unicode aware software. My worries for 7bit ASCII are
not so much e-mail routers but that quarter-century old legacy software
running on VAX/VMS, MVS, MUMPS and those kinds of environments. This is
the software that is often frozen now (if it survives the Y2K havoc).

Using two encodings depending on the availability of the 8th bit is
not a solution, because that requires people to know in advance
whether to use one or the other. This argument might not be very
strong since in any case they have to know in advance whether to
encode the ISO Latin block or can send it unencoded.

In conclusion, if a printable escape character is required, simply
extending UTF-7 to use the 8th bit does not seem so bad after all.

After contemplating the 0th code block (U+0000 -- U+007F) I wonder if
we can use U+007F (DEL). What happens to text containing DEL? How do
terminals render the DEL? What do old communication protocols do with
DEL? I tried it on VMS and UNIX, displayed by xterm and by some ANSI-
or DEC-compatible terminal. Interestingly, there is no glyph shown for
DEL on xterm (which is not bad) and there is a dingbats thing shown on
my other terminal. So, what do you think?
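
As a rough sketch of what a DEL-based escape could look like, the
Python below assumes a format that is not defined anywhere in this
thread: each character at or above U+007F is written as DEL, its code
point in hex, and a closing DEL. The names encode_del and decode_del
are hypothetical. It says nothing about how terminals or old protocols
treat DEL, which remains the open question.

```python
import re

DEL = "\x7f"  # U+007F, the assumed non-printable 7-bit escape character

# One escape sequence: DEL, upper-case hex digits, DEL.
_ESCAPE = re.compile(DEL + "([0-9A-F]+)" + DEL)


def encode_del(text: str) -> str:
    """Pass U+0000..U+007E through; wrap everything else in DEL...DEL."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x7F:
            out.append(ch)
        else:
            # DEL itself (0x7F) is escaped too, so it only ever
            # appears in the output as a delimiter.
            out.append(f"{DEL}{cp:X}{DEL}")
    return "".join(out)


def decode_del(data: str) -> str:
    """Inverse of encode_del for well-formed input."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), data)


plain = "cd ~/mail"       # printable ASCII, tilde included
mixed = "Grüße"           # needs the escape

print(encode_del(plain) == plain)   # unchanged by the encoding
print(repr(encode_del(mixed)))      # shows the DEL-delimited escapes
```

Under this assumption every printable ASCII text, tilde included,
encodes to itself, and the encoded form never uses the 8th bit.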

We seem to have agreement that a Unicode encoding is required that is
truly backwards compatible so as to require no software changes as
long as the software does not want to use the extended power of
Unicode. Also old software should not interfere with the encoding.

> But as many think UTF-8 is the way to go and are adding software that
> can read/write it, it ought to be easier to get people to use a slightly
> modified version of the read/write routines for UTF-8 (which allows all
> now produced latin-1 and UTF-8 texts to still be used) than getting them
> to support a totally new set of routines.

That requires software changes, which I want to avoid.

> >I have to blame UTF-7 for constricting itself to the ancient 7-bit
> >requirement.
> I agree. I protested when it was produced and said that it worked fine
> for latin-1 also without any escaping of latin-1 (except for +).

What did they answer you?

> > I am a German living in the U.S., and I know what I am
> >talking about when I say that most of the Internet's mail routers are
> >now 8-bit clean. ESMTP has been available for years now (a decade?)
> >and it has been implemented in sendmail for years. Actually you have
> >to switch ESMTP off forcefully in sendmail if you don't want it, isn't
> >that true? So why
> >do we hear this constant whining about Internet e-mail not being 8-bit
> >clean? Tell your MIME MTA to use transfer-encoding 8bit and try
> >it. Unless you live behind such insane CC-mail routers, you will
> >be pretty happy with it!
>
> And many think quoted-printable is the encoding to use in e-mail instead
> of 8-bit mail. Still, simple mail programs like the terminal-based
> Unix mailx, mail and whatever they are now called, cannot interpret
> quoted-printable, making text (in my case Swedish) very difficult to read.
> Also, as mail is stored in quoted-printable instead of decoded into
> native 8-bit, all text editors and text viewers show the text encoded,
> making it hard to read.

Yes, I hate quoted-printable for that. I would use quoted-printable
only if I do not know anything about the distribution, i.e. when
posting to a mailing list. However, all my mailing lists are English
and thus I effectively never use quoted-printable. If I want to be
really safe I use base64.
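
For anyone who has not seen the two encodings side by side, this small
Python sketch uses the standard quopri and base64 modules on a Latin-1
byte string. The sample sentence is made up for illustration.

```python
import base64
import quopri

# Hypothetical sample text; any Latin-1 text with accents would do.
text = "Räksmörgås är gott"
raw = text.encode("latin-1")      # the native 8-bit form

qp = quopri.encodestring(raw)     # quoted-printable: ASCII stays legible
b64 = base64.b64encode(raw)       # base64: nothing stays legible

print(qp.decode("ascii"))
print(b64.decode("ascii"))

# Both forms are 7-bit only and decode back to the same bytes.
print(quopri.decodestring(qp) == raw, base64.b64decode(b64) == raw)
```

Quoted-printable leaves the ASCII letters in place and replaces each
accented byte with an =XX sequence, which is what a reader of a
non-decoding mail program ends up looking at.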

> Yes, it would be nice. Much better than today where too many people think
> the only proper way to handle text is to use ascii (7-bits) and encode
> everything using difficult to read encodings, and never bother to get
> all software to stop displaying the encoded form to human beings.

So do we have a deal? Collaboration? A proposal on the horizon?

regards
-Gunther


