Re: UTF-8 and Kermit

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Tue Jul 15 1997 - 16:15:36 EDT


> Frank da Cruz wrote on 1997-07-15 18:20 UTC:
> > > I know of very few tools that do anything with C1 characters other than
> > > ignore them or replace them with hexadecimal replacements...
> > >
> > As noted repeatedly, these codes are used by VT220, VT320, and above.
> > Some of them are part of ISO 2022: SS3, etc. They are widely and properly
> > used in the real world.
>
> Can Kermit already be switched to handle UTF-8?
>
Nope, not yet. So far all we've been able to do with Unicode is (a) use it
internally as the intermediate charset for translating between other charsets,
e.g. NeXT and Data General (which both have OE, so that way we don't lose it
the way we used to when using 8859-x as the intermediate set), and (b) use it
as the native character set in the Windows NT version, so we can display an
incoming ISO 2022 data stream correctly (English, German, Turkish, Czech,
Russian, Greek, etc, all on the same screen).
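
A minimal sketch of point (a), not Kermit's actual code. Python has no NeXT or Data General codecs, so mac_roman and cp1252 stand in here as two sets that both contain OE while Latin-1 does not; the function name translate and its pivot parameter are made up for illustration.

```python
def translate(data: bytes, source: str, target: str, pivot: str = "unicode") -> bytes:
    """Translate bytes from one charset to another through an intermediate set."""
    # Decoding always yields Unicode text in Python.
    text = data.decode(source)
    if pivot != "unicode":
        # Simulate a narrower intermediate set: anything it lacks becomes "?".
        text = text.encode(pivot, errors="replace").decode(pivot)
    return text.encode(target, errors="replace")


# 0xCE is the OE ligature in mac_roman; 0x8C is the same letter in cp1252.
via_unicode = translate(b"\xce", "mac_roman", "cp1252")
via_latin1 = translate(b"\xce", "mac_roman", "cp1252", pivot="latin-1")

print(via_unicode)  # the ligature survives as cp1252 byte 0x8C
print(via_latin1)   # the ligature is lost and comes out as "?"
```

With Unicode in the middle, a character only has to exist in the source and target sets; with 8859-x in the middle, it also has to exist in the intermediate set.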

> If a terminal emulator handles UTF-8, then the C1 characters will be
> interpreted AFTER the UTF-8 decoding has taken place.
>
I guess. The whole issue of Unicode as an on-the-wire character set, and
its many possible encodings, especially in terminal emulation, is going to
be an interesting one for some time to come. I don't know what else to say.
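
To make the ordering concrete, here is a rough sketch (the helper names are hypothetical). In UTF-8 the byte 0x9B can be the second half of an ordinary letter, so an emulator that looked for C1 controls in the raw bytes would see a CSI that is not there. Looking for them after decoding avoids that.

```python
C1_RANGE = range(0x80, 0xA0)  # C1 control codes, U+0080 through U+009F


def c1_in_raw_bytes(data: bytes) -> list[int]:
    """Wrong order: treat every byte in 0x80-0x9F as a C1 control."""
    return [b for b in data if b in C1_RANGE]


def c1_after_decoding(data: bytes) -> list[int]:
    """Right order: decode UTF-8 first, then pick out C1 code points."""
    return [ord(ch) for ch in data.decode("utf-8") if ord(ch) in C1_RANGE]


# Czech e-with-caron, U+011B, is the two bytes C4 9B in UTF-8.
letter = "\u011b".encode("utf-8")
# A real 8-bit CSI (U+009B) followed by "1m" is C2 9B 31 6D in UTF-8.
control = "\u009b1m".encode("utf-8")

print(c1_in_raw_bytes(letter))     # reports a spurious 0x9B
print(c1_after_decoding(letter))   # reports no controls
print(c1_after_decoding(control))  # reports the genuine CSI
```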

The day we have to deal with it is the day that you Telnet to (say) a UNIX
host and the herald and "login:" prompt come out in Unicode. Which is not the
same thing as logging in (as we do now) in ASCII or Latin-1, etc, and then
maybe trying to display a Unicode file. And I'm not sure how we'd handle that
anyway. If it was anything else, we could have a host-resident file viewer
that sent the proper ISO 2022 sequences before and after the file, but as far
as I know, there is nothing like that for Unicode / UTF-8 / etc, since these
do not have the ISO character-set structure.

I think this is an interesting application and might deserve some attention
from the list, so I'm copying the list on this. To restate the problem:

Suppose I have a terminal emulator that understands ISO 2022 and all sorts of
ISO-registered character sets (such as all the ISO 646s and ISO 8859-1 through -10), and
which converts them to Unicode and displays them in a Unicode font -- and I do
have such an emulator:

  http://www.columbia.edu/kermit/kuishots.html#shot3

Now suppose I am logged in to a conventional UNIX host and I want to "cat" a
Unicode file that is either bare Unicode or (more interesting) encoded in
UTF-8 or some other encoding. What escape sequence can be sent to the terminal to
switch it into and out of Unicode / UTF-8, so that all the regular 8-bit stuff
before and after the Unicode text appears correctly, and so does the Unicode
text?

Stated another way: is there a movement afoot to register Unicode, UTF-8,
etc, with ISO so that they get ISO 2022 escape sequences? (Even though they
might not fit into the ISO 4873 structure.) If not, should there be? If not,
then what would be a reasonable way to mix (say) UTF-8 in with a regular
ASCII or Latin-1 (etc) data stream?
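
For the sake of discussion, here is a sketch of what the receiving side might look like if such a pair of sequences existed. The two byte strings are an assumption and not something this message establishes: ESC % G and ESC % @ follow the ISO 2022 "designate other coding system" pattern, and whether they are the right choice is exactly the open question. The function name decode_mixed is hypothetical, and the sketch ignores every other ISO 2022 shift and designation.

```python
ESC_UTF8_ON = b"\x1b%G"   # assumed "switch to UTF-8" sequence
ESC_UTF8_OFF = b"\x1b%@"  # assumed "return to the 8-bit set" sequence


def decode_mixed(stream: bytes, default: str = "latin-1") -> str:
    """Decode a stream that alternates between an 8-bit set and UTF-8."""
    pieces = []
    pos = 0
    in_utf8 = False
    while pos < len(stream):
        # Look for whichever sequence would end the current mode.
        marker = ESC_UTF8_OFF if in_utf8 else ESC_UTF8_ON
        end = stream.find(marker, pos)
        found = end != -1
        if not found:
            end = len(stream)
        codec = "utf-8" if in_utf8 else default
        pieces.append(stream[pos:end].decode(codec, errors="replace"))
        if not found:
            break
        pos = end + len(marker)
        in_utf8 = not in_utf8
    return "".join(pieces)


# Latin-1 text, then a UTF-8 passage in Greek, then Latin-1 again.
stream = (
    "caf\u00e9 ".encode("latin-1")
    + ESC_UTF8_ON
    + "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac".encode("utf-8")
    + ESC_UTF8_OFF
    + " na\u00efve".encode("latin-1")
)
print(decode_mixed(stream))
```

A host-resident file viewer would then only need to emit the first sequence before the file and the second one after it, the same way it would bracket a file with ISO 2022 designations today.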

- Frank


