Re: UTF-8, C1 controls, and UNIX

From: Frank da Cruz (fdc@columbia.edu)
Date: Thu Mar 01 2001 - 15:15:27 EST


> I don't understand this part of your rhetoric here. In UTF-8, *ASCII* is
> sacrosanct, not just "/".
>
Right, sorry. I withdraw my point about VMS and other pathnames.

> And as for your overall point, I don't know of any claim that UTF-8
> was designed for "transparent usability with hosts that are ignorant of
> it." The documentation at the time claims the following criteria:
> ...
> And yes, anybody who participated at the time was perfectly aware
> that you couldn't just pump UTF-8 at a terminal or host that was
> interpreting C1 control values and expect nothing odd to happen.
>
OK, I believe you. But it's a disappointment, since with that additional
step, it could have been used transparently with Unicode-ignorant hosts
(with some caveats, e.g. about video editing, pattern-matching, etc).

We discussed the C1 problem before, but only in the host-to-terminal
context, and we all agreed it's manageable -- more or less -- but the
inability to type UTF-8 at an 8-bit clean host on an 8-bit clean connection
is a show-stopper.

In fact, you *can* type quite a bit of UTF-8 -- all the characters whose
UTF-8 representation contains no bytes in the C1 range (0x80-0x9F). So in
the case of capital Cyrillic letters, you can type ER, ES, TE, ... but you
can't type A, BE, VE, GHE, ...
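
To make the Cyrillic example concrete, here is a minimal Python sketch that
checks whether a character's UTF-8 encoding contains a byte in the C1 range
(0x80-0x9F). The names C1_RANGE, has_c1_byte and report are made up for
illustration, and the sketch assumes the host treats every byte in that
range as a control; a real terminal driver may be more selective.

```python
import unicodedata

# Assumption: the host interprets every byte 0x80..0x9F as a C1 control.
C1_RANGE = range(0x80, 0xA0)


def has_c1_byte(ch: str) -> bool:
    """Return True if any byte of ch's UTF-8 encoding lies in 0x80..0x9F."""
    return any(b in C1_RANGE for b in ch.encode("utf-8"))


def report(chars: str) -> None:
    """Print the UTF-8 bytes of each character and whether a C1 byte occurs."""
    for ch in chars:
        encoded = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        status = "contains a C1 byte" if has_c1_byte(ch) else "no C1 bytes"
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {encoded} ({status})")


if __name__ == "__main__":
    # Capital A, BE, VE, GHE followed by capital ER, ES, TE.
    report("\u0410\u0411\u0412\u0413\u0420\u0421\u0422")
```

The same check applies to any other script: a character is affected exactly
when at least one byte of its encoding lands in 0x80-0x9F.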

I bring this up only to make it clear to everyone that there is a problem
that might not have been obvious. If I put UTF-8 files on a host, and I
have a UTF-8 terminal, I can display them just fine. I can read UTF-8
email and netnews on the host, etc. But I can type only a subset of
graphic characters, not all of them.

- Frank


