Re: UTF-8, C1 controls, and UNIX

From: Frank da Cruz
Date: Thu Mar 01 2001 - 14:25:00 EST

> Just to be sure: ISO 2022 has two modes, 7 bits and 8 bits, hasn't it?
> And in 7 bit mode (I know it's obsolescent), then C1 controls are not
> supposed to be interpreted as controls, are they?
Nor as graphics.

> > In VMS, which fully supports C1 controls from VT220-and-above terminals,
> > and is completely 8-bit clean and ISO standards-compliant, the sequence:
> >
> > 0x9B41
> >
> > is CSI (Control Sequence Introducer) followed by A, which happens to be
> > what the VT220 Up-Arrow key sends.
> AFAIR, 0x1B5B41 does just the same (but I agree it takes a bit more
> bandwidth... barely noticeable these days).
VMS is an example of a platform that really, really takes advantage of
ISO standards. When you log in to VMS, it sends an escape sequence to
query the terminal. If the response indicates C1 capability, the host
sends an escape sequence commanding the terminal into C1 mode. This is
how it has worked for nearly 20 years.
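
To make the 7-bit/8-bit equivalence quoted above concrete, here is a minimal
Python sketch. It assumes only the standard rule that a C1 control in the
range 0x80-0x9F can also be written as ESC followed by the byte minus 0x40.
The function name c1_to_7bit is made up for illustration; it is not taken
from VMS or from any terminal driver.

```python
ESC = 0x1B

def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 control (0x80-0x9F) as ESC + (byte - 0x40)."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # e.g. CSI 0x9B becomes ESC 0x5B, that is ESC "["
            out += bytes([ESC, b - 0x40])
        else:
            out.append(b)
    return bytes(out)

up_arrow_8bit = bytes([0x9B, 0x41])         # CSI A, as sent in 8-bit mode
up_arrow_7bit = c1_to_7bit(up_arrow_8bit)   # ESC [ A, the 7-bit form
print(up_arrow_7bit.hex())                  # 1b5b41
```

Both forms carry the same control function; the 8-bit one is simply one
byte shorter.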

> > Suppose I have a UTF8 terminal and I type Cyrillic uppercase letter
> > EL, U+041B. In UTF8, this is 0xD09B.
> With ISO 2022 and co (and I mean 2375 here), you are supposed to send
> 0x1B2547
> before actually sending any byte in UTF-8 encoding, aren't you?
> Did your terminal do that?
> If it does, why is the host still eating any C1 control character?
I began this discussion by saying that UTF-8 is supposed to be usable with
hosts that do not understand Unicode, and therefore do not understand UTF-8.
So sending an announcer would not do any good. Anyway, none of these hosts
have any code in them whatsoever for interpreting C1 bytes as graphic
characters unless (a) the writers of the terminal drivers were ignorant of
the applicable standards, and/or (b) standards compliance was deliberately
removed to adjust to the new preponderance of Microsoft code pages on the
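
A small Python sketch of the collision described above, under the assumption
of a host that classifies incoming bytes by the usual 8-bit layout (C0, GL,
C1, GR) and knows nothing about UTF-8. The classify function is hypothetical
and is only meant to show what such a host would see.

```python
def classify(byte: int) -> str:
    """Classify one byte by the 8-bit code layout (hypothetical helper)."""
    if byte <= 0x1F:
        return "C0"   # 7-bit control characters
    if byte <= 0x7F:
        return "GL"   # left-hand graphic area (SPACE and DEL are special cases)
    if byte <= 0x9F:
        return "C1"   # 8-bit control characters, including CSI at 0x9B
    return "GR"       # right-hand graphic area

encoded = "\u041b".encode("utf-8")       # CYRILLIC CAPITAL LETTER EL
print(encoded.hex())                     # d09b
print([classify(b) for b in encoded])    # ['GR', 'C1']
```

The second byte, 0x9B, lands in the C1 area, so a standards-conforming host
that is ignorant of UTF-8 takes it for CSI rather than for part of a letter.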

My point is that UTF-8 is not really up to the task it was designed for,
i.e. transparent usability with hosts that are ignorant of it. In fact it
was designed only for UNIX (Plan 9), which is why "/" is sacrosanct, and why
it contains no NULs (because of C). The C1 problem was overlooked because
nobody really considered it. And non-UNIX platforms use lots of characters
besides "/" in pathname syntax, so even leaving aside the C1 issue, we'd
need another UTF for VMS, another for VOS, another for DOS and Windows, and
so on.
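
As a rough check of the claim above (a Python sketch, with a function name
invented for illustration), one can collect every byte value that occurs in
the UTF-8 encoding of any non-ASCII code point. No byte below 0x80 ever
appears, so "/" and NUL are safe, but the whole C1 range 0x80-0x9F does.

```python
def bytes_used_by_multibyte_sequences() -> set:
    """Collect all byte values occurring in UTF-8 encodings of non-ASCII code points."""
    used = set()
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        used.update(chr(cp).encode("utf-8"))
    return used

used = bytes_used_by_multibyte_sequences()

# ASCII bytes, including "/" (0x2F) and NUL (0x00), never occur.
print(min(used) >= 0x80)                            # True

# Every C1 byte value occurs as a trail byte somewhere.
print(all(b in used for b in range(0x80, 0xA0)))    # True
```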

- Frank
