Re: UTF-8, C1 controls, and UNIX

From: Frank da Cruz
Date: Thu Mar 01 2001 - 11:44:31 EST

> On Wed, 28 Feb 2001, Frank da Cruz wrote:
> [...]
> > Cyrillic letters (e.g. capital A through PE). Most UNIX terminal drivers
> > treat incoming C1 controls like their C0 counterparts, so 0x83 == 0x03 ==
> > Ctrl-C, which interrupts whatever process you are talking to. Similarly
> > 0x84 == Ctrl-D, which is EOF; 0x88 is backspace, and so on.
> Do these terminal drivers not support cs8 and -istrip? I don't have any
> trouble typing these characters in Linux 2.2.18, for instance.
It doesn't matter, does it? If the host does not expect UTF-8, the C1
controls will either be treated as C0 controls or else as C1 controls,
but not as text unless the terminal driver has been programmed to violate
ISO 4873 and ISO 2022, not to mention ISO 6429. I don't doubt this might
be the case with Linux.
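
As a rough illustration of the first case (C1 treated as C0), here is a
minimal Python sketch. It is not taken from any real terminal driver; the
function name fold_c1_to_c0 and the "drop the high bit" rule are assumptions
chosen only to reproduce the 0x83 == 0x03 style of mapping described above.

```python
# Hypothetical model: fold C1 controls (0x80-0x9F) onto their C0
# counterparts (0x00-0x1F) by clearing the high bit. This is an
# assumption for illustration, not actual driver code.
C0_MEANING = {
    0x03: "Ctrl-C (interrupt)",
    0x04: "Ctrl-D (EOF)",
    0x08: "Ctrl-H (backspace)",
}

def fold_c1_to_c0(byte: int) -> int:
    """Return the C0 counterpart of a C1 byte; pass other bytes through."""
    if 0x80 <= byte <= 0x9F:
        return byte & 0x7F
    return byte

def describe(byte: int) -> str:
    """Name the action the folded byte would trigger, if it is in the table."""
    folded = fold_c1_to_c0(byte)
    return C0_MEANING.get(folded, f"0x{folded:02X}")

if __name__ == "__main__":
    for b in (0x83, 0x84, 0x88):
        print(f"0x{b:02X} -> 0x{fold_c1_to_c0(b):02X}  {describe(b)}")
```

Bytes outside 0x80-0x9F, such as the 0xD0 lead byte of a Cyrillic letter in
UTF-8, pass through this sketch unchanged; only the trail byte is affected.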

In VMS, which fully supports C1 controls from VT220-and-above terminals,
and is completely 8-bit clean and ISO standards-compliant, the sequence:


is CSI (Control Sequence Introducer) followed by A, which happens to be
what the VT220 Up-Arrow key sends.

Suppose I have a UTF-8 terminal and I type Cyrillic uppercase letter
EL, U+041B. In UTF-8, this is the two bytes 0xD0 0x9B. The host sees this
as a printable 8-bit character (0xD0) followed by CSI (0x9B). If the next character I
happen to type is ASCII letter A, the host thinks I pressed the Up-arrow
key. In any case, the host terminal driver is stuck in its
control-sequence state machine, waiting for a terminating character,
as defined in ANSI X3.64.
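
The scenario can be sketched in Python. This is only a simplified model of
an 8-bit host input parser, not any real driver: the function name
parse_as_8bit_host and its event tuples are hypothetical, and the only rule
it borrows from the standard is that a byte in the range 0x40-0x7E
terminates a control sequence.

```python
CSI = 0x9B  # 8-bit Control Sequence Introducer

def parse_as_8bit_host(data: bytes):
    """Tokenize input as a host expecting 8-bit controls might (simplified).

    Returns (events, pending), where pending is True if the parser is
    still inside a control sequence waiting for its final byte.
    """
    events = []
    in_csi = False
    params = bytearray()
    for b in data:
        if in_csi:
            if 0x40 <= b <= 0x7E:      # final byte ends the sequence
                events.append(("CSI", bytes(params), chr(b)))
                in_csi = False
                params.clear()
            else:                      # parameter or intermediate byte
                params.append(b)
        elif b == CSI:
            in_csi = True
        else:
            events.append(("char", b))
    return events, in_csi

if __name__ == "__main__":
    typed = "\u041b".encode("utf-8")   # Cyrillic capital EL
    print(typed.hex())
    print(parse_as_8bit_host(typed))          # parser left waiting
    print(parse_as_8bit_host(typed + b"A"))   # read as CSI A (Up-Arrow)
```

With only the two bytes of EL, the model reports one printable character
and remains in the pending state; appending ASCII A completes the sequence.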

> PS. Thanks for Kermit.
You're welcome :-)

- Frank
