RE: UTF-8, C1 controls, and UNIX

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Fri Mar 02 2001 - 04:11:51 EST


Antoine Leca wrote:
> > In VMS, which fully supports C1 controls from VT220-and-above
> > terminals, and is completely 8-bit clean and ISO standards-compliant,
> > the sequence:
> >
> > 0x9B41
> >
> > is CSI (Control Sequence Introducer) followed by A, which happens to
> > be what the VT220 Up-Arrow key sends.
>
> AFAIR, 0x1B5B41 does just the same (but I agree it takes a bit more
> bandwidth... barely noticeable these days).

Perhaps you are missing Frank's point here.

The sequence of octets 0x9B 0x41 may be a legitimate part of a *graphic*
UTF-8 string, which was not intended to be interpreted as a control
sequence:

- 0x9B (binary 10011011) is a UTF-8 continuation octet; since an ASCII octet
follows it, it is the last octet of a multi-octet character (the only thing
we know about that Unicode character is that its code point ends with the hex
digit B);

- 0x41 is the single-octet character U+0041 ('A').

In your case, 0x1B 0x5B 0x41 ("<esc>[A") is an *intentional* control
sequence which probably has been inserted in the UTF-8 file exactly to
obtain that effect on the terminal.
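
To make the difference concrete, here is a minimal Python sketch. It assumes
U+00DB (capital U with circumflex) as one example of a character whose last
octet is 0x9B; any code point whose low six bits are 011011 would do. The
helper name naive_csi_offsets is made up for illustration and only mimics a
terminal that scans raw octets for the 8-bit CSI.

```python
# Illustrative sketch only: naive_csi_offsets is a hypothetical helper,
# and U+00DB is just one example of a character ending in octet 0x9B.

C1_CSI = 0x9B  # 8-bit Control Sequence Introducer


def naive_csi_offsets(data):
    """Return the offsets where an octet-oriented scan would see a CSI."""
    return [i for i, octet in enumerate(data) if octet == C1_CSI]


# A purely graphic string: U+00DB followed by U+0041 ('A').
graphic = "\u00dbA".encode("utf-8")
print(graphic.hex(" "))                # c3 9b 41
print(naive_csi_offsets(graphic))      # [1]  (a false CSI at offset 1)

# The continuation octet carries only the low six bits of the code point.
print(hex(0x9B & 0x3F))                # 0x1b

# The intentional 7-bit sequence <esc>[A contains no 0x9B octet at all.
intentional = b"\x1b[A"
print(naive_csi_offsets(intentional))  # []
```

An octet-oriented terminal would take offset 1 of the graphic string as a CSI
followed by 'A', while a receiver that decodes UTF-8 first sees only two
graphic characters.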

_ Marco


