Re: UTF-8, C1 controls, and UNIX

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Mar 01 2001 - 13:47:19 EST


Frank da Cruz wrote:
>
> It doesn't matter, does it? If the host does not expect UTF-8, the C1
> controls will either be treated as C0 controls or else as C1 controls,
> but not as text unless the terminal driver has been programmed to violate
> ISO 4873 and ISO 2022, not to mention ISO 6429.

Just to be sure: ISO 2022 has two modes, 7 bits and 8 bits, hasn't it?
And in 7 bit mode (I know it's obsolescent), then C1 controls are not
supposed to be interpreted as controls, are they?

> In VMS, which fully supports C1 controls from VT220-and-above terminals,
> and is completely 8-bit clean and ISO standards-compliant, the sequence:
>
> 0x9B41
>
> is CSI (Control Sequence Introducer) followed by A, which happens to be
> what the VT220 Up-Arrow key sends.

AFAIR, 0x1B5B41 does just the same (but I agree it takes a bit more
of bandwidth... barely noticeable these days).
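
To make the equivalence concrete, here is a small Python sketch (purely
illustrative; the function name c1_to_7bit is made up for this example).
It relies on the rule that each 8-bit C1 control in the range 0x80-0x9F
has a 7-bit representation as ESC followed by the byte minus 0x40, so
CSI (0x9B) becomes ESC [ (0x1B 0x5B).

```python
ESC = 0x1B

def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 control as its 7-bit ESC Fe equivalent.

    Hypothetical helper for illustration only: it treats every byte in
    0x80-0x9F as a C1 control, which is only valid when the stream is
    NOT UTF-8 (or any other encoding that reuses those byte values).
    """
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # C1 control: ESC followed by (byte - 0x40), e.g. 0x9B -> ESC 0x5B
            out += bytes([ESC, b - 0x40])
        else:
            out.append(b)
    return bytes(out)

# The VT220 Up-Arrow sequence from the quoted message: CSI A
up_arrow_8bit = bytes([0x9B, 0x41])
up_arrow_7bit = c1_to_7bit(up_arrow_8bit)
print(up_arrow_7bit.hex())
```

The 7-bit form costs one extra byte per control, which is the bandwidth
difference mentioned above.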

> Suppose I have a UTF8 terminal and I type Cyrillic uppercase letter
> EL, U+041B. In UTF8, this is 0xD09B.

With ISO 2022 and co (and I mean 2375 here), you are supposed to send
  0x1B2547
before actually sending any byte in UTF-8 encoding, aren't you?
Did your terminal do that?
If it does, why is the host still eating any C1 control character?
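
A rough Python sketch of what I mean (an assumption about how a
well-behaved host could act, not a description of any real terminal
driver; the interpret function and its event tuples are invented for
this example). Before it has seen ESC % G, the host flags bytes in
0x80-0x9F as C1 controls; after it, the remaining bytes are decoded as
UTF-8 text. The sketch ignores any sequence that switches back out of
UTF-8.

```python
UTF8_ON = b"\x1b%G"  # ESC % G, i.e. 0x1B 0x25 0x47

def interpret(stream: bytes):
    """Hypothetical host-side reader.

    Returns a list of events:
      ("c1", byte)   a byte in 0x80-0x9F seen before ESC % G
      ("byte", byte) any other byte seen before ESC % G
      ("text", str)  everything after ESC % G, decoded as UTF-8
    """
    head, sep, tail = stream.partition(UTF8_ON)
    events = []
    for b in head:
        if 0x80 <= b <= 0x9F:
            events.append(("c1", b))
        else:
            events.append(("byte", b))
    if sep:
        # The announcement was seen: the rest is text, not controls.
        events.append(("text", tail.decode("utf-8")))
    return events

el = "\u041b".encode("utf-8")       # Cyrillic capital EL
print(el.hex())                     # d09b
print(interpret(el))                # 0x9B is flagged as a C1 control
print(interpret(UTF8_ON + el))      # decoded as one character of text
```

Without the announcement, the second byte of U+041B (0x9B) is
indistinguishable from CSI; with it, the host has no reason to look for
C1 controls in the byte stream at all.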

Or am I missing something important somewhere?

Antoine


