Re: UTF-8, C1 controls, and UNIX

From: Keld Jørn Simonsen (
Date: Wed Feb 28 2001 - 17:51:30 EST

On Wed, Feb 28, 2001 at 01:11:20PM -0800, Frank da Cruz wrote:
> The idea behind UTF-8 is to be able to use it in non-Unicode-aware UNIX
> versions: It lets you have Unicode filenames, Unicode directory names,
> Unicode file contents, Unicode email, etc. But what it does not do is let
> you *type* Unicode into regular UNIX applications or shells, if the UTF-8
> happens to contain C1 control characters as do, for example, many of the
> Cyrillic letters (e.g. capital A through PE). Most UNIX terminal drivers
> treat incoming C1 controls like their C0 counterparts, so 0x83 == 0x03 ==
> Ctrl-C, which interrupts whatever process you are talking to. Similarly
> 0x84 == Ctrl-D, which is EOF; 0x88 is backspace, and so on.
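The mechanism quoted above can be checked directly. Below is a minimal sketch in Python. It assumes a driver that reads each C1 byte (0x80-0x9F) as the C0 control with the same low five bits, as the quote describes (0x83 read as 0x03). The helper names `c1_bytes` and `c0_counterpart` are hypothetical. The table of control meanings lists common tty defaults, which vary by system. Cyrillic capital A through PE (U+0410..U+041F) encode as D0 90..D0 9F, so their second bytes fall on C0 0x10-0x1F (for example Ctrl-S and Ctrl-Z). The 0x83/0x84/0x88 examples come from neighbouring letters such as U+0403 GJE (D0 83).

```python
# Sketch: find the C1-range bytes (0x80-0x9F) in a string's UTF-8 encoding and
# show which C0 control each would be mistaken for. Helper names are hypothetical;
# the "same low five bits" mapping follows the quoted description, not any
# particular terminal driver's source.

# A few C0 controls that a typical UNIX tty treats specially (defaults vary).
C0_MEANING = {
    0x03: "Ctrl-C (interrupt)",
    0x04: "Ctrl-D (EOF)",
    0x08: "Ctrl-H (backspace)",
    0x11: "Ctrl-Q (XON)",
    0x13: "Ctrl-S (XOFF)",
    0x15: "Ctrl-U (kill line)",
    0x1A: "Ctrl-Z (suspend)",
}


def c1_bytes(text: str) -> list[int]:
    """Return the bytes of text's UTF-8 encoding that lie in the C1 range."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


def c0_counterpart(byte: int) -> int:
    """Map a C1 byte (0x80-0x9F) to its C0 counterpart (0x00-0x1F)."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError(f"{byte:#04x} is not a C1 control")
    return byte - 0x80


if __name__ == "__main__":
    # CYRILLIC CAPITAL GHE (U+0413), KA (U+041A) and GJE (U+0403).
    for ch in "\u0413\u041a\u0403":
        for b in c1_bytes(ch):
            c0 = c0_counterpart(b)
            meaning = C0_MEANING.get(c0, "")
            print(f"U+{ord(ch):04X} {ch}: byte {b:#04x} -> {c0:#04x} {meaning}")
```

Run as a script, this reports that typing GHE sends 0x93, which such a driver would read as XOFF. KA sends 0x9A, read as Ctrl-Z. GJE sends 0x83, read as Ctrl-C.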

Maybe one should define a transmission-safe UTF that leaves the C1 range alone?


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT