Re: UTF-8, C1 controls, and UNIX

From: John Cowan (jcowan@reutershealth.com)
Date: Thu Mar 01 2001 - 15:07:32 EST


Frank da Cruz wrote:

> My point is that UTF-8 is not really up to the task it was designed for,
> i.e. transparent usability with hosts that are ignorant of it.

It is transparent as a file format, though not necessarily as a wire
format. ASCII isn't transparent as a wire format either: if you transmit
control characters over the wire, funny things may happen.
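
As a rough illustration of the wire-format concern (a sketch only; the
helper name c1_bytes is made up for this example), the following Python
lists the bytes of a UTF-8 encoded string that fall in the C1 range
0x80 to 0x9F. A channel or terminal that acts on 8-bit C1 controls could
misread such bytes, even though the same data is harmless in a file.

```python
# Sketch: find UTF-8 bytes that a C1-aware channel might treat as controls.
# The name c1_bytes is hypothetical, chosen for this example only.
C1_RANGE = range(0x80, 0xA0)  # byte values 0x80..0x9F, the C1 control range


def c1_bytes(text):
    """Return (offset, byte) pairs for encoded bytes that fall in 0x80..0x9F."""
    encoded = text.encode("utf-8")
    return [(i, b) for i, b in enumerate(encoded) if b in C1_RANGE]


# Pure ASCII text never produces a byte above 0x7F.
print(c1_bytes("plain ASCII"))

# U+00DB encodes as the bytes C3 9B; 0x9B is CSI when read as a C1 control.
print([(i, hex(b)) for i, b in c1_bytes("\u00db")])
```

Whether any given host actually reacts to those bytes depends on how it
is configured; the sketch only shows where they occur.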

> In fact it
> was designed only for UNIX (Plan 9), which is why "/" is sacrosanct, and why
> it contains no NULs (because of C). The C1 problem was overlooked because
> nobody really considered it. And non-UNIX platforms use lots of characters
> besides "/" in pathname syntax, so even leaving aside the C1 issue, we'd
> need another UTF for VMS, another for VOS, another for DOS and Windows, and so
> on.

There is nothing special about / in UTF-8. What is sacrosanct is the
whole ASCII range (0x00 to 0x7F): those byte values are never used inside
a multi-byte sequence.
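
The point about the ASCII range can be checked mechanically. The sketch
below (the function name find_ascii_collisions is made up for this
example) encodes every non-ASCII code point that Python will encode and
looks for any byte below 0x80 in the result. Surrogates (U+D800 to
U+DFFF) are skipped because they are not valid in UTF-8.

```python
# Sketch: confirm that no non-ASCII code point encodes to a byte below 0x80.
# The name find_ascii_collisions is hypothetical, chosen for this example.


def find_ascii_collisions():
    """Return the code points above U+007F whose UTF-8 form has a byte < 0x80."""
    collisions = []
    for code_point in range(0x80, 0x110000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        encoded = chr(code_point).encode("utf-8")
        if any(b < 0x80 for b in encoded):
            collisions.append(code_point)
    return collisions


# An empty list means "/" (0x2F), NUL (0x00) and every other ASCII byte
# can only ever appear in UTF-8 data as themselves.
print(find_ascii_collisions())

# U+012F ends in the hex digits 2F, but its encoding is C4 AF, not 2F.
print("\u012f".encode("utf-8").hex())
```

So "/" gets no special treatment; it is protected for the same reason
as every other character from 0x00 to 0x7F.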

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein


