> My point is that UTF-8 is not really up to the task it was designed for,
> i.e. transparent usability with hosts that are ignorant of it. In fact it
> was designed only for UNIX (Plan 9), which is why "/" is sacrosanct, and why
> it contains no NULs (because of C).
I don't understand this part of your rhetoric here. In UTF-8, *ASCII* is
sacrosanct, not just "/".
> The C1 problem was overlooked because
> nobody really considered it. And non-UNIX platforms use lots of characters
> besides "/" in pathname syntax, so even leaving aside the C1 issue, we'd
> need another UTF for VMS, another for VOS, another DOS and Windows, and so
Why? "\" and ":" are also "sacrosanct" in UTF-8. No pathname syntax that
I know of is disturbed by UTF-8. (Well, EBCDIC paths, I suppose, but then
ASCII itself would trash EBCDIC paths if you didn't convert.)
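(As a small illustration of this point, not from the original mail: every
byte of a multi-byte UTF-8 sequence is >= 0x80, so no ASCII byte, including
"/", "\", and ":", can ever appear as a fragment of some other character.
A quick Python sketch, with an arbitrary mixed-script example string:)

```python
# All UTF-8 lead bytes of multi-byte sequences (0xC0-0xF4) and all
# continuation bytes (0x80-0xBF) are >= 0x80, so any byte < 0x80 in the
# encoded form is a genuine ASCII character from the original text.
s = "naïve/路径\\mixed:名"          # hypothetical sample path-like string
data = s.encode("utf-8")

# Collect the sub-0x80 bytes: they spell out exactly the ASCII characters
# of the original, path separators included.
ascii_bytes = bytes(b for b in data if b < 0x80)
assert ascii_bytes.decode("ascii") == "nave/\\mixed:"
```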
The early description of UTF-8 (FSS-UTF) focused on "/" because its
predecessor, UTF-1, did not preserve "/". So UTF-8 was a fix for that.
And as for your overall point, I don't know of any claim that UTF-8
was designed for "transparent usability with hosts that are ignorant of
it." The documentation at the time claims the following criteria:
1. Compatibility with historical file systems. (met by ASCII preservation)
2. Compatibility with existing programs. (by which is meant usability
as strings through 8-bit APIs, as well as ASCII preservation)
3. Easy conversion from/to [16-bit] Unicode.
4. First byte indication of length of trailing byte sequence.
5. Non-extravagance in number of bytes needed for encoding.
6. Local resynching capability.
I think UTF-8 met all those criteria.
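(Criteria 4 and 6 can be sketched concretely; this is my illustration, not
part of the original discussion. The first byte of a sequence encodes the
sequence length, so a decoder that lands mid-stream can resynchronize by
skipping continuation bytes until the next lead byte:)

```python
def utf8_seq_len(lead: int) -> int:
    """Sequence length determined from the first byte alone (criterion 4).
    Continuation bytes (0b10xxxxxx) return 0: they never start a sequence."""
    if lead < 0x80:
        return 1          # ASCII, single byte
    if lead < 0xC0:
        return 0          # continuation byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

# Local resynching (criterion 6): drop into the middle of a stream and
# skip forward to the next lead byte.  Here we cut off the first two
# bytes of "déjà vu", landing inside the encoding of "é".
data = "déjà vu".encode("utf-8")[2:]
i = 0
while i < len(data) and utf8_seq_len(data[i]) == 0:
    i += 1
assert data[i:].decode("utf-8") == "jà vu"
```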
And yes, anybody who participated at the time was perfectly aware
that you couldn't just pump UTF-8 at a terminal or host that was
interpreting C1 control values and expect nothing odd to happen.
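(To make the C1 problem concrete, my example rather than the original
poster's: the C1 range 0x80-0x9F gets no protection in UTF-8, so ordinary
characters encode to byte sequences containing C1 control values:)

```python
# U+20AC EURO SIGN encodes as E2 82 AC; the middle byte, 0x82, is the
# C1 control BPH (Break Permitted Here, ISO 6429).  A terminal or host
# interpreting raw C1 bytes would act on it instead of displaying text.
data = "€".encode("utf-8")
assert data == b"\xe2\x82\xac"

c1_bytes = [b for b in data if 0x80 <= b <= 0x9F]
assert c1_bytes == [0x82]
```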
> - Frank
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT