Re: Is there a UTF that allows ISO 8859-1?

From: Dan (
Date: Sun Aug 30 1998 - 06:12:56 EDT

Kenneth Whistler said:

> Many of the Unicode implementers on this list work for these
> "business empires", and we aren't exactly hiding from the public.
> What the consensus of opinion seems to be stating is that UTF-8 is
> here to stay, it is standard and becoming widespread in implementation,
> and that it isn't really that big a problem that it doesn't preserve
> Latin-1 byte values to be readily legible on Latin-1 terminals.
> As others have pointed out, UTF-8 is not that hard to auto-detect,

One important thing to remember is that there is both a need to have a
standard way to transport text between places and a need to handle
text at a place.

For transport, UTF-8, UCS-2 or UCS-4 is fine.

But not always on a host!

Many sites have large amounts of data stored in some way other than
UTF-8, UCS-2 or UCS-4. They still have to be able to handle all the old
data, and since software cannot change overnight, new software must be
able to write in the format old software understands.
For example: Sun has a new locale for Swedish using UTF-8. This locale
is totally unusable for me. If I activate it, programs understanding the new
locale will choke on all my ISO 8859-1 based text and all my file names.
And new file names and text created will be unreadable by all old
software. If I cannot read my old files, the software is unusable.
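
To make the two failure directions concrete, here is a minimal Python
sketch. The file name is a made-up example, not one from my system, and
the helper name decode_strict_utf8 is only for illustration.

```python
# Hypothetical file name "raksmorgas.txt" with Swedish letters, written
# with escapes: \u00e4 = a-diaeresis, \u00f6 = o-diaeresis, \u00e5 = a-ring.
NAME = "r\u00e4ksm\u00f6rg\u00e5s.txt"

def decode_strict_utf8(raw: bytes):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

# Direction 1: a name stored by old ISO 8859-1 software, read by new
# software that assumes UTF-8. The bytes are not valid UTF-8.
old_bytes = NAME.encode("iso-8859-1")
old_seen_by_new = decode_strict_utf8(old_bytes)

# Direction 2: a name stored by new UTF-8 software, read by old software
# that assumes ISO 8859-1. Every byte decodes, but each non-ASCII letter
# turns into two unrelated characters.
new_bytes = NAME.encode("utf-8")
new_seen_by_old = new_bytes.decode("iso-8859-1")

print(old_seen_by_new)        # None
print(repr(new_seen_by_old))  # the garbled name as old software shows it
```

The first direction fails outright, the second silently produces garbage,
which is arguably worse for file names.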

If software is to handle UCS, it needs to handle the text as UCS
internally; the format used for reading and writing is only for storage
or transport. Self-synchronization and using fgrep on text files sounds
fine, but does fgrep -i work? You must be able to handle case
insensitivity. Can you do that by treating the text as a byte stream,
without internally having it as a UCS stream? And of course, as some have
pointed out, decomposed characters are a problem. Also, most text is line
oriented, so it is very easy to have self-synchronization on lines.
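
A small Python sketch of the case-insensitivity and decomposition points.
The sample strings are invented for illustration. The byte-level fold
stands in for what a tool that only knows ASCII would do; it is not a
claim about how any particular fgrep is implemented.

```python
import unicodedata

# Sample data (hypothetical): an upper-case line and a lower-case pattern,
# both stored as UTF-8. \u00c5/\u00e5 = A/a-ring, \u00c4/\u00e4 = A/a-diaeresis.
line = "BL\u00c5B\u00c4RSSYLT".encode("utf-8")
pattern = "bl\u00e5b\u00e4r".encode("utf-8")

# Byte-stream approach: bytes.lower() only changes ASCII A-Z, so the
# multi-byte sequences for the upper-case Swedish letters are untouched.
byte_match = pattern in line.lower()

# Character approach: decode first, then fold case on the characters.
text_match = pattern.decode("utf-8").casefold() in line.decode("utf-8").casefold()

# Decomposed characters: "a" + combining ring, "a" + combining diaeresis.
decomposed = "bla\u030aba\u0308r"
precomposed = pattern.decode("utf-8")
raw_equal = decomposed == precomposed
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed

print(byte_match, text_match)  # False True
print(raw_equal, nfc_equal)    # False True
```

Both the case fold and the normalization step need the text as
characters, not as bytes.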

So yes, UTF-8 is fine for transport. But it is not the solution for
local storage today. In 10 years maybe, if software vendors immediately
change their software so it can both read and write the local format and
UTF-8. After 10 years we might be ready, if all software is then replaced
by new software, to start writing local files in UTF-8.
(Well, those who use ASCII can start already).
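
What "both read and write the local format and UTF-8" could look like, as
a rough Python sketch. The function names and the choice of ISO 8859-1 as
the local format are my assumptions. The detection is only a heuristic:
some ISO 8859-1 byte sequences are also valid UTF-8, so it can guess
wrong.

```python
def decode_either(raw: bytes, local_encoding: str = "iso-8859-1"):
    """Try strict UTF-8 first, then fall back to the local encoding.

    Returns (text, encoding_used). Pure ASCII is valid UTF-8, so it is
    always reported as "utf-8".
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(local_encoding), local_encoding

def encode_for(text: str, target_encoding: str) -> bytes:
    """Write in whatever format the receiving software understands."""
    return text.encode(target_encoding)

word = "sm\u00f6r"  # hypothetical sample text, \u00f6 = o-diaeresis

from_old = decode_either(word.encode("iso-8859-1"))
from_new = decode_either(word.encode("utf-8"))
from_ascii = decode_either(b"abc")

# Old data read in, then written back out for old software unchanged.
round_trip = encode_for(from_old[0], "iso-8859-1")

print(from_old[1], from_new[1], from_ascii[1])
```

The ASCII case needs no detection at all, which is why ASCII users can
switch today.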


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT