Re: New 8 Bit Character Sets

From: Mark Leisher
Date: Thu Aug 29 1996 - 16:40:59 EDT

    Jonathan> Ed Hart wrote:
>> With major workstation applications that support Unicode (ISO/IEC
>> 10646-1:1993, UCS-2) due next year, I would question the value of
>> standardizing another part of ISO/IEC 8859 and your ability to get
>> manufacturers to support it before 1998. By that time, you will see
>> Unicode/10646 support available in more and more products.

    Jonathan> I agree. Even if manufacturers do, will users want to convert
    Jonathan> twice, once to an interim improved 8859 and then to 10646? These
    Jonathan> conversions can be quite painful.

    Jonathan> I propose instead a new UTF scheme, which I will call UTF-256:

    Jonathan> The idea is that each message first defines a mapping from codes
    Jonathan> 0 to 255 into UCS, then proceeds to use 8 bit codes for the
    Jonathan> content.

    Jonathan> The header, which defines the mapping, will be in UTF-7. If you
    Jonathan> do not want to use C1, just don't define any mapping from that
    Jonathan> zone.
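
To make the quoted UTF-256 idea concrete, here is a minimal sketch of one way it could work. The proposal only says that a UTF-7 header defines a mapping from 8-bit codes into UCS, so everything else here is an assumption made for illustration: the function names, the 2-byte length prefix that frames the header, and the policy of assigning codes in order of first appearance.

```python
import struct

def encode_utf256(text):
    """Encode text as: 2-byte header length, UTF-7 header, 8-bit body.

    The framing and code assignment are hypothetical; the original
    proposal does not specify them.
    """
    # Distinct characters in order of first appearance; position = 8-bit code.
    chars = list(dict.fromkeys(text))
    if len(chars) > 256:
        raise ValueError("message uses more than 256 distinct characters")
    # Header: the mapped characters in code order, encoded as UTF-7.
    header = "".join(chars).encode("utf-7")
    code_of = {ch: code for code, ch in enumerate(chars)}
    # Body: one byte per character of the content.
    body = bytes(code_of[ch] for ch in text)
    return struct.pack(">H", len(header)) + header + body

def decode_utf256(message):
    """Invert encode_utf256 by rebuilding the per-message table."""
    (header_len,) = struct.unpack(">H", message[:2])
    table = message[2:2 + header_len].decode("utf-7")
    body = message[2 + header_len:]
    return "".join(table[code] for code in body)
```

Codes that the message never uses simply have no entry in the header, which is one reading of "just don't define any mapping from that zone". Note that every receiver still needs a full UCS-capable decoder to interpret the header.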

I think I missed part of this conversation, but I'll toss in something:

I don't really understand why work should continue on character sets of size
256. It must be psychological. 16-bit character sets can be efficiently
implemented on platforms with as little as 8MB of memory and maybe even 4MB.
I have tested our Unicode implementation on an 8MB Linux machine with very
reasonable results with respect to memory usage and performance.

In addition, more work on character sets of size 256 means more conversion
tables when software products begin turning to 10646/Unicode.
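
As a rough illustration of what such a conversion table looks like, the sketch below builds a 256-entry table from an 8-bit character set to Unicode code points using Python's built-in codecs. The helper name `build_table` and the choice of ISO 8859-1 and ISO 8859-7 as examples are assumptions made for illustration, not something taken from this thread.

```python
def build_table(charset):
    """Return a 256-entry list mapping each byte value to a Unicode
    code point, or None where the character set leaves it undefined."""
    table = []
    for code in range(256):
        try:
            table.append(ord(bytes([code]).decode(charset)))
        except UnicodeDecodeError:
            # This byte value has no assigned character in the set.
            table.append(None)
    return table

latin1 = build_table("iso8859_1")   # Latin-1 maps straight onto U+0000..U+00FF
greek = build_table("iso8859_7")    # Greek needs a real lookup table
```

Each additional 8-bit set means one more table like `greek` that conversion software has to carry.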

If I missed the point, then please ignore my comments.

Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"A designer knows he has achieved perfection
not when there is nothing left to add, but
when there is nothing left to take away."
    -- Antoine de Saint-Exup'ery
