Re: New 8 Bit Character Sets

From: Alain LaBonté (alb@sct.gouv.qc.ca)
Date: Thu Aug 29 1996 - 09:28:56 EDT


At 03:02 29/08/1996 -0400, Jonathan Rosenne wrote:
>Ed Hart wrote:
>>With major workstation applications that support Unicode (ISO/IEC
>>10646-1:1993, UCS-2) due next year, I would question the value of
>>standardizing another part of ISO/IEC 8859 and your ability to get
>>manufacturers to support it before 1998. By that time, you will see
>>Unicode/10646 support available in more and more products.

[Jonathan]:
>I agree. Even if manufacturers do, will users want to convert twice, once to
>an interim improved 8859 and then to 10646? These conversions can be quite
>painful.

[Alain]:
I have gone through this 5 or 6 times (maybe more) since 1971. The impact is
generally exaggerated, particularly when existing alphabetic characters don't
change. From IBM 437 to 850, all existing alphabetics stayed the same. From
437 to 863, a few letters changed (10 uppercase accented letters), and from
863 to 850 the pattern was the same (I once converted a 200-page French
document from 863 to 850: I had to change only 5 characters, occurring in
many instances, but a search-and-replace pass took less than 5 minutes; in
data files, these characters are even less frequent). In EBCDIC the scenario
was the same (Canadian bilingual to IBM 037, and eventually to IBM 500).
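
To see how small an 863-to-850 migration can be, one can list the letters
whose byte value differs between the two code pages. This sketch assumes
Python's standard `cp863` and `cp850` codecs faithfully reflect the IBM
tables; the helper names `changed_letters` and `recode` are hypothetical and
not part of any tool mentioned in this thread.

```python
# Sketch: which letters change byte position between two DOS code pages?
# Relies on Python's standard cp863 (Canadian French) and cp850 codecs.

def changed_letters(src: str, dst: str) -> dict:
    """Map each letter present in both code pages to its (src_byte, dst_byte)
    pair, keeping only letters whose byte value differs."""
    changed = {}
    for b in range(256):
        ch = bytes([b]).decode(src)
        if not ch.isalpha():
            continue
        try:
            target = ch.encode(dst)
        except UnicodeEncodeError:
            continue  # the letter does not exist in the destination code page
        if target != bytes([b]):
            changed[ch] = (b, target[0])
    return changed

def recode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes between code pages; strict, so unmappable characters raise."""
    return data.decode(src).encode(dst)

if __name__ == "__main__":
    for ch, (old, new) in sorted(changed_letters("cp863", "cp850").items()):
        print(f"{ch}: 0x{old:02X} -> 0x{new:02X}")
```

Only the listed letters need a search-and-replace; everything else in a
French text keeps its byte value.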

The most painful was from IBM 850 to Latin 1 (Windows): everything changed
overnight, without warning, and Microsoft provided only a one-way conversion
tool. I still have to convert back and forth (I think I am not the only one
on earth (-: ), so I made my own tools for the whole round trip (even to
repair the bad conversions produced when I run into a misconfiguration that
I cannot fix under Windows).
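
A minimal sketch of such a back-and-forth tool, assuming Python's standard
`cp850` and `cp1252` codecs stand in for IBM 850 and Windows Latin 1 (Windows
actually uses code page 1252, a superset of ISO 8859-1). Unlike the
863-to-850 case, even common letters move: é is 0x82 in 850 but 0xE9 in
Windows. The function names are hypothetical; strict error handling is a
deliberate choice, so the tool refuses to guess instead of silently producing
a bad conversion.

```python
# Sketch of a two-way converter between IBM 850 (DOS) and Windows cp1252.
# Strict codecs: unmappable characters raise instead of being replaced.

def dos_to_windows(data: bytes) -> bytes:
    return data.decode("cp850").encode("cp1252")

def windows_to_dos(data: bytes) -> bytes:
    # Note: Python's cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined,
    # so those bytes raise UnicodeDecodeError here.
    return data.decode("cp1252").encode("cp850")

def unmappable(data: bytes, src: str, dst: str) -> set:
    """Return the characters in `data` (decoded with `src`) that have no
    equivalent in `dst`, i.e. those a strict conversion would reject."""
    lost = set()
    for ch in data.decode(src):
        try:
            ch.encode(dst)
        except UnicodeEncodeError:
            lost.add(ch)
    return lost
```

Running `unmappable` before converting shows, for instance, that the 850
box-drawing characters have no place in Windows Latin 1.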

But here we are talking about a code in which all letters would stay where
they are and almost-unused characters (in particular the useless stand-alone
spacing accents) would be replaced by the ones we need. What is the problem
for users? Most would not even notice the change. What is the problem for
manufacturers? It is more psychological for developers than anything else;
compared to the number of bugs that have to be fixed in normal software
maintenance, it is nothing. And it satisfies the users, so it is good for
PR, and it will brighten the days of bored salespersons (-:
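
As an illustration only: ISO/IEC 8859-15 (Latin-9), published later, took a
similar approach, replacing eight little-used Latin-1 symbols (among them the
spacing diaeresis, acute accent and cedilla) with characters such as €, Œ and
œ while leaving every letter in place. The sketch below uses Python's
standard `latin-1` and `iso8859_15` codecs to list the positions that differ;
it is not the specific code proposed in this thread, and `replaced_positions`
is a hypothetical helper.

```python
# Sketch: compare two 8-bit code pages position by position.

def replaced_positions(old: str, new: str) -> list:
    """Return (byte, old_char, new_char) for every byte value whose
    character differs between the two code pages."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        a, c = raw.decode(old), raw.decode(new)
        if a != c:
            diffs.append((b, a, c))
    return diffs

if __name__ == "__main__":
    for b, a, c in replaced_positions("latin-1", "iso8859_15"):
        print(f"0x{b:02X}: {a!r} -> {c!r}")
```

Since no letter appears among the replaced positions, existing Latin-1 text
made only of letters, digits and common punctuation is byte-for-byte valid in
the new code.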

[Jonathan]:
>I propose instead a new UTF scheme, which I will call UTF-256:
>
>The idea is that each message first defines a mapping from codes 0 to 255 into
>UCS, then proceeds to use 8 bit codes for the content.
>
>The header, which defines the mapping, will be in UTF-7. If you do not want
>to use C1, just don't define any mapping from that zone.
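
Jonathan's scheme can be sketched as follows. The header syntax
(comma-separated `BB:UUUUUU` hex entries terminated by a newline) and the
rule of assigning byte values in order of first use are assumptions made for
illustration; the proposal only says that the UTF-7 header defines the
mapping and that C1 may be left unmapped. The names `encode_utf256` and
`decode_utf256` are hypothetical.

```python
# Hypothetical sketch of "UTF-256": a UTF-7 header maps byte values to UCS
# code points, then the body uses one byte per character.

# Byte values available for mapping; the C1 zone (0x80-0x9F) stays undefined.
ALLOWED = [b for b in range(256) if not 0x80 <= b <= 0x9F]

def encode_utf256(text: str) -> bytes:
    mapping = {}  # character -> assigned byte value, in order of first use
    for ch in text:
        if ch not in mapping:
            if len(mapping) == len(ALLOWED):
                raise ValueError("too many distinct characters for one message")
            mapping[ch] = ALLOWED[len(mapping)]
    header = ",".join(f"{b:02X}:{ord(ch):06X}" for ch, b in mapping.items())
    body = bytes(mapping[ch] for ch in text)
    # The header holds only hex digits, ':' and ',', which UTF-7 encodes
    # directly, so it never contains the b"\n" that terminates it.
    return header.encode("utf-7") + b"\n" + body

def decode_utf256(data: bytes) -> str:
    header_bytes, sep, body = data.partition(b"\n")
    if not sep:
        raise ValueError("missing header terminator")
    table = {}
    header = header_bytes.decode("utf-7")
    if header:
        for entry in header.split(","):
            byte_hex, cp_hex = entry.split(":")
            table[int(byte_hex, 16)] = chr(int(cp_hex, 16))
    # A byte with no mapping (e.g. one from the C1 zone) raises KeyError.
    return "".join(table[b] for b in body)
```

In this sketch the body is meaningless without its header, so any system
that strips or ignores the header loses the text.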

[Alain]:
That is exactly the problem. The Windows characters we need are currently in
C1, and we need them on other platforms too, in particular on EBCDIC-based
platforms. At the very least, the data must survive conversion back and
forth.
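
The round-trip requirement can be illustrated with Python's standard
`cp1252` and `cp500` (EBCDIC International) codecs. A character-by-character
conversion fails on the Windows characters stored in the 0x80-0x9F range,
because cp500 covers only the Latin-1 repertoire. One workaround, assumed
here for illustration, is to carry each Windows byte through as the C1
control with the same value, which cp500 can represent: the bytes survive,
although EBCDIC software will not display them as intended. The function
names are hypothetical.

```python
# Sketch: carrying Windows (cp1252) data through EBCDIC (cp500).

def windows_to_ebcdic_strict(data: bytes) -> bytes:
    """Character conversion: raises UnicodeEncodeError on characters such as
    the euro sign or Œ, which cp500 does not contain."""
    return data.decode("cp1252").encode("cp500")

def windows_to_ebcdic_transparent(data: bytes) -> bytes:
    """Byte-preserving conversion: each byte is read as the Latin-1 code point
    with the same value (0x80-0x9F become C1 controls), which cp500 encodes."""
    return data.decode("latin-1").encode("cp500")

def ebcdic_to_windows_transparent(data: bytes) -> bytes:
    """Inverse of windows_to_ebcdic_transparent."""
    return data.decode("cp500").encode("latin-1")
```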

Alain LaBonté
Québec
