Re: Caring about European requirements sensitively!

From: Alain LaBonté (alb@riq.qc.ca)
Date: Wed Oct 22 1997 - 08:27:40 EDT


At 20:16 22/10/97 -0700, Kenneth Whistler wrote:
>Alain wrote:
>>
>> [Alain] :
>> In practice, IBM will create a new EBCDIC code page to have the EURO. It is
>> likely to use the same code position as the one that will be
>> "replaced" in Latin 0 (likely, in the end, the CURRENCY SYMBOL) out of Latin
>> 1. This means that if this occurs, mappings will still be valid. In the same
>> way, when Latin 0 is standardized, this new code table will also contain
>> the other European characters missing as a defect in Latin 1 (which was
>> supposed to support French and Finnish fully but did not). IBM
>> is likely to choose the same mapping positions that these characters will
>> have in Latin 0.
>
>I have never known IBM to be so cavalier in treatment of its code pages.
>We are not talking about *one* new code page here. Each of the Latin-1
>converged EBCDIC code pages (and there are quite a number of them besides
>CP037 and CP500) would have to spawn off another distinct code page
>if it is to be interconvertible with 8859-15 ("Latin-9"). 8859-15
>introduces a character repertoire which is distinct from the Latin-1
>repertoire ("character set" in IBM terminology). To work with the
>existing EBCDIC encodings (*plural*), that repertoire would have to
>be combined with new code pages to form a new family of CCSID's
>converged with 8859-15 instead of 8859-1. Those new CCSID's would,
>it is true, then be completely convertible with 8859-15, but they would
>create legacy problems for all the existing CCSID's that are Latin-1
>converged.
>
>And nothing about this is automatic. Defining a new CCSID, say CCSID
>35915 (for example) that has the 8859-15 repertoire, with the Code Page
>500 encoding, but with one-for-one replacements of positions of characters
>as 8859-15 replaces 8859-1 characters, doesn't do anything more than
>create a new CCSID. No Code Page 500 DB2 databases are going to suddenly
>upgrade themselves to use CCSID 35915. And none of the remapping is going
>to be transparent or problem-free.
>
>>
>> So compatibility will be clean there.
>
>I think it should be clear why I differ with you on that assessment.
>
>> And in practice,
>> EURO will have to be interchanged with EBCDIC, not only UNICODE (with
>> UNICODE too, of course). So what is being proposed in Latin 0 is clean to
>> do all this.
>
>Saying the EURO will have to be interchanged with EBCDIC doesn't make it
>happen. Do you really think that all those legacy databases are going
>to simply redefine their characters? IBM is going to tell its customers,
>oops, sorry, we didn't mean what our CDRA standard says when it defined
>all the characters you use in your databases, and we are going to change
>their meaning for you? I don't think so. The EURO will have to be introduced
>to EBCDIC by introducing new code pages, and then new code pages will
>have to be supported on the databases--and all of this will involve
>pain in the transition. Not clean at all.
>
>>
>> Now those UNIX-8-bit systems that want to implement Latin 0 will be happy
>> to be in the same bandwagon.
>
>If they want to implement Latin-9 as yet another alternative character
>set along with Latin-1, Latin-2, Latin-3, Latin-4, Latin-5, whatever, then
>fine. But if they really think they can painlessly replace Latin-1 with
>Latin-9 by swapping in a few new characters for ones that nobody needed
>anyway, then I suspect they will turn out not so happy after all.
>
>In my opinion, Latin-1 has been amazingly successful, and has become
>effectively the European "ASCII". Yes, it has defects, just as 7-bit
>ASCII has always had defects even for accentless representation of English
>data. But the campaign to deal with the Euro problem by sweeping away
>Latin-1 (and by the way then adding in some French and Finnish characters)
>has the potential to wreak IT havoc.
>
>The real consequence of continuing the replace Latin-1 with "Latin-0"
>campaign will be to destabilize the interpretation of Latin-1 data,
>and result in a de facto situation where everyone depends instead on
>Windows 1252 to get it right, at the expense of ISO-compliant 8859-x
>based Unix systems. (Is Latin-0 actually just another nefarious
>plot by Bill Gates, I wonder? ;-) )
>
>>
>> As for the additional few characters in 1252 that are left over (not many!)
>> in C1 control space, before we go to system-wide implementation of UNICODE,
>> the situation won't be worse than it is today. There exists so far no
>> requirement to exchange these characters with EBCDIC data, while there is a
>> European requirement to exchange the EURO SIGN, 3 French characters and 4
>> Finnish characters more than in Latin 1. When one wants to talk about
>> practical things, one has to talk practically. That's what the Latin 0
>> proposers have in mind, only practical considerations for the real world of
>> today and the 5 coming years at least.
>>
>> All the destroyers of Latin 0 just have *un*solutions to propose to the
>> requirements. They only want a quick fix that is not even a fix to the
>> problems exposed, and they do not even want to see the problems and try to
>> really solve them. They do not really care much about actual European
>> problems, I would say, if I did not know that they also have good
>> intentions in mind, of course.
>
>I see two valid non-*un*solutions:
>
> 1. Proceed with 8859-15 (Latin-9) as another part of 8859 (and
> add more parts of 8859 to create corresponding 8-bit standards
> that add the EURO SIGN to the Greek part, the Eastern European part,
> the Turkish part, the Baltic part, ...). Deal honestly with the
> data convertibility problems between the new parts and the
> established set of 8859 parts which do not contain the EURO
> (or the French or Finnish characters), and expect to have a
> fairly long transition period of moving from EURO-less 8859
> parts to EURO-ful 8859 parts. A painful transition, but
> well-defined, stepwise, and not plagued with the potential
> problem of catastrophic loss of interpretability of Latin-1
> data. But this non-*un*solution doesn't have a clear path
> to the future. It is the short-term hack to solve the
> immediate problem of the EURO, but doesn't deal with the next new
> character that everyone in Europe decides that they must
> have in their IT systems.
>
> 2. Move the 8-bit systems to 8-bit+ systems, using UTF-8, with
> a drastically constrained repertoire. (Just union the European
> parts of 8859, if you like, and add U+20AC EURO SIGN.) You
> get a small repertoire compatible with European immediate
> needs, without any of the complications of the full 10646
> repertoire that everyone is so afraid of. The fonts will be
> easy to create--just piece them together from the chunks already
> required for the individual 8859-x encodings. Treat everything
> else (except for the actual encoding values) just as you have
> been for 8859-1. This is an incremental solution that has
> a future path. Future additions to the repertoire are simple
> extensions to the interpreted repertoire, without introducing
> coding changes or redefining characters. Fully Unicode compliant
> platforms such as Windows NT or Windows CE or AIX or systems that
> support UTF-8 already such as Solaris, could interwork with this
> trivially. Most systems that can handle Asian data could adopt a
> stripped-down UTF-8 repertoire like this in short order for Europe.
>
>Alain, I am afraid that you have been campaigning so long to eliminate
>7-bit constraints in favor of 8-bit clean data, that attaining the
>8-bit goal seems a valid place to stop for you. But the reality is that
>an 8-bit character is not big enough to serve Europe's needs.
>The effort to elbow some characters out of one 8859 encoding in
>favor of a few different ones does not solve Europe's clear IT needs
>(which include Greece, the Czech Republic, Russia, ... --not just
>France and Finland).
>
>Try thinking of 8-bit characters as Unicode characters that someone
>unfortunately stripped the top 8 bits from, thereby trashing the
><oe>s in French and <z<>s in Finnish. Then we can all get on the
>bandwagon to eliminate this odious practice of stripping the high
>8 bits and get people to focus on implementing UTF-8 or UTF-16
>correctly.
>
>>
>> They do not respond to European requirements, whatever their goal is. In
>> doing so they are also threatening IBM and its huge installed base of
>> mainframes in Europe, I don't know if that is well realized.
>>
>> Or, what they say should be done for EBCDIC will generate eternal
>> conversion costs (data losses and round-trip integrity violations) for which
>> they will be blamed for decades, if I might express it simply. Fortunately
>> common sense will prevail and Latin 0 will be standardized.
>
>Hopefully common sense will prevail and 8859-15 (Latin-9) will be seen
>as another bump in the long 8859 road towards Unicode/10646
>acceptance as the European solution for data representation.
>
>--Ken
>
>>
>> Alain LaBonté
>> Cornwall (Ontario)

[Alain] :
I personally have lived through many conversions within EBCDIC environments, within
PC environments and finally to the Windows environment.

I can assure you that we won't have to convert the 8 characters we want to
replace (they are not alphabetic, nor syntactic).
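
For readers who want to see which eight positions are at stake, here is a minimal sketch in Python. It assumes that Python's `iso8859_1` and `iso8859_15` codecs are acceptable stand-ins for Latin 1 and Latin 0; the latter reflects the standard as finally published, which is not necessarily identical to the draft discussed in this thread. The function name `differing_positions` is hypothetical, introduced only for illustration.

```python
import unicodedata


def differing_positions(codec_a, codec_b):
    """Return (byte, char_a, char_b) for every byte the two codecs decode differently."""
    diffs = []
    for byte in range(256):
        raw = bytes([byte])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs.append((byte, char_a, char_b))
    return diffs


# Assumption: Python's codec tables stand in for the two 8859 parts.
LATIN1_VS_LATIN9 = differing_positions("iso8859_1", "iso8859_15")

if __name__ == "__main__":
    for byte, old, new in LATIN1_VS_LATIN9:
        print(f"0x{byte:02X}  {unicodedata.name(old)}  ->  {unicodedata.name(new)}")
```

Running it lists the replaced symbols next to the EURO SIGN and the French and Finnish letters that take their places.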

I can assure you that if the EURO SIGN is required it is also required in
EBCDIC.

I can assure you that adding a standard way to interchange the 3 French
and 4 Finnish additional characters with EBCDIC is not a hassle, but rather
a goodie; it is not a conversion problem, it is a conversion solution.

But you do not answer this EBCDIC problem.

That said, we make every effort to switch to UNICODE, but I can assure you that
EBCDIC systems will stay for a while and that we want to solve this problem
so that essential data can be exchanged freely between UNICODE,
8-bit-Windows and EBCDIC, without forgetting Macs and other platforms.
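
To make the interchange question concrete, the sketch below asks where the EURO SIGN lands in several encodings. It is only an illustration under stated assumptions: it relies on the codec tables shipped with a current Python (`cp1252`, `iso8859_15`, `cp500` and so on), which may postdate this discussion, and `euro_bytes` is a hypothetical helper name.

```python
EURO = "\u20ac"  # EURO SIGN


def euro_bytes(codecs):
    """Map each codec name to the bytes encoding the euro, or None if it has none."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = EURO.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None  # this code page has no position for the euro
    return result


# Assumption: these Python codec names stand in for the platforms named above.
TABLE = euro_bytes(["utf-8", "utf-16-be", "cp1252", "iso8859_15", "iso8859_1", "cp500"])

if __name__ == "__main__":
    for codec, encoded in TABLE.items():
        print(f"{codec:12} {encoded.hex() if encoded else 'not encodable'}")
    # The same byte read under the older table is a different character.
    print(repr(b"\xa4".decode("iso8859_15")), "versus", repr(b"\xa4".decode("iso8859_1")))
```

The last line shows why the label attached to the data matters: the byte that means the euro under one table means the currency symbol under the other.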

EBCDIC systems have many code pages, but encouragement has been to reduce
their usage to IBM 037 and IBM 500 for the Latin-1 based environments. For
those, conversions will not be a problem, nothing compared to the conversions
from non-Latin-1 based to Latin-1 based EBCDIC code pages, which have been
relatively smooth.
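
As a rough check on what "Latin-1 based" means for IBM 037 and IBM 500, the sketch below tests whether every Latin 1 byte survives a round trip through each EBCDIC code page, and counts how many characters of the newer table have no position there today. It assumes Python's `cp037`, `cp500`, `iso8859_1` and `iso8859_15` codecs match the code pages under discussion; `round_trips` and `unmappable` are hypothetical helper names.

```python
def round_trips(data, source_codec, target_codec):
    """True if data survives source -> target -> source unchanged."""
    text = data.decode(source_codec)
    converted = text.encode(target_codec)  # raises UnicodeEncodeError if unmappable
    return converted.decode(target_codec).encode(source_codec) == data


def unmappable(source_codec, target_codec):
    """Characters of source_codec that target_codec cannot represent."""
    missing = []
    for byte in range(256):
        char = bytes([byte]).decode(source_codec)
        try:
            char.encode(target_codec)
        except UnicodeEncodeError:
            missing.append(char)
    return missing


ALL_BYTES = bytes(range(256))

if __name__ == "__main__":
    for ebcdic in ("cp037", "cp500"):
        print(ebcdic, "round-trips Latin 1:", round_trips(ALL_BYTES, "iso8859_1", ebcdic))
        print(ebcdic, "lacks from 8859-15:", unmappable("iso8859_15", ebcdic))
```

The second list is what a new or revised EBCDIC code page would have to find room for.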

But this time the move is based on user requirements.

And, yes, the EURO will have to be introduced in other 8859 parts,
hopefully with the same code position it will definitely have in Latin 0.

At the same time, we'll continue to seek UNICODE-based systems; the one
absolutely does not preclude the other.
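
The restricted-repertoire UTF-8 idea quoted earlier in this message (union the European parts of 8859, add U+20AC) can be sketched roughly as follows. This is an illustration, not a definitive implementation: the choice of parts in `EUROPEAN_PARTS` is an assumption, Python's codec tables may reflect later revisions of those parts, and `build_repertoire` and `outside_repertoire` are hypothetical names.

```python
# Assumption: these parts count as "European"; parts 6 (Arabic) and 8 (Hebrew) are left out.
EUROPEAN_PARTS = [
    "iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4",
    "iso8859_5", "iso8859_7", "iso8859_9", "iso8859_10",
]


def build_repertoire(codecs, extra=("\u20ac",)):
    """Union of all characters defined by the given 8-bit codecs, plus extras."""
    chars = set(extra)
    for codec in codecs:
        for byte in range(256):
            try:
                chars.add(bytes([byte]).decode(codec))
            except UnicodeDecodeError:
                continue  # position left undefined in this part
    return chars


def outside_repertoire(text, repertoire):
    """Sorted list of distinct characters in text that fall outside the repertoire."""
    return sorted({ch for ch in text if ch not in repertoire})


REPERTOIRE = build_repertoire(EUROPEAN_PARTS)

if __name__ == "__main__":
    sample = "Prix : 10 \u20ac, \u017eivot, \u03a9"
    print("rejected:", outside_repertoire(sample, REPERTOIRE))
    print("UTF-8 bytes:", sample.encode("utf-8").hex())
```

A system built this way would accept or reject text by repertoire membership and always interchange it as UTF-8, so extending the repertoire later changes only the set, not the encoding.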

Alain LaBonté
Cornwall (Ontario)


