Re: Just if and where is the then?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 05 2004 - 10:32:07 CDT


From: "Jon Hanna" <jon@hackcraft.net>
> > Knowing that Unicode-ISO/IEC 10646 is a now de facto standard (after being a
> > de jure one in ISO) will clearly guide those charset developments complying
> > with Unicode rules and policies, so that such adoption will not create a
> > nightmare to handle, with unreasonable additional costs for transcoding
> > to/from/through Unicode.
>
> If you can't round-trip directly then the cost is unreasonable.

I never said that. Reread. I said that round-trip conversion is possible even
with 1-to-N mappings, provided that such a character subset is carefully
designed so that it does not create ambiguities.
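
To make that concrete, here is a minimal sketch in C (the 8-bit charset and its
mapping table are purely hypothetical, not taken from any published standard):
one legacy byte expands to a sequence of Unicode code points, yet the data still
round-trips because the encoder folds each sequence back by longest match and no
mapped sequence is ambiguous with another.

/* Hypothetical 1-to-N charset mapping that still round-trips losslessly. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t  legacy;   /* byte in the hypothetical 8-bit charset */
    uint32_t ucs[2];   /* Unicode code points it expands to      */
    size_t   len;      /* 1 or 2                                 */
} Mapping;

/* Hypothetical table: 0x41 -> U+0041, 0xC5 -> U+0041 U+030A (A + ring above). */
static const Mapping table[] = {
    { 0x41, { 0x0041, 0      }, 1 },
    { 0xC5, { 0x0041, 0x030A }, 2 },
};
static const size_t table_len = sizeof table / sizeof table[0];

/* Decode one legacy byte into code points; returns how many were written. */
static size_t decode_byte(uint8_t b, uint32_t out[2]) {
    for (size_t i = 0; i < table_len; i++)
        if (table[i].legacy == b) {
            for (size_t j = 0; j < table[i].len; j++) out[j] = table[i].ucs[j];
            return table[i].len;
        }
    return 0; /* unmapped byte */
}

/* Encode by longest match, so a 1-to-N expansion folds back into its single
 * legacy byte unambiguously. Returns (size_t)-1 on unmappable input. */
static size_t encode_cps(const uint32_t *cp, size_t n, uint8_t *out) {
    size_t written = 0, i = 0;
    while (i < n) {
        size_t best = 0;
        uint8_t byte = 0;
        for (size_t k = 0; k < table_len; k++) {
            size_t l = table[k].len;
            if (l <= n - i && l > best) {
                size_t j = 0;
                while (j < l && cp[i + j] == table[k].ucs[j]) j++;
                if (j == l) { best = l; byte = table[k].legacy; }
            }
        }
        if (best == 0) return (size_t)-1;
        out[written++] = byte;
        i += best;
    }
    return written;
}

int main(void) {
    uint8_t legacy_in[] = { 0x41, 0xC5 };   /* "A" then "A with ring" */
    uint32_t cps[8];
    size_t ncp = 0;
    for (size_t i = 0; i < sizeof legacy_in; i++)
        ncp += decode_byte(legacy_in[i], cps + ncp);

    uint8_t legacy_out[8];
    size_t nout = encode_cps(cps, ncp, legacy_out);

    assert(nout == sizeof legacy_in);
    for (size_t i = 0; i < nout; i++) assert(legacy_out[i] == legacy_in[i]);
    puts("round trip OK");
    return 0;
}

The same idea scales to a full table, as long as the subset is chosen so that
longest-match encoding remains unambiguous.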

> > I see absolutely no problem if new ISO-8859-* variants are added in the
> > future for better support of African or Asian languages (or even for
> > European ones, e.g. Georgian and Armenian), and no opposition of principle
> > if some newer ISO 2022 charset is created for Canadian Syllabics or
> > Ethiopic, if this helps processing of the corresponding languages.
>
> If they can be round-tripped trivially (as trivially as the current ISO-8859
> family) then I see no problem either, but I also see little point, and the
> motivation gets less every year. Frankly, we have a global encoding now. It
> has problems (many of which come from the fact that it was not practical to
> act as if we were at encoding year-zero - if we had then we probably wouldn't
> have precomposed characters for European languages, never mind any others)
> but those problems are considerably less than existed previously and
> ISO-8859-17+ is always going to be inferior to UTF-8 or UTF-16.

Can you give a reasonable estimate of what you consider a medium- or long-term
solution? For me, ISO-8859-1/2 will continue to be used for a very long time.
This is a natural consequence of the _slow_ migration or replacement of working
software and the cost of new development. There are many reasons why old
software continues to run today even though it was developed 20 years ago, long
before Unicode ever existed.

In the computer industry there's a general motto that says "if it works and it
doesn't break, don't change it!". Some of the oldest software still in use has
become so business-critical that it has been scrutinized and maintained with
extreme care, notably against security vulnerabilities. Rewriting these
applications from scratch is a high risk, and it also entails very long and
costly compatibility and interoperability testing.

You can't simply and immediately replace a piece of software in mission-critical
applications. You must also make sure that other "companion" software will work
with it, and you need migration plans that include testing the multiple
supported interfaces used to interact with the old software, as well as making
sure that the new code is not exposed to many more new and undetected
vulnerabilities that were absent from the old software.

In some cases it is even impossible to replace it, and there will be no viable
alternative for many years, because no general-purpose replacement exists
(notably for the many programs that work with organization-specific data, often
kept proprietary and secret).

Today there is so much software that depends on 8-bit processing, with the
simple assumption that 1 byte = 1 character, that you won't create a revolution
overnight. I bet that 8-bit charsets will still be supported in 20 or 30 years,
even if these systems are given new interfaces to Unicode-enabled systems.
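
For illustration, a small sketch (the strings are just examples) of the
assumption such software relies on: strlen() counts bytes, which equals the
character count for a fixed 8-bit charset but not for UTF-8.

/* "1 byte = 1 character" holds for ISO-8859-1 but not for UTF-8. */
#include <stdio.h>
#include <string.h>

int main(void) {
    const char latin1[] = "caf\xE9";      /* "cafe" + e-acute in ISO-8859-1: 4 bytes */
    const char utf8[]   = "caf\xC3\xA9";  /* same text in UTF-8:             5 bytes */

    /* Byte-oriented code reports the right length for the 8-bit encoding
     * but over-counts characters for the variable-length one. */
    printf("ISO-8859-1 bytes: %zu\n", strlen(latin1)); /* 4 */
    printf("UTF-8 bytes:      %zu\n", strlen(utf8));   /* 5, but 4 characters */
    return 0;
}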

Think about most OS kernels and filesystems, or device configuration: they
simply use 8-bit charsets internally, and there's no way to adapt them to work
with variable-length multibyte encodings (there are too many related security
issues for untested cases).
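
As one sketch of such an untested case (not taken from any particular kernel):
a byte-oriented path check that scans for '/' bytes never sees the overlong
UTF-8 sequence 0xC0 0xAF, which a strict UTF-8 decoder must reject but a
lenient decoder further downstream may still interpret as '/'.

/* Byte-oriented filtering vs. a lenient multibyte decoder downstream. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Naive 8-bit check: reject any path containing a literal '/' byte. */
static bool contains_slash_byte(const char *p) {
    return strchr(p, '/') != NULL;
}

int main(void) {
    const char plain[]    = "etc/passwd";         /* caught by the byte scan     */
    const char overlong[] = "etc\xC0\xAFpasswd";  /* overlong '/' slips past it  */

    printf("plain:    %s\n", contains_slash_byte(plain)    ? "rejected" : "accepted");
    printf("overlong: %s\n", contains_slash_byte(overlong) ? "rejected" : "accepted");
    return 0;
}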


