Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Jul 16 2006 - 18:48:52 CDT


    On 7/16/2006 4:56 AM, Philippe Verdy wrote:
    > There was nothing wrong with the ISO 8859 standard series. ISO simply stopped working on it because nobody remained who wanted to continue maintaining a 7/8-bit standard, when all the focus (and a very broad consensus at ISO) was on accelerating the development of the newer ISO 10646 standard, which industry and many governments and organizations wanted.
    >
    fair statement
    > What is important to understand here is that ISO changed its priority: instead of creating many non-interoperable 7/8-bit encodings, there was more value in creating a common international standard containing a universal repertoire of characters.
    >
    ditto
    > Nothing in the ISO 10646 standard or in Unicode forbids any country from deriving a 7/8-bit standard for national use and publishing it so that software vendors can support it at low or no cost. Nothing even forbids a country from requiring support for it in future products sold there, if it thinks that is in the country's interest.
    >
    I would quibble with 'low cost'. The total-lifetime cost of a new 7/8
    bit standard is considerable, since it eventually does have to interwork
    with 10646 and Unicode, and the more 7/8 bit sets exist, the more
    difficult it becomes to manage the legacy sets in a clean way.
    > But honestly, the whole collection of 7/8-bit encodings was becoming more and more problematic and impossible to maintain consistently while also ensuring interoperability! Only the ISO 10646 standard made it possible to reconcile the past incompatible standards, offering a uniform way to handle international text and to convert, with far fewer errors, between otherwise incompatible encodings.
    >
    I think you are trying to say the same thing here.
    > The ISO body has NOT deprecated the ISO 646 and ISO 8859-* standard series because, of course, they are widely used (and will continue to be used at large scale for a very long time, probably many decades, if not more than a century, unless there is a complete change of computing technology and the changeover occurs at large scale; I even think that ISO 646/US and possibly ISO 8859-1 will outlive ISO 10646 itself, once it is replaced by something better based on a new text encoding paradigm with additional objectives not addressed today by ISO 10646 and Unicode...)
    >
    ISO standards need to be affirmed or updated every 5 years. The
    character coding community realized that data, unlike parsers, operating
    systems, renderers and all other elements of software technology, once
    created remain in their original format. Therefore they pushed for an
    option to allow archiving of unchanged standards: keeping them
    officially available for people who need to interpret legacy data, while
    neither withdrawing nor updating them. This is not the same as
    deprecation, which is usually the first step to withdrawal of a feature
    from a standard (and is a term that does not apply to ISO standards as a
    whole).

    Except for minor tweaks in language and terminology, affecting mostly
    the text of these standards and not the way they were supposed to
    be used, the 8859 standards could have been archived a long time ago. They
    are utterly stable and need to be so.
    > Don't say that Unicode and ISO 10646 do not work. Everything today shows that these standards are very successful, that their implementation is advancing fast and is available on many computers, that they are now supported by most languages and tools, and that efficient implementation is possible and available to everyone, on all types of systems (from the smallest hand-held device to the largest mainframes, server farms, or computing grids).
    >
    > The complete migration from legacy 7/8-bit encodings to ISO/IEC 10646 is an ongoing international effort that is succeeding and has really helped decrease the digital divide between the richest countries, which have the power to require support for their legacy 7/8-bit encodings in their languages, and the poorest countries, whose languages' 7/8-bit encodings were rarely supported. With ISO 10646, software can be written once to support input, handling, and rendering of all the languages and cultures of the world.
    >
    Data will likely never migrate - which is one of the factors that makes
    adding any new 7/8 bit set so expensive: if it becomes popular at all,
    it needs to be kept around essentially forever or there's the risk of
    abandoning data.
    > With ISO 10646 (and the help of Unicode in its effective implementation), it is now in fact much less expensive to convince commercial companies to support fully internationalizable software, because this single standard can be understood by everyone in the world, and it also allows collaboration with more parties than just a single sponsoring government or organization.
    >
    > You want support for a language or script? You don't need to develop a new standard. Instead, you just need to document the minimum set of missing characters to support; they will be added to the same existing standard and easily supported in existing applications once others have contributed input methods, keyboard drivers, fonts... Academic sources can then work on producing text corpora and studying the rules needed to develop stable orthographies for many rare languages. Most of the technologies and usage policies will already be in place and documented.
    >
    > In other words, ISO 10646 really saves money everywhere in the world, unlike the past incompatible 7/8 bit encodings.
    >
    Fair conclusion.

    A./



    This archive was generated by hypermail 2.1.5 : Sun Jul 16 2006 - 18:57:09 CDT