re: VS: Euro Sign in 8859-15 (was: Re: Indian Rupee Sign to be chosen today)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Jun 27 2010 - 03:33:36 CDT

    Everything said so far about ISO 8859 is true, but if the Euro symbol
    has had the success it has (and it works remarkably well), it is because
    Windows is used on a lot of PCs:
    Microsoft modified all its Windows code pages (informally named
    "ANSI" due to the name of legacy Win16 APIs which were also ported to
    Win32) used in Europe to include the Euro symbol in position 0x80
    (which was not used in those code pages).
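
    As a rough illustration of the point above (a sketch only, relying on
    the codec tables that ship with Python rather than on any vendor
    documentation; the helper name euro_byte is made up for this example),
    one can check which byte a few single-byte encodings use for the euro:

```python
# Sketch: where the euro sign lands in a few single-byte encodings.
# Uses only Python's built-in codecs; "euro_byte" is a hypothetical helper.
EURO = "\u20ac"

def euro_byte(codec):
    """Return the byte(s) encoding the euro sign, or None if the codec lacks it."""
    try:
        return EURO.encode(codec)
    except UnicodeEncodeError:
        return None

# A few Windows "ANSI" code pages used in Europe.
windows_pages = ["cp1250", "cp1252", "cp1253", "cp1254", "cp1257"]
for page in windows_pages:
    print(page, euro_byte(page))

# The ISO 8859 parts discussed in this thread.
print("iso-8859-1", euro_byte("iso-8859-1"))    # Latin-1 has no euro sign
print("iso-8859-15", euro_byte("iso-8859-15"))  # Latin-9 puts it at 0xA4
```

    The Windows pages listed here all report 0x80, whereas ISO 8859-15
    uses 0xA4 and ISO 8859-1 has no euro at all.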

    There are still unused positions in Windows codepages. Most of
    them were built on top of ISO 8859 by dropping all C1 controls (not
    needed for Windows and not even for DOS compatibility), freeing 32
    positions for some commonly used punctuation signs and, later, the euro.

    Microsoft could still decide to do the same for the codepages used in
    India. But even there, Windows displays the Indic scripts using Unicode
    (and not the ISCII standard).

    Microsoft will certainly modify its mapping to Unicode for supporting
    the ISCII standard, if a position is allocated there, and other
    vendors will follow as well.

    When the Euro was added, there was no real need to modify the 8859
    pages and this was not done. Microsoft decided to modify its European
    Windows "ANSI" codepages only because, at that time, it was still
    supporting older systems that needed compatibility with DOS, and
    where Unicode was still not used internally in the system (notably
    Win16 and Win32s systems like Windows 3.1x and Windows 95/98/ME that
    still did not really use a true Unicode-enabled kernel, and did not
    even support the NTFS filesystem used on NT and the newer Windows
    2000).

    IBM also had to adapt its many codepages used on various systems (but
    these systems were already becoming very marginalized). This caused
    a lot of havoc (not least because there were so many variants of
    EBCDIC...)

    Apple decided to go in the direction completely opposite to IBM and not
    change anything, given that its legacy Mac codepages were already
    deprecated (Apple adopted the OS-level use of Unicode probably much
    faster than Microsoft: the latter initially reserved it for its
    "professional" NT systems, when the former had already decided to stop
    maintaining or adding new 8-bit codepages).

    But for the Indian Rupee, there's no need to change anything: all
    systems needed for India are already Unicode-enabled (and older
    ISCII-based systems are now almost all extinct, so I doubt that there
    will ever be any need to change them: these systems will continue to
    use the existing usual abbreviations). The Indian government just has
    to sponsor the sign's encoding in Unicode.

    Let's not repeat the IBM tragedy... India certainly has better places
    to put its public (and private) money than reviving and
    adapting old and dying national 8-bit encoding standards (which will
    still reach the end of their life, even with the new symbol added, if
    the systems using them don't support Unicode).

    Today the world is connected to the Internet for almost everything, and
    the Internet uses Unicode more than all other encodings combined.

    Philippe.

     "Erkki I Kolehmainen" <eik@iki.fi> wrote:
    > At the time I was the European project team leader for the standardization
    > of the euro, and as such I was strongly pushing for the addition of the euro
    > sign to Latin-1, which could not be done without adding a new part, which
    > then had to be done for the visibility. I fully agree with Ken (as he quite
    > well knows, I trust) that no new character encoding standardization should
    > have been done for quite a while on anything but the 10646/Unicode. As is,
    > the use of any of the 8859 parts can no longer really be justified for
    > any purpose, and with 10646/Unicode the euro sign works extremely reliably.
    >
    > Sincerely, Erkki
    > ----
    > Kenneth Whistler wrote:
    > > On Fri, 25 Jun 2010, I wrote
    > >
    > > > Even in the year 2010, the euro sign (€) doesn't work reliably.
    > >
    > > in both the Unicode list and in the newsgroup de.test.
    > >
    > > unicode.org shows a euro sign:
    > > http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html
    > >
    > > groups.google.com shows a currency sign:
    > > http://groups.google.co.uk/group/de.test/msg/e027e91e7ef17f62
    >
    > And as the snark seems to be spreading about this, let's step
    > into the Wayback Machine for a moment...
    >
    > When 8859-15 was originally proposed in 1997 (see SC2/WG3 N388R, for
    > those of you with deep document archives), primarily to add the euro
    > sign to an 8-bit character set (but also to "fix" 8859-1 for
    > French and Finnish), the U.S. NB voted against the subdivision
    > of work, claiming in the strongest of terms that the proposal
    > was inherently flawed and simply would not work to solve the
    > problem(s) it was addressed at.
    >
    > I'll quote at length from the U.S. NB comments in SC2 N2994,
    > dated 1997-11-21, "Summary of Voting on SC 2 N 2910, Proposal for
    > Project Subdivision of project JTC 1.02.20: a new part of ISO/IEC
    > 8859 for Latin Zero covering the EURO Symbol and Full Support for
    > the French and Finnish Language":
    >
    > ================================================================
    >
    > The US disapproves a project subdivision for ISO/IEC 8859-15 for
    > the following reasons:
    >
    > 1) It is the US long stated position that additional parts of
    > 8859 should not be created, except to capture existing 8-bit
    > practice (viz Part 11). Rather than addressing problems with
    > particular solutions, which are extremely costly to implement,
    > industry efforts should be focused on implementing
    > comprehensive solutions via the support of ISO/IEC 10646.
    >
    > 2) From document WG3 N 388 it is clear that the intent is to
    > replace ISO 8859-1, for the same user community. Because of
    > the prominent role that 8859-1 has gained as the default
    > character set in many internet protocols, introducing a near
    > equivalent standard will have disastrous effects. Due to their
    > large intersection part 1 and part 15 would appear to inter-operate
    > without proper adherence to announcing mechanisms. Were part 15
    > accepted and widely implemented, the result would be that no one
    > could be sure that ANY character from the non-intersecting part of
    > each set can be used reliably. In many ways, this situation is
    > reminiscent of the problems that plagued the 7-bit sets of ISO 646.
    >
    > 3) The adoption of ISO/IEC 10646 by the vendor community is
    > making rapid progress, therefore it cannot be argued that a
    > flawed solution must be accepted for lack of practical
    > alternatives.
    >
    > ================================================================
    >
    > It was already clear 13 years ago that 8859-15 wasn't going
    > to work. It shouldn't be too surprising that 13 years later
    > it still isn't working.
    >
    > As Mark indicated, the answer here is not to expect distributed
    > systems to be able to reliably distinguish 8859-1 and 8859-15,
    > when neither labelling nor heuristics for distinguishing them
    > are reliable in the first place. The answer for reliable
    > representation of the euro sign is to use UTF-8. And that answer
    > was already obvious in 1997.
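
    The ambiguity described in the quoted text can be reproduced in a few
    lines. This is only a sketch using Python's built-in codecs; it does
    not model how any particular archive or news gateway handles labels.
    The same byte is read as a euro sign or as a currency sign depending
    on which of the two charsets the receiver assumes, while UTF-8 leaves
    no such overlap:

```python
# Sketch: one byte, two readings, depending on the assumed 8-bit charset.
EURO = "\u20ac"           # EURO SIGN
CURRENCY = "\u00a4"       # CURRENCY SIGN

raw = EURO.encode("iso-8859-15")       # sender encodes with Latin-9

as_latin9 = raw.decode("iso-8859-15")  # receiver assumes Latin-9
as_latin1 = raw.decode("iso-8859-1")   # receiver assumes Latin-1

print(as_latin9 == EURO)      # True: the euro survives
print(as_latin1 == CURRENCY)  # True: the euro turned into a currency sign

# With UTF-8 the euro is a three-byte sequence that round-trips exactly.
utf8 = EURO.encode("utf-8")
print(utf8.decode("utf-8") == EURO)

# Latin-9 bytes mislabelled as UTF-8 fail loudly instead of silently
# turning into a different character.
try:
    raw.decode("utf-8")
    mislabel_detected = False
except UnicodeDecodeError:
    mislabel_detected = True
print(mislabel_detected)
```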



    This archive was generated by hypermail 2.1.5 : Sun Jun 27 2010 - 03:40:16 CDT