Re: UTF-8 text files

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Jun 06 2005 - 09:08:15 CDT


    Antoine Leca <Antoine10646 at leca dash marti dot org> wrote:

    >> It's a contrived example, but the string "NESTLÉ™" encoded in Latin-1
    >
    > It is a minor nit, but ™ (U+2122) does not appear in my Latin-1 (ISO/
    > IEC 8859-1:1998) charts; of course, this character appears at position
    > 9/9 in the Windows 1250, 1252, 1254, 1257, 1258 codepages (and also in
    > some others, but those do not have É at 12/9).

    Arrggh... you are right, of course, and I am guilty of the same mistake
    I've seen many times before.

    The string I gave as an example was meant to be encoded in Windows
    code page 1252, not in ISO 8859-1 as I said. In ISO 8859-1, byte 0x99
    is U+0099, a control character that doesn't even have a name.
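
    A minimal Python sketch of the distinction, assuming the standard
    "cp1252" and "latin-1" codecs and the unicodedata module; the variable
    names are illustrative and not part of the original message:

```python
import unicodedata

text = "NESTLÉ™"

# Windows code page 1252 has ™ at 0x99, so the string encodes cleanly.
cp1252_bytes = text.encode("cp1252")

# ISO 8859-1 has no ™ at all, so the same string cannot be encoded there.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# The trailing bytes 0xC9 0x99 also form one well-formed UTF-8 sequence.
as_utf8 = cp1252_bytes.decode("utf-8")

# Read as ISO 8859-1, byte 0x99 becomes the C1 control U+0099,
# for which unicodedata reports no name (None is the fallback here).
as_latin1 = cp1252_bytes.decode("latin-1")
c1_name = unicodedata.name(as_latin1[-1], None)
```

    Here the two bytes for É and ™ are consumed together as a single
    UTF-8 character, which is why the byte string passes a UTF-8 validity
    check even though it was never meant as UTF-8.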

    It is still possible to come up with an example of text that is both
    valid UTF-8 and plausible Latin-1, and I need to find one -- not only
    because my current example is Windows-specific, but also because
    Nestlé is not even a trademark (™) but a registered trademark (®).
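
    One way to screen candidates mechanically is sketched below. The
    helper name reads_as_both is hypothetical, and it only checks the
    byte-level conditions (well-formed UTF-8, non-ASCII content, no bytes
    in the C1 range 0x80-0x9F); whether the text is "plausible" to a human
    reader is still a judgment call. The candidate shown simply swaps ™
    for ®, which does exist in ISO 8859-1 at 0xAE:

```python
def reads_as_both(data: bytes) -> bool:
    """Return True if data is non-ASCII, well-formed UTF-8, and free of
    C1 control bytes (0x80-0x9F) when read as ISO 8859-1."""
    if data.isascii():
        return False  # pure ASCII is not an interesting ambiguity
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not any(0x80 <= b <= 0x9F for b in data)


# Candidate: the same name with the registered sign, in real ISO 8859-1.
candidate = "NESTLÉ®".encode("latin-1")      # bytes ... 0xC9 0xAE
candidate_ok = reads_as_both(candidate)

# The earlier Windows-1252 bytes fail because 0x99 is a C1 control.
windows_ok = reads_as_both(b"NESTL\xc9\x99")

# Ordinary Latin-1 text usually fails because it is not well-formed UTF-8.
cafe_ok = reads_as_both("café".encode("latin-1"))
```

    Under these assumptions the ® variant passes: 0xC9 0xAE is a
    well-formed two-byte UTF-8 sequence, and neither byte falls in the C1
    range of ISO 8859-1.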

    --
    Doug Ewell
    Fullerton, California
    http://users.adelphia.net/~dewell/
    

