Re: UTF-8 text files

From: Andrew West
Date: Tue Jun 07 2005 - 07:07:24 CDT

    On 06/06/05, Doug Ewell wrote:
    > It is still possible to come up with a plausible example of text that is
    > both valid UTF-8 and plausible Latin-1, and I need to find one -- not
    > only because my current example is Windows-specific, but also because
    > Nestlé is not even a trademark (™) but a registered trademark (®).

    Something like 2×½=1 (2 times one half equals 1), which is <32 D7 BD 3D
    31> in ISO-8859-1, is both plausible Latin-1 and valid UTF-8 (decoding to
    <0032 05FD 003D 0031>). Although the resultant UTF-8 text in this example is
    meaningless as U+05FD is not (yet) an assigned character, most editors
    do not check for character validity (if they did they would not be
    forward compatible with future versions of Unicode), and so will
    happily assume that this example is UTF-8 rather than any other
    character set -- certainly Notepad automatically opens a file
    containing this example as UTF-8.
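
    For anyone who wants to check the byte arithmetic, here is a rough
    Python sketch (not anything Notepad itself does; the variable names are
    made up for illustration). It decodes the same five bytes both ways and
    lists the resulting code points. The final line asks Python's bundled
    Unicode database for the general category of U+05FD; the answer depends
    on the Unicode version your Python ships with, so treat it as
    informational only.

```python
import unicodedata

# The five bytes from the example: "2", 0xD7, 0xBD, "=", "1".
data = bytes([0x32, 0xD7, 0xBD, 0x3D, 0x31])

# ISO-8859-1 maps every byte directly to the code point of the same value.
as_latin1 = data.decode("latin-1")

# As UTF-8, D7 BD is a well-formed two-byte sequence, so decoding succeeds
# even though the resulting code point need not be an assigned character.
as_utf8 = data.decode("utf-8")

latin1_codepoints = [f"{ord(c):04X}" for c in as_latin1]
utf8_codepoints = [f"{ord(c):04X}" for c in as_utf8]

print("Latin-1:", latin1_codepoints)
print("UTF-8:  ", utf8_codepoints)

# General category of U+05FD according to this Python's Unicode data
# ("Cn" would mean "not assigned" in that version).
print("U+05FD category:", unicodedata.category("\u05fd"))
```

    The two-byte sequence works out as ((0xD7 & 0x1F) << 6) | (0xBD & 0x3F)
    = 0x05FD, which is why the five Latin-1 characters collapse to four
    code points under UTF-8.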
