Re: UTF-8 stress test file?

From: Clark Cox (clarkcox3@gmail.com)
Date: Tue Oct 12 2004 - 13:13:44 CST


    On Tue, 12 Oct 2004 20:25:16 +0200, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    > From: "Doug Ewell" <dewell@adelphia.net>
    > > Theodore H. Smith <delete at elfdata dot com> wrote:
    > >
    > >>> - the file mixes UTF-8 and UTF-16
    > >>
    > >> Does this file mix UTF-8 and UTF-16? I thought it just had surrogates
    > >> encoded into UTF-8? Of course a surrogate should never exist in UTF-8.
    > >
    > > You are right. Philippe's statement was incorrect, and also puzzling.
    >
    > Have you read the file content? It clearly and explicitly speaks about
    > UTF-16,

    Only in the context of surrogates, and then only in the context of
    illegal UTF-8 sequences. The file doesn't contain any UTF-16 encoded
    text.

    > unless the file was used as a test for CESU-8

    The whole point of the CESU-8-like section is that it is not legal UTF-8.
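
To make the distinction concrete, here is a minimal sketch (not taken from the stress test file itself) that contrasts the legal 4-byte UTF-8 form of U+10000 with a CESU-8-like form built from two 3-byte surrogate sequences. The helper names `is_legal_utf8` and `cesu8_like` are hypothetical, chosen for illustration; the only library calls are Python's standard `bytes.decode` and `str.encode`, and it assumes Python 3, whose strict UTF-8 decoder rejects encoded surrogates.

```python
# Hypothetical helpers for illustration; not part of the test file.

def is_legal_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")  # error handling defaults to "strict"
        return True
    except UnicodeDecodeError:
        return False


def cesu8_like(cp: int) -> bytes:
    """Encode a supplementary code point as two 3-byte surrogate sequences."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("expected a supplementary code point")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # high (leading) surrogate
    low = 0xDC00 + (v & 0x3FF)   # low (trailing) surrogate
    # "surrogatepass" makes the encoder emit the 3-byte form for each
    # surrogate instead of raising an error.
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


legal = chr(0x10000).encode("utf-8")   # F0 90 80 80
illegal = cesu8_like(0x10000)          # ED A0 80 ED B0 80

print(legal.hex(" "), is_legal_utf8(legal))
print(illegal.hex(" "), is_legal_utf8(illegal))
```

The 6-byte sequence carries the same code point in a UTF-16-shaped way, but it contains no UTF-16 encoded text: it is simply a byte sequence that a conforming UTF-8 decoder should reject.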

    -- 
    Clark S. Cox III
    clarkcox3@gmail.com
    http://www.livejournal.com/users/clarkcox3/
    http://homepage.mac.com/clarkcox3/
    

