Re: UTF-8 stress test file?

From: Theodore H. Smith (
Date: Mon Oct 11 2004 - 10:46:37 CST


    Thanks Phillippe,

    > in that file, all UTF-8 sequences with 5 bytes or more are invalid
    > (they are not "boundary cases").


    > So the list of "impossible bytes" is longer than documented there.

    Is it just a case of moving the boundary cases into the impossible
    bytes? Or are there impossible bytes that simply aren't in the file?
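
One way to check which bytes are "impossible" is to enumerate them by brute force. The sketch below assumes the RFC 3629 definition of UTF-8 (at most 4 bytes per sequence, code points up to U+10FFFF, no surrogates) and relies on Python's built-in encoder; the function name `impossible_utf8_bytes` is only an illustrative choice and does not come from the test file or this thread.

```python
def impossible_utf8_bytes():
    """Return the byte values that never occur in well-formed UTF-8.

    Assumes the RFC 3629 definition: code points U+0000..U+10FFFF,
    excluding the surrogate range U+D800..U+DFFF.
    """
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        # Iterating a bytes object yields ints, so this records every
        # byte value that the encoding of this code point uses.
        used.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - used)


if __name__ == "__main__":
    print(" ".join(f"{b:02X}" for b in impossible_utf8_bytes()))
```

Under that assumption the result covers more than FE and FF: the lead bytes of 5- and 6-byte sequences (F8 to FD) can never appear either, nor can C0, C1 and F5 to F7.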

    > - the file mixes UTF-8 and UTF-16

    Does this file really mix UTF-8 and UTF-16? I thought it just had
    surrogates encoded as UTF-8. Of course, a surrogate should never appear
    in UTF-8.
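
As a rough illustration of the surrogate point, the sketch below uses Python's strict UTF-8 decoder as a stand-in for a conforming decoder; the helper name `is_well_formed_utf8` is hypothetical. The bytes ED A0 80 are what U+D800 would look like if it were pushed through the 3-byte UTF-8 pattern, and a strict decoder is expected to reject them.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # default error handler is "strict"
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    lone_surrogate = b"\xed\xa0\x80"              # U+D800 in 3-byte form
    surrogate_pair = b"\xed\xa0\x80\xed\xb0\x80"  # U+D800 U+DC00, each in 3-byte form
    real_u10000 = b"\xf0\x90\x80\x80"             # U+10000 as proper 4-byte UTF-8

    for sample in (lone_surrogate, surrogate_pair, real_u10000):
        print(sample.hex(" "), is_well_formed_utf8(sample))
```

A supplementary character such as U+10000 has to be written as a single 4-byte sequence; writing its two UTF-16 surrogates as separate 3-byte sequences is not valid UTF-8.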

         Theodore H. Smith - Software Developer.
