Re: UTF-8 stress test file?

From: Theodore H. Smith (delete@elfdata.com)
Date: Mon Oct 11 2004 - 10:46:37 CST

Next message: Mike Ayers: "RE: bit notation in ISO-8859-x is wrong"

Previous message: Philipp Reichmuth: "Re: List of Japanese Shift_JIS characters which are not supported in Unicode"
In reply to: Philippe Verdy: "Re: UTF-8 stress test file?"
Next in thread: Doug Ewell: "Re: UTF-8 stress test file?"
Reply: Doug Ewell: "Re: UTF-8 stress test file?"

Thanks, Philippe,

> in that file, all UTF-8 sequences with 5 bytes or more are invalid
> (they are not "boundary cases").

Thanks.

> So the list of "impossible bytes" is longer than documented there.

Is it just a case of moving the boundary cases into the impossible
bytes? Or are there impossible bytes that simply aren't in the file?
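
For reference, here is a minimal Python sketch of one way to check which byte values can never occur. It is not taken from the test file. It assumes the RFC 3629 definition of UTF-8 (code points up to U+10FFFF, surrogates excluded), and the names used_utf8_bytes and IMPOSSIBLE are made up for illustration. It encodes every scalar value by brute force and lists the byte values that never show up:

```python
def used_utf8_bytes():
    """Collect every byte value that appears in the UTF-8 form of some scalar value."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values, so they are skipped
        used.update(chr(cp).encode("utf-8"))
    return used

# Byte values that no well-formed UTF-8 sequence contains (under the assumption above).
IMPOSSIBLE = sorted(set(range(256)) - used_utf8_bytes())
print(" ".join(f"{b:02X}" for b in IMPOSSIBLE))
```

Under that assumption the result is C0, C1 and F5 through FF, so the lead bytes of the old 5- and 6-byte forms (F8 through FD) fall into the "impossible" set rather than being boundary cases.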

> - the file mixes UTF-8 and UTF-16

Does this file really mix UTF-8 and UTF-16? I thought it just had
surrogate code points encoded as if they were UTF-8. Of course, a
surrogate should never appear in well-formed UTF-8.
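
To illustrate the distinction, a small Python sketch follows. It is only an assumption about how one might test this, and is_well_formed_utf8 and the sample byte strings are hypothetical, not taken from the file. A strict decoder rejects a surrogate that has been pushed through the 3-byte UTF-8 pattern, and it likewise rejects a 5-byte sequence:

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")  # the default error handler is "strict"
        return True
    except UnicodeDecodeError:
        return False

# U+D800 run through the 3-byte bit pattern: these are UTF-8-style bytes, not UTF-16.
ENCODED_SURROGATE = b"\xed\xa0\x80"
# A 5-byte form in the style of the original (pre-RFC 3629) UTF-8 definition.
FIVE_BYTE_SEQUENCE = b"\xf8\x88\x80\x80\x80"

for sample in (ENCODED_SURROGATE, FIVE_BYTE_SEQUENCE, "\u00e9".encode("utf-8")):
    print(sample.hex(), is_well_formed_utf8(sample))
```

Only the last sample should be accepted. Python's "surrogatepass" error handler will decode the first one, which is a handy way to confirm that the bytes are an encoded surrogate rather than actual UTF-16 data.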

--
     Theodore H. Smith - Software Developer.
     http://www.elfdata.com

This archive was generated by hypermail 2.1.5 : Mon Oct 11 2004 - 10:50:19 CST