From: Theodore H. Smith (delete@elfdata.com)
Date: Mon Oct 11 2004 - 10:46:37 CST
Thanks Phillippe,
> in that file, all UTF-8 sequences with 5 bytes or more are invalid
> (they are not "boundary cases").
Thanks.
> So the list of "impossible bytes" is longer than documented there.
Is it just a matter of reclassifying those boundary cases as impossible
bytes? Or are there impossible bytes that simply aren't in the file at all?
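
For reference, here is a minimal sketch of what "impossible bytes" means, assuming the RFC 3629 definition of UTF-8 (code points limited to U+10FFFF, no overlong forms). It is not derived from the file under discussion, and the function names are just illustrative.

```python
def impossible_utf8_bytes():
    """Byte values that can never occur anywhere in well-formed UTF-8
    (assumption: RFC 3629 rules, i.e. nothing above U+10FFFF)."""
    never = {0xC0, 0xC1}               # could only start an overlong 2-byte form
    never.update(range(0xF5, 0x100))   # 4-byte leads above U+10FFFF, the old
                                       # 5- and 6-byte leads, plus 0xFE and 0xFF
    return sorted(never)


def contains_impossible_byte(data: bytes) -> bool:
    """True if data contains a byte that no valid UTF-8 sequence may use."""
    bad = set(impossible_utf8_bytes())
    return any(b in bad for b in data)
```

Under that assumption, the lead bytes of every 5-byte and 6-byte sequence (0xF8 to 0xFD) fall in the impossible set, which is why those sequences are invalid rather than boundary cases.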
> - the file mixes UTF-8 and UTF-16
Does this file mix UTF-8 and UTF-16? I thought it just had surrogates
encoded as UTF-8. Of course, a surrogate should never appear in UTF-8.
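
To illustrate the surrogate point, here is a small sketch that uses Python's strict UTF-8 decoder as the validity check. The helper name is hypothetical, and the byte strings are hand-built examples, not taken from the file.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+D800 (a high surrogate) pushed through the 3-byte UTF-8 pattern.
lone_surrogate = b"\xed\xa0\x80"

# U+10000 written as the surrogate pair D800 DC00, each half "encoded"
# separately. This is the UTF-16 view leaking into UTF-8.
pair_as_utf8 = b"\xed\xa0\x80\xed\xb0\x80"

# The correct UTF-8 form of U+10000 is a single 4-byte sequence.
proper = "\U00010000".encode("utf-8")
```

Both surrogate byte strings are rejected by the strict decoder, while the 4-byte form of the same code point is accepted.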
-- Theodore H. Smith - Software Developer. http://www.elfdata.com
This archive was generated by hypermail 2.1.5 : Mon Oct 11 2004 - 10:50:19 CST