Re: Problem with ConvertUTF.c?

From: Theodore H. Smith (delete@softhome.net)
Date: Tue Jul 16 2002 - 20:07:39 EDT


It looks like I missed the isLegalUTF8 function calls that verify
whether the input is valid UTF-8. Never mind, then; it's all OK.
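
To illustrate the point, here is a minimal sketch in C of how a lookup
that still reports 4 and 5 trailing bytes can coexist with a separate
strictness check. This is not the code from ConvertUTF.c: the names
trailing_bytes and is_legal_sequence are made up for the example, and
the check is only a simplified stand-in for what isLegalUTF8 is
described as doing.

```c
#include <stddef.h>

/* Trailing-byte count implied by a lead byte. Computed here rather than
   tabulated, but it yields the same values as the quoted table,
   including 4 for 0xF8..0xFB and 5 for 0xFC..0xFF. */
static int trailing_bytes(unsigned char lead)
{
    if (lead < 0xC0) return 0;  /* ASCII, or a stray continuation byte */
    if (lead < 0xE0) return 1;
    if (lead < 0xF0) return 2;
    if (lead < 0xF8) return 3;
    if (lead < 0xFC) return 4;  /* old 5-byte form */
    return 5;                   /* old 6-byte form */
}

/* Hypothetical strict check (an assumption for this sketch): returns 1
   if the n bytes at s begin with one well-formed UTF-8 sequence of at
   most 4 bytes, and 0 otherwise. */
static int is_legal_sequence(const unsigned char *s, size_t n)
{
    static const unsigned char lead_mask[4] = { 0x7F, 0x1F, 0x0F, 0x07 };
    static const unsigned long min_cp[4] = { 0x0UL, 0x80UL, 0x800UL, 0x10000UL };
    unsigned long cp;
    int extra, i;

    if (n == 0) return 0;
    extra = trailing_bytes(s[0]);
    if (extra > 3) return 0;                    /* 5- and 6-byte forms */
    if (s[0] >= 0x80 && s[0] < 0xC0) return 0;  /* stray continuation */
    if ((size_t)extra + 1 > n) return 0;        /* truncated input */

    /* Every trailing byte must look like 10xxxxxx. */
    for (i = 1; i <= extra; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;

    /* Decode the code point so the range checks can be applied. */
    cp = (unsigned long)(s[0] & lead_mask[extra]);
    for (i = 1; i <= extra; i++)
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);

    if (cp < min_cp[extra]) return 0;            /* overlong encoding */
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) return 0;  /* surrogates */
    if (cp > 0x10FFFFUL) return 0;               /* beyond Unicode */
    return 1;
}
```

With this arrangement the table values above 3 do no harm: a lead byte
such as 0xF8 still maps to 4 trailing bytes, but the sequence is
rejected before any conversion takes place.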

On Wednesday, July 17, 2002, at 01:57, Theodore H. Smith wrote:

> The file ConvertUTF.c contains this array:
>
>
> static const char trailingBytesForUTF8[256] = {
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5
> };
>
> Doesn't UTF8 only have 4 bytes maximum? So then the entries
> above 3 should not be there.
>
> There could be similar mistakes with 6 byte UTF8 codes. I think
> this file may have been written before UTF8 was tightened up.
> Perhaps this code should be tightened up along with the
> standard now?
>

--
     Theodore H. Smith - Macintosh Consultant / Contractor.
     My website: <www.elfdata.com/>



This archive was generated by hypermail 2.1.2 : Tue Jul 16 2002 - 19:35:44 EDT