Re: New UTF-8 decoder stress test file

From: Valeriy E. Ushakov (uwe@ptc.spbu.ru)
Date: Sun Sep 26 1999 - 13:10:40 EDT

Next message: Karl Pentzlin: "UTF-8, U+0000 and Software Development (was: Re: New UTF-8 decoder stress test file)"
Previous message: Markus Kuhn: "New UTF-8 decoder stress test file"
Maybe in reply to: Markus Kuhn: "New UTF-8 decoder stress test file"
Next in thread: Markus Kuhn: "Re: UTF-8, U+0000 and JDK"
Reply: Markus Kuhn: "Re: UTF-8, U+0000 and JDK"

On Sun, Sep 26, 1999 at 09:22:26AM -0700, Markus Kuhn wrote:

> 4.3 Overlong representation of the NUL character
>
> The following five sequences should also be rejected like malformed
> UTF-8 sequences and should not be treated like the ASCII NUL
> character.
>
> 4.3.1 U+0000 = c0 80 = "?"

I believe that's exactly what the JDK uses to encode U+0000 in
UTF-8-encoded, NUL-terminated C strings, so that a U+0000 which is part
of a string can be told apart from the terminating NUL. I can't find
the reference, though.
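
To make the conflict concrete, here is a minimal sketch in Python, assuming the scheme works as described above. The helper names encode_nul_safe and strict_decoder_accepts are made up for this illustration and are not JDK APIs; the sketch only covers the U+0000 case and ignores any other way the JDK's encoding may differ from standard UTF-8.

```python
def encode_nul_safe(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as the overlong pair C0 80.

    Hypothetical helper: in standard UTF-8 the byte 0x00 only ever encodes
    U+0000, so a plain byte replacement is enough for this illustration.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def strict_decoder_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A string with an embedded U+0000 between two ASCII letters.
encoded = encode_nul_safe("a\x00b")

# No 0x00 byte remains, so a terminating NUL can be appended safely.
contains_nul_byte = b"\x00" in encoded

# A decoder that follows test 4.3.1 rejects the same bytes.
accepted = strict_decoder_accepts(encoded)
```

So a decoder that rejects c0 80, as the test file asks, would also refuse strings written this way; the two conventions cannot both be honoured by one decoder without a special mode.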

SY, Uwe

-- 
uwe@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen

