Re: New Charakter Proposal

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Nov 01 2002 - 17:47:57 EST

Next message: Tom Gewecke: "DTV Captioning Character Set"

Previous message: Murray Sargent: "RE: Names for UTF-8 with and without BOM"
Maybe in reply to: William Overington: "Re: New Charakter Proposal"

David Starner wrote:
>>Chances are nearly 100% that overlong UTF-8 was a spoofing attempt, or the
>>result of something other than a UTF-8 encoder.
>
> With the exception of overlong sequences for null (C0 80?), which Java
> generates in an attempt to avoid true nulls.

I am aware of this one. This encoding is not UTF-8, however - it is more like CESU-8 with a 2-byte
encoding for NUL. Even if some documentation claims that it is UTF-8, it isn't, and a conformant
UTF-8 decoder must reject the byte sequences from this beast that don't belong in UTF-8 - and the
same holds for a CESU-8 decoder.
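To make the difference concrete, here is a minimal Python sketch. It is not Java's code or any library's API. `encode_modified_utf8` and `is_valid_utf8` are hypothetical helpers written only for illustration. The encoder assumes Java-style behaviour: NUL becomes the overlong pair C0 80, and supplementary characters are split into UTF-16 surrogates that are encoded as 3 bytes each, CESU-8 style. It leaves out the 2-byte length prefix that DataOutputStream.writeUTF adds. The validator checks the well-formed UTF-8 byte ranges, so it rejects both the C0 80 and the surrogate triples. A CESU-8 decoder would accept the surrogate triples but should still reject C0 80.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Hypothetical sketch of Java-style 'modified UTF-8' (no length prefix)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into a UTF-16 surrogate pair first, as CESU-8 does.
            v = cp - 0x10000
            units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        else:
            units = [cp]
        for u in units:
            if 0x01 <= u <= 0x7F:
                out.append(u)
            elif u <= 0x7FF:  # U+0000 lands here and becomes the overlong C0 80
                out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
            else:  # BMP characters and each surrogate: 3 bytes
                out += bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 check: rejects overlongs, surrogates, and values above U+10FFFF."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            i += 1
            continue
        # 'need' trailing bytes; the first one must lie in [lo, hi].
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # excludes surrogates U+D800..U+DFFF
        elif 0xE1 <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # excludes values above U+10FFFF
        else:
            return False  # C0/C1 (always overlong), F5..FF, stray continuation bytes
        if i + need >= n:
            return False  # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True


sample = "a\u0000b\U0001F600"
java_bytes = encode_modified_utf8(sample)
print(java_bytes.hex(" "))  # 61 c0 80 62 ed a0 bd ed b8 80
print(is_valid_utf8(sample.encode("utf-8")), is_valid_utf8(java_bytes))  # True False

# Python's own strict UTF-8 codec refuses the same bytes.
try:
    java_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

The validator follows the byte ranges rather than decoding and re-encoding, so a non-shortest form such as C0 80 is rejected as soon as its lead byte is seen.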

This rather proves my point above: the overlong form comes from something other than a UTF-8 encoder.

markus


This archive was generated by hypermail 2.1.5 : Fri Nov 01 2002 - 18:34:39 EST