Re: UTS#40 (BOCU-1) special handling of large blocks

From: Frank Ellermann (nobody@xyzzy.claranet.de)
Date: Thu Feb 08 2007 - 10:14:53 CST

Next message: Johannes Bergerhausen: "Bulgarian Cyrillic"

Previous message: Doug Ewell: "Re: UTS#40 (BOCU-1) special handling of large blocks"
In reply to: Doug Ewell: "Re: UTS#40 (BOCU-1) special handling of large blocks"
Next in thread: Doug Ewell: "Re: UTS#40 (BOCU-1) ambiguity and possible serious bug about leading BOM"

Doug Ewell wrote:

>> In general, if you make an incompatible change - a change where an old
>> decoder cannot cope with the output from an updated encoder - then you
>> must change the name of the charset.

> UTF-8 was initially defined to work across the entire original 31-bit
> ISO 10646 code space, with sequences up to 6 bytes long, before Unicode
> and 10646 agreed to limit the range to U+10FFFF. The definition of
> UTF-8 appears to have been changed, and I've personally seen several
> decoders that recognized the longer sequences, but AFAIK the name
> "UTF-8" was never changed or qualified with a version number.

Old UTF-8 decoders can deal with valid "new" UTF-8. In theory a "new"
decoder cannot handle "old" UTF-8 above U+10FFFF, but in practice that's
irrelevant.
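The reason old decoders cope with new UTF-8 is that the current form is a
byte-for-byte subset of the original one. The following minimal Python sketch
of the original 31-bit encoder illustrates this; the name encode_old_utf8 is
hypothetical. It assumes the scheme of RFC 2279, which RFC 3629 later
restricted to U+10FFFF. For every Unicode scalar value it yields the same
bytes as a current encoder, and only values above U+10FFFF need the extra
length.

```python
def encode_old_utf8(cp: int) -> bytes:
    """Encode cp with the original 31-bit UTF-8 scheme (1 to 6 bytes).

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit ISO 10646 code space")
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single byte in every version
    # (exclusive upper bound, sequence length, lead-byte marker bits)
    forms = ((0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
             (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))
    for limit, n, marker in forms:
        if cp < limit:
            trail = []
            for _ in range(n - 1):
                trail.append(0x80 | (cp & 0x3F))  # 10xxxxxx continuation byte
                cp >>= 6
            # Remaining high bits go into the lead byte.
            return bytes([marker | cp] + trail[::-1])
    raise AssertionError("unreachable: cp < 0x80000000 was checked above")
```

For example, U+10FFFF encodes as F4 8F BF BF in both versions, while the
largest 31-bit value, 0x7FFFFFFF, needs the old 6-byte form FD BF BF BF BF BF.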

The only real difference I'm aware of concerns the old overlong
constructs. When I implemented UTF-8 I used the old format for error
recovery: after a lead byte that is invalid under the "new" rules, I
replace it with a single U+FFFD, skipping all plausible trailing bytes.
This is an attempt to keep the number of reported errors to a minimum.
It does not apply to 0xFE or 0xFF, because those bytes were always
invalid.
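The recovery strategy described above can be sketched in Python as follows.
This is not Frank's code. The function names are hypothetical, and the
handling of cases the message does not mention is an assumption: stray
continuation bytes, truncated sequences, surrogates, and overlong 2 to 4 byte
forms. Each lead byte gets the length the old 31-bit format assigns it, and up
to that many continuation bytes (0x80 to 0xBF) are consumed. If the result is
not valid current UTF-8, the whole run becomes one U+FFFD. 0xFE and 0xFF were
never lead bytes, so each becomes its own U+FFFD.

```python
REPLACEMENT = "\uFFFD"

# Smallest code point that legitimately needs n bytes; anything lower is overlong.
MIN_FOR_LEN = {2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def old_seq_len(lead: int) -> int:
    """Sequence length implied by a lead byte in the original 31-bit UTF-8.

    Returns 0 for bytes that can never start a sequence: continuation
    bytes 0x80-0xBF and the always-invalid 0xFE/0xFF.
    """
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return 0
    for n, bound in ((2, 0xE0), (3, 0xF0), (4, 0xF8), (5, 0xFC), (6, 0xFE)):
        if lead < bound:
            return n
    return 0  # 0xFE, 0xFF


def decode_with_recovery(data: bytes) -> str:
    """Decode current UTF-8, using the old length rules to size error runs."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = old_seq_len(lead)
        if n == 1:
            out.append(chr(lead))
            i += 1
            continue
        if n == 0:
            # Stray continuation byte, or 0xFE/0xFF: one U+FFFD for this byte only.
            out.append(REPLACEMENT)
            i += 1
            continue
        # Consume up to n - 1 plausible trailing bytes.
        j = i + 1
        while j < i + n and j < len(data) and 0x80 <= data[j] <= 0xBF:
            j += 1
        if j - i < n:
            out.append(REPLACEMENT)  # truncated sequence: one U+FFFD
        else:
            cp = lead & (0x7F >> n)  # payload bits of the lead byte
            for b in data[i + 1:j]:
                cp = (cp << 6) | (b & 0x3F)
            ok = (MIN_FOR_LEN[n] <= cp <= 0x10FFFF
                  and not 0xD800 <= cp <= 0xDFFF)
            # Overlong (C0/C1), beyond U+10FFFF (F5-FD), or surrogate: one U+FFFD.
            out.append(chr(cp) if ok else REPLACEMENT)
        i = j
    return "".join(out)
```

With this sketch, decode_with_recovery(b"A\xc0\x80B") returns "A\uFFFDB", and
the old 6-byte form of 0x7FFFFFFF yields a single U+FFFD. Python's built-in
bytes.decode("utf-8", "replace") follows the Unicode "maximal subpart"
practice instead and reports C0 80 as two replacement characters. That is
exactly the kind of difference in error counts the approach above tries to
minimize.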

Frank

This archive was generated by hypermail 2.1.5 : Thu Feb 08 2007 - 10:32:32 CST