RE: Invalid code points

From: Mark Crispin (mrc+unicode@panda.com)
Date: Mon Jun 01 2009 - 10:46:37 CDT

Next message: Hans Aberg: "Re: Invalid code points"

Previous message: Phillips, Addison: "RE: Invalid code points"
In reply to: Phillips, Addison: "RE: Invalid code points"
Next in thread: Asmus Freytag: "Re: Invalid code points"
Reply: Asmus Freytag: "Re: Invalid code points"
Reply: Hans Aberg: "Re: Invalid code points"

On Mon, 1 Jun 2009, Phillips, Addison wrote:
> Uh... the IETF does not define UTF-8. The Unicode Consortium does. But
> even if you want to build on the IETF documents, RFC 3629 was published
> six years ago. Basing a new implementation on something published 11
> years ago and obsolete the last six years? Not a good idea.

This is true; but generally, within the IETF, specifications are upwards
compatible.

I think that there are two obvious implementation choices:

[1] Recognize the sequences for the 0x110000 - 0x7fffffff ranges, never
generate them, and if a value in that range is encountered treat it as an
"error" or "not in Unicode" value. This is the traditional IETF
philosophy.

[2] Strictly enforce the rules for "well-formed UTF-8 byte sequences" on
page 104 of Unicode 5.0, and reject any string which fails to comply (note
in particular the requirements on the second byte).
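
As a rough sketch of what choice [1] could look like on the decoding side:
the decoder below recognizes the older one- to six-byte forms, and maps any
value in 0x110000 - 0x7fffffff to a "not in Unicode" marker instead of
rejecting the bytes. The names decode_lenient and NOT_IN_UNICODE are
hypothetical, rejecting overlong forms is my own assumption, and surrogate
handling is left out entirely.

```python
MAX_UNICODE = 0x10FFFF
NOT_IN_UNICODE = -1  # hypothetical sentinel for values beyond Unicode

# (lead-byte mask, lead-byte pattern, sequence length, smallest value
# that legitimately needs this length)
_FORMS = [
    (0x80, 0x00, 1, 0x0),
    (0xE0, 0xC0, 2, 0x80),
    (0xF0, 0xE0, 3, 0x800),
    (0xF8, 0xF0, 4, 0x10000),
    (0xFC, 0xF8, 5, 0x200000),
    (0xFE, 0xFC, 6, 0x4000000),
]


def decode_lenient(data, pos=0):
    """Decode one sequence from bytes; return (value, next_pos).

    Raises ValueError for malformed input. Values above 0x10FFFF are
    recognized but reported as NOT_IN_UNICODE.
    """
    lead = data[pos]
    for mask, pattern, length, minimum in _FORMS:
        if (lead & mask) == pattern:
            break
    else:
        raise ValueError("invalid lead byte")
    if pos + length > len(data):
        raise ValueError("truncated sequence")
    value = lead & (~mask & 0xFF)  # payload bits of the lead byte
    for b in data[pos + 1:pos + length]:
        if (b & 0xC0) != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    if value < minimum:
        raise ValueError("overlong form")  # assumption: reject these
    if value > MAX_UNICODE:
        return NOT_IN_UNICODE, pos + length
    return value, pos + length
```

For example, decode_lenient(b"\xf4\x90\x80\x80") consumes all four bytes
and reports NOT_IN_UNICODE, where a strict decoder would reject the input
at the second byte.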

In all cases, what is generated must strictly comply with "well formed
UTF-8 byte sequences".
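
A minimal sketch of that well-formedness check, which applies to output
under either choice and to input under choice [2]. The ranges are the ones
in the Unicode Standard's table of well-formed UTF-8 byte sequences (Table
3-7); the function name is_well_formed and the table layout are
hypothetical. Note how the second byte is restricted after E0, ED, F0 and
F4.

```python
# (first-byte range, second-byte range or None, count of further
# 80..BF trailing bytes)
_ROWS = [
    ((0x00, 0x7F), None, 0),
    ((0xC2, 0xDF), (0x80, 0xBF), 0),
    ((0xE0, 0xE0), (0xA0, 0xBF), 1),  # excludes overlong 3-byte forms
    ((0xE1, 0xEC), (0x80, 0xBF), 1),
    ((0xED, 0xED), (0x80, 0x9F), 1),  # excludes surrogates D800..DFFF
    ((0xEE, 0xEF), (0x80, 0xBF), 1),
    ((0xF0, 0xF0), (0x90, 0xBF), 2),  # excludes overlong 4-byte forms
    ((0xF1, 0xF3), (0x80, 0xBF), 2),
    ((0xF4, 0xF4), (0x80, 0x8F), 2),  # excludes values above 10FFFF
]


def is_well_formed(data):
    """Return True if the bytes object is entirely well-formed UTF-8."""
    pos = 0
    n = len(data)
    while pos < n:
        first = data[pos]
        for (lo, hi), second, extra in _ROWS:
            if lo <= first <= hi:
                break
        else:
            return False  # C0, C1, F5..FF, or a stray trailing byte
        pos += 1
        if second is not None:
            if pos >= n or not (second[0] <= data[pos] <= second[1]):
                return False
            pos += 1
        for _ in range(extra):
            if pos >= n or not (0x80 <= data[pos] <= 0xBF):
                return False
            pos += 1
    return True
```

With this check, b"\xf4\x8f\xbf\xbf" (U+10FFFF) passes, while
b"\xf4\x90\x80\x80" (0x110000) and b"\xed\xa0\x80" (a surrogate) fail on
their second byte.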

I have little doubt that Unicode would tend to advocate choice [2], but as
noted above the "IETF way" would be choice [1].

As a practical matter, it should not make any difference. You should
never expect anything other than a well-formed sequence to work.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.


This archive was generated by hypermail 2.1.5 : Mon Jun 01 2009 - 10:48:50 CDT