5 & 6 byte UTF-8 encodings?

From: O'Leary, Sean (NJ) (oleary@msmail.awii.com)
Date: Wed Aug 18 1999 - 09:31:46 EDT

Next message: Frank da Cruz: "Re: Last Call: UTF-16"
Previous message: Michael Everson: "Four new papers"
Next in thread: Mark Davis: "Re: 5 & 6 byte UTF-8 encodings?"
Maybe reply: Mark Davis: "Re: 5 & 6 byte UTF-8 encodings?"
Maybe reply: Markus Kuhn: "Re: 5 & 6 byte UTF-8 encodings?"
Maybe reply: John Cowan: "Re: 5 & 6 byte UTF-8 encodings?"

OK, I'm confused. My reading of the UTF-8 spec leads me to believe that
UTF-8 encodes characters in a maximum of 4 bytes. Characters
from planes 0x1 through 0xF should always be handled as surrogates.

Yet, I've seen UTF-8 explanations that show planes 0x1 through 0xF encoded
as 5 & 6 byte sequences.
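To make the byte counts concrete, here is a minimal Python sketch. It assumes the original UTF-8 bit-packing scheme, in which lead bytes 0xF8 and 0xFC introduce 5- and 6-byte sequences for values up to 0x7FFFFFFF. The helper names `utf8_encode_legacy` and `to_surrogates` are hypothetical. The sketch compares encoding a plane 1 character directly with encoding its two UTF-16 surrogates separately. The second approach is one way a supplementary character can end up as a 6-byte sequence.

```python
def utf8_encode_legacy(cp: int) -> bytes:
    """Hypothetical helper: encode a value with the original UTF-8 bit packing.

    Accepts values up to 0x7FFFFFFF, so it can emit 5- and 6-byte sequences.
    It also does not reject surrogate values (0xD800..0xDFFF).
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value out of range for UTF-8 bit packing")
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0xxxxxxx
    # (exclusive upper bound, lead-byte marker) for the 2- to 6-byte forms
    forms = [(0x800, 0xC0), (0x10000, 0xE0), (0x200000, 0xF0),
             (0x4000000, 0xF8), (0x80000000, 0xFC)]
    for n, (limit, lead) in enumerate(forms, start=2):
        if cp < limit:
            out = []
            for _ in range(n - 1):
                out.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
                cp >>= 6
            out.append(lead | cp)  # remaining high bits go in the lead byte
            return bytes(reversed(out))
    raise AssertionError("unreachable")


def to_surrogates(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


cp = 0x10400  # a character in plane 1
direct = utf8_encode_legacy(cp)
hi, lo = to_surrogates(cp)
via_surrogates = utf8_encode_legacy(hi) + utf8_encode_legacy(lo)

print(len(direct), direct.hex())                  # 4 f0909080
print(len(via_surrogates), via_surrogates.hex())  # 6 eda081edb080
print(len(utf8_encode_legacy(0x200000)))          # 5
print(utf8_encode_legacy(0x7FFFFFFF).hex())       # fdbfbfbfbfbf
```

Under this scheme, a plane 1 character fits in 4 bytes when encoded directly. The 5- and 6-byte forms appear only for values above 0x1FFFFF or when surrogates are encoded one at a time. A strict decoder such as Python 3's built-in `utf-8` codec raises `UnicodeDecodeError` on both `eda081edb080` and `fdbfbfbfbfbf`.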

Are these 5 & 6 byte encodings valid UTF-8? ...or... do they fall under
the category of "Be generous in what you accept"?

Sean O'Leary
oleary@awii.com
Automated Wagering International
973-594-5077


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT