5 & 6 byte UTF-8 encodings?

From: O'Leary, Sean (NJ) (oleary@msmail.awii.com)
Date: Wed Aug 18 1999 - 09:31:46 EDT


OK, I'm confused. My reading of the UTF-8 spec leads me to believe that
UTF-8 encodes characters in a maximum of 4 bytes. Characters
from planes 0x1 through 0xF should always be handled as surrogates.

Yet, I've seen UTF-8 explanations that show planes 0x1 through 0xF encoded
as 5 & 6 byte sequences.

Are these 5 & 6 byte encodings valid UTF-8? ...or... do they fall under
the category of "Be generous in what you accept."?
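
To make the byte counts in question concrete, here is a rough sketch of the
length arithmetic. It assumes the bit layout of the original 31-bit UTF-8
definition (the one given in RFC 2279), where each extra byte extends the
range of encodable values. The helper name utf8_length is hypothetical and
used only for illustration.

```python
def utf8_length(code_point):
    """Bytes needed under the 31-bit UTF-8 bit layout (hypothetical helper).

    Assumption: the upper limits below follow the RFC 2279 table, which
    defines sequences of up to 6 bytes for values up to 0x7FFFFFFF.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Largest value representable with 1, 2, 3, 4, 5 and 6 bytes.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for n, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return n
    raise ValueError("code point out of 31-bit range")


# Last code point of planes 0x0, 0x1, 0xF and 0x10, compared with the
# length Python's own UTF-8 encoder produces for the same value.
for plane in (0x0, 0x1, 0xF, 0x10):
    last = (plane << 16) | 0xFFFF
    print(hex(plane), utf8_length(last), len(chr(last).encode("utf-8")))
```

Under that layout, 5 and 6 byte sequences would only be needed for values
above 0x1FFFFF, well past plane 0x10. Whether a given decoder should accept
them is a separate conformance question.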

Sean O'Leary
oleary@awii.com
Automated Wagering International
973-594-5077



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT