From: John (Eljay) Love-Jensen (email@example.com)
Date: Tue Aug 25 2009 - 13:48:41 CDT
> Can the byte values for the ASCII characters appear by chance as the bytes in
> the 2nd to 4th positions of other UTF-8 characters?
No. Only 0x80 - 0xBF appear in the 2nd to 4th positions.
> Is it safe to assume that if I encounter a CR (carriage return, '\r') byte or
> a LF (line feed, '\n') byte, this byte belongs to its own single-byte
> character?
> Or can the 8 bits that make up a CR or LF byte just happen to exist in another
> multi-byte character as bytes 2 through 4 of that character?
Yes, it is safe. All "trailing" UTF-8 encoding units have the bit pattern
10xxxxxx, so they will always be in the range 0x80 - 0xBF, safely avoiding
'\n' (0x0A) and '\r' (0x0D).
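
If you want to convince yourself, here is a rough Python sketch of the check. The function names are my own (not from any library), and it assumes Python 3's built-in "utf-8" codec, which refuses to encode surrogates, so those are skipped. It encodes every code point and confirms that each byte after the first matches 10xxxxxx:

```python
def is_trailing_byte(byte: int) -> bool:
    """Return True if the byte has the bit pattern 10xxxxxx (0x80 - 0xBF)."""
    return (byte & 0xC0) == 0x80


def trailing_bytes_never_ascii() -> bool:
    """Check every encodable code point: bytes 2..4 must all be 10xxxxxx."""
    for code_point in range(0x110000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        encoded = chr(code_point).encode("utf-8")
        if not all(is_trailing_byte(b) for b in encoded[1:]):
            return False
    return True


def split_on_lf(data: bytes) -> list[str]:
    """Split raw UTF-8 bytes at every 0x0A byte, then decode each piece."""
    return [piece.decode("utf-8") for piece in data.split(b"\n")]
```

Because no multi-byte character can contain 0x0A or 0x0D, splitting the raw bytes at '\n' never cuts a character in half, so every piece returned by split_on_lf() still decodes cleanly. Note that it splits on LF only, so a CR from a CR LF pair stays at the end of its piece.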
> I hope my question is clear.
> Thank you.