Re: Do the CR & LF bytes in UTF-8 ONLY exist in this form?

From: John (Eljay) Love-Jensen
Date: Tue Aug 25 2009 - 13:48:41 CDT


    Hi alopecoid,

    > can the byte values for the ASCII characters appear by chance as the bytes in
    > the 2nd to 4th positions of other UTF-8 characters?

    No. Only byte values 0x80 - 0xBF appear in the 2nd to 4th positions of a
    multi-byte UTF-8 sequence; the ASCII values (0x00 - 0x7F) never do.
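
    If you would rather check this than take my word for it, here is a rough
    Python sketch that encodes every code point above ASCII and collects the
    bytes found in positions 2 to 4. The function name is made up for
    illustration, and it leans on Python's built-in UTF-8 encoder being
    conformant.

```python
def trailing_byte_range():
    """Return (min, max) of all bytes seen in positions 2-4 of UTF-8 sequences."""
    seen = set()
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        encoded = chr(cp).encode("utf-8")
        seen.update(encoded[1:])  # everything after the lead byte
    return min(seen), max(seen)


if __name__ == "__main__":
    lo, hi = trailing_byte_range()
    print(hex(lo), hex(hi))
```

    It walks about 1.1 million code points, so expect it to take a moment.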

    > Is it safe to assume that if I encounter a CR (carriage return, '\r') byte or
    > a LF (line feed, '\n') byte, that this byte belongs to its own single-byte
    > character value?


    > Or can the 8 bits that make up a CR or LF byte just happen to exist in another
    > multi-byte character as bytes 2 through 4 of that character?


    All "trailing" UTF-8 code units have the bit pattern 10xxxxxx, so they
    will always be between 0x80 and 0xBF, safely avoiding '\n' (0x0A) and '\r'
    (0x0D).
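
    In practice this means you can look for line breaks directly in the raw
    bytes without decoding first. A minimal sketch in Python, assuming the
    input is well-formed UTF-8 (both function names are my own, not from any
    library):

```python
def is_trailing_byte(b: int) -> bool:
    """True if b has the bit pattern 10xxxxxx (0x80-0xBF)."""
    return (b & 0xC0) == 0x80


def find_line_breaks(data: bytes) -> list:
    """Return the byte offsets of every CR (0x0D) or LF (0x0A) in data."""
    return [i for i, b in enumerate(data) if b in (0x0A, 0x0D)]


if __name__ == "__main__":
    # Mixed 1-, 2- and 3-byte characters with CR LF and a bare LF.
    raw = "n\u00e9\r\n\u65e5\u672c\n".encode("utf-8")
    for offset in find_line_breaks(raw):
        # A CR or LF byte is never a trailing byte of another character.
        assert not is_trailing_byte(raw[offset])
    print(find_line_breaks(raw))
```

    The offsets are byte positions, not character positions, so slice the
    byte string at them before decoding each line.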

    > I hope my question is clear.


    > Thank you.

    You're welcome.

