RE: UTF8 and Control Characters

From: Abdij Bhat
Date: Wed Nov 05 2003 - 01:11:24 EST


    Hi Doug,
     Yes, the Unicode encoding we are using is UTF-16 (VC++).
     Yes, the control characters are all below 0x20 in ASCII.

     Thanks a lot for the information. So I believe we can use the
    conversion safely without breaking the hardware.

    Thanks and Regards,
    Abdij Bhat
    Kshema Technologies
    Phone:+91 80 860 3600 (Extension 2102)
    Fax: +91 80 860 3372

    -----Original Message-----
    From: Doug Ewell
    Sent: Wednesday, November 05, 2003 11:45 AM
    To: Unicode Mailing List
    Cc: Abdij Bhat
    Subject: Re: UTF8 and Control Characters

    Abdij Bhat <Abdij dot Bhat at kshema dot com> wrote:

    > If a UNICODE string is converted to UTF8, will the UTF8-encoded
    > string contain any control character or escape sequences? If so, is it
    > possible to eliminate them?

    By "UNICODE" I assume you mean UTF-16, which is one encoding form of
    Unicode (as is UTF-8).

    By "control character[s] or escape sequences" I assume you mean
    characters below 0x20 in ASCII, or below U+0020 in Unicode (any encoding
    form). (That is, I assume we are not talking about the so-called C1
    control characters from U+0080 to U+009F.)
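
A byte-level note on that parenthetical (an editorial addition, not part of the original message): in UTF-8, continuation bytes range from 0x80 to 0xBF, which overlaps the 8-bit C1 byte range 0x80 to 0x9F. A C1 control such as U+0085 is encoded as C2 85, and even ordinary printable characters can produce such bytes; for example, U+20AC EURO SIGN is encoded as E2 82 AC. None of these bytes is below 0x20, so this matters only if the hardware also treats raw bytes 0x80 to 0x9F as controls, which the thread does not say. The minimal Python sketch below lists the bytes of an encoded string that land in that range. The helper name `c1_range_bytes` is hypothetical.

```python
def c1_range_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text` that fall in 0x80..0x9F.

    Hypothetical helper for illustration: these byte values coincide with
    the 8-bit C1 control range, although in UTF-8 they only ever appear
    as continuation bytes inside a multi-byte sequence.
    """
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# A C1 control character (U+0085 NEXT LINE) encodes as C2 85.
print([hex(b) for b in c1_range_bytes("\u0085")])  # ['0x85']
# An ordinary printable character can also produce such a byte:
# U+20AC EURO SIGN encodes as E2 82 AC.
print([hex(b) for b in c1_range_bytes("\u20ac")])  # ['0x82']
# Pure ASCII never does.
print(c1_range_bytes("Hello"))  # []
```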

    A UTF-8 string will contain control characters if and only if the
    corresponding UTF-16 string contains characters in the control range
    (below U+0020).
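
The "if and only if" follows from how UTF-8 packs code points. U+0000 to U+007F each become a single byte equal to the code point, while every byte of a multi-byte sequence has its high bit set (lead bytes 0xC2 and up, continuation bytes 0x80 to 0xBF). A UTF-16 code unit below 0x20 can never be half of a surrogate pair either. What follows is a minimal hand-written Python sketch, not a production converter. It assumes well-formed UTF-16 supplied as a list of 16-bit code units, and the names `utf16_to_utf8`, `has_c0_control`, and `utf16_units` are hypothetical.

```python
def utf16_to_utf8(units: list[int]) -> bytes:
    """Encode a list of UTF-16 code units as UTF-8.

    Sketch only: assumes well-formed UTF-16 and raises ValueError on an
    unpaired surrogate.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: combine with the following low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError(f"unpaired high surrogate at index {i}")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            cp = u
            i += 1

        if cp < 0x80:
            # One byte equal to the code point: the only case that can
            # produce a byte below 0x20.
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


def has_c0_control(values) -> bool:
    """True if any code unit or byte is below 0x20 (a C0 control)."""
    return any(v < 0x20 for v in values)


def utf16_units(text: str) -> list[int]:
    """Test helper: split a Python str into UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


for sample in ["caf\u00e9 \u20ac \U0001F600", "line1\r\nline2"]:
    units = utf16_units(sample)
    encoded = utf16_to_utf8(units)
    assert encoded == sample.encode("utf-8")  # agrees with Python's codec
    print(has_c0_control(units), has_c0_control(encoded))
# False False
# True True
```

Checking the UTF-16 code units before conversion gives the same answer as checking the UTF-8 bytes afterwards, which is the property the hardware depends on.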

    For strings that consist entirely of ASCII characters, the UTF-8
    representation is identical to the ASCII representation.
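
This identity holds because UTF-8 stores every code point from U+0000 to U+007F as a single byte with the same value as the code point. A quick exhaustive check in Python, relying only on the standard `str.encode` codecs and `str.isascii` (Python 3.7+). The helper name `ascii_identical_in_utf8` is hypothetical.

```python
def ascii_identical_in_utf8(text: str) -> bool:
    """True if `text` is pure ASCII and encodes to the same bytes in UTF-8."""
    return text.isascii() and text.encode("utf-8") == text.encode("ascii")


# All 128 ASCII characters, control characters included.
every_ascii = "".join(chr(c) for c in range(128))
print(ascii_identical_in_utf8(every_ascii))  # True
# Each one maps to exactly one byte equal to its code.
print(all(chr(c).encode("utf-8") == bytes([c]) for c in range(128)))  # True
# A non-ASCII string is outside the claim: U+00E9 needs two bytes.
print("caf\u00e9".encode("utf-8"))  # b'caf\xc3\xa9'
```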

    For the specification of UTF-8, including some examples that should help
    answer your questions, see (pp. 24-25).

    -Doug Ewell
     Fullerton, California
