RE: UTF8 and Control Characters

From: Abdij Bhat (Abdij.Bhat@kshema.com)
Date: Wed Nov 05 2003 - 01:11:24 EST

Next message: Doug Ewell: "Re: UTF8 and Control Characters"

Previous message: Doug Ewell: "Re: UTF-16 inside UTF-8"
Maybe in reply to: Abdij Bhat: "UTF8 and Control Characters"
Next in thread: Doug Ewell: "Re: UTF8 and Control Characters"

Hi Doug,
Yes, the Unicode encoding we are using is UTF-16 (VC++).
Yes, the control characters we are concerned about are all below 0x20 in ASCII.

Thanks a lot for the information. So I believe we can safely use the
conversion without breaking the hardware.

Thanks and Regards,
Abdij Bhat
Kshema Technologies
mailto:abdij.bhat@kshema.com
www.kshema.com
Phone:+91 80 860 3600 (Extension 2102)
Fax: +91 80 860 3372

-----Original Message-----
From: Doug Ewell [mailto:dewell@adelphia.net]
Sent: Wednesday, November 05, 2003 11:45 AM
To: Unicode Mailing List
Cc: Abdij Bhat
Subject: Re: UTF8 and Control Characters

Abdij Bhat <Abdij dot Bhat at kshema dot com> wrote:

> If a UNICODE string is converted to UTF8, will the UTF8 encoded
> string contain any control characters or escape sequences? If so, is
> it possible to eliminate them?

By "UNICODE" I assume you mean UTF-16, which is one encoding form of
Unicode (as is UTF-8).
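As a rough illustration of "two encoding forms of the same Unicode text", the sketch below decodes a UTF-16 buffer and re-encodes the same code points as UTF-8. It assumes the VC++ wchar_t data is UTF-16 little-endian (the usual layout on x86 Windows); the function name utf16le_to_utf8 is hypothetical, chosen only for this example.

```python
# Sketch: the same text in two Unicode encoding forms.
# Assumption: the VC++ wchar_t buffer is UTF-16 little-endian (as on x86 Windows).

def utf16le_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16LE code units and re-encode the code points as UTF-8."""
    text = data.decode("utf-16-le")  # strict mode: raises UnicodeDecodeError on unpaired surrogates
    return text.encode("utf-8")

sample = "A\u00e9\u20ac"  # 'A', e with acute accent, euro sign
utf16 = sample.encode("utf-16-le")
utf8 = utf16le_to_utf8(utf16)
print(utf16.hex(" "))  # 41 00 e9 00 ac 20
print(utf8.hex(" "))   # 41 c3 a9 e2 82 ac
```

The code points are identical in both forms; only the byte sequences that represent them differ.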

By "control character[s] or escape sequences" I assume you mean
characters below 0x20 in ASCII, or below U+0020 in Unicode (any encoding
form). (That is, I assume we are not talking about the so-called C1
control characters from U+0080 to U+009F.)

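A UTF-8 string will contain control characters if and only if the
corresponding UTF-16 string contains characters in the control range
(below U+0020). This follows from the structure of UTF-8: a code point
below U+0080 is encoded as the single byte with the same value, while
every byte of a multi-byte sequence is 0x80 or higher, so a byte below
0x20 can only come from a character below U+0020.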
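A minimal sketch that checks this claim exhaustively, assuming "control character" means the C0 range U+0000..U+001F as defined above. Because UTF-8 encodes each code point independently, checking every code point on its own is enough to cover every string. The helper names are hypothetical.

```python
# Sketch: UTF-8 produces a byte below 0x20 only for a code point below U+0020.
# Lead bytes of multi-byte sequences are 0xC2..0xF4 and continuation bytes are
# 0x80..0xBF, so neither can fall in the C0 byte range 0x00..0x1F.

def has_c0_chars(text: str) -> bool:
    """True if any code point is in the C0 control range U+0000..U+001F."""
    return any(ord(ch) < 0x20 for ch in text)

def has_c0_bytes(data: bytes) -> bool:
    """True if any byte is in the range 0x00..0x1F."""
    return any(b < 0x20 for b in data)

def check_all_code_points() -> None:
    # Compare both predicates for every Unicode scalar value.
    # Surrogates U+D800..U+DFFF cannot be encoded on their own, so skip them.
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        assert has_c0_chars(ch) == has_c0_bytes(ch.encode("utf-8")), hex(cp)

check_all_code_points()
print(has_c0_bytes("Gr\u00fc\u00dfe \u4e16\u754c".encode("utf-8")))  # False
print(has_c0_bytes("line1\nline2".encode("utf-8")))                  # True
```

Note that the C1 controls (U+0080..U+009F) also pass the check: U+0085, for example, becomes the two bytes C2 85, neither of which is below 0x20.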

For strings that consist entirely of ASCII characters, the UTF-8
representation is identical to the ASCII representation.
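A short sketch of the ASCII point, assuming Python 3.7+ for str.isascii(). The helper utf8_equals_ascii is hypothetical and simply compares the two encodings byte for byte.

```python
# Sketch: for pure-ASCII text, the UTF-8 bytes equal the ASCII bytes.

def utf8_equals_ascii(text: str) -> bool:
    """Compare the ASCII and UTF-8 encodings of ASCII-only text byte for byte."""
    if not text.isascii():  # str.isascii() requires Python 3.7+
        raise ValueError("text contains non-ASCII characters")
    return text.encode("ascii") == text.encode("utf-8")

print(utf8_equals_ascii("Hello, world!\r\n"))                  # True
print(all(utf8_equals_ascii(chr(cp)) for cp in range(0x80)))  # True
```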

For the specification of UTF-8, including some examples that should help
answer your questions, see
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf (pp. 24-25).

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 01:58:11 EST