Re: Converting Pages to Unicode

From: James Kass (jameskass@worldnet.att.net)
Date: Sun Jul 14 2002 - 22:15:53 EDT


Kalairaja wrote,

> ...
> One more silly doubt.. is UTF-8 same as unicode, and can
> everything (8bit) be converted to UTF-8 (if it was easier).

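UTF-8 is not the same thing as Unicode; it is one scheme for storing
and transmitting Unicode. UTF-8 encodes ASCII characters as themselves
and never uses bytes in the ASCII range (hex 00 through 7F, which
includes "00" and the other control characters) as part of a
multi-byte sequence, so it is considered a safe way to handle
Unicode data.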

For example, DEVANAGARI LETTER KA is U+0915. The hex byte
sequence which represents U+0915 in UTF-8 is E0,A4,95. That is
three bytes, whereas UTF-16 needs only two (09,15).
This e-mail program supports UTF-8 encoding. If your e-mail
program also supports UTF-8, you'll see the KA here ( क ); otherwise
you'll only see junk.
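
To check the byte counts yourself, here is a minimal sketch in Python
(my choice of language for illustration only; any tool with Unicode
support would do). It encodes U+0915 both ways using the standard
"utf-8" and "utf-16-be" codecs. The variable names are just
placeholders.

```python
# DEVANAGARI LETTER KA, U+0915
ka = "\u0915"

# Encode the same character in UTF-8 and in big-endian UTF-16 (no BOM).
utf8_bytes = ka.encode("utf-8")
utf16_bytes = ka.encode("utf-16-be")

print(utf8_bytes.hex())   # e0a495 (three bytes)
print(utf16_bytes.hex())  # 0915   (two bytes)

# Every byte of the UTF-8 sequence is above hex 7F, so none of them
# can be mistaken for an ASCII or control character.
print(all(b > 0x7F for b in utf8_bytes))  # True
```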

Many applications can automatically convert between UTF-8 and
other encodings. For example, with Internet Explorer you can go
to a Chinese Big5-encoded page and "File-Save As" Unicode (UTF-8).
This will automatically convert from Big5 to Unicode.
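
The same kind of conversion can be scripted. The following is only a
rough sketch in Python, assuming the input really is in a standard
encoding such as Big5; the function name big5_to_utf8 and the sample
text are made up for illustration and are not how Internet Explorer
does it internally.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5 bytes to Unicode text, then re-encode as UTF-8."""
    text = data.decode("big5")   # Big5 bytes -> Unicode string
    return text.encode("utf-8")  # Unicode string -> UTF-8 bytes

# Hypothetical sample: two Chinese characters stored as Big5.
sample = "中文".encode("big5")
converted = big5_to_utf8(sample)

print(len(sample))     # 4 (two bytes per character in Big5)
print(len(converted))  # 6 (three bytes per character in UTF-8)
```

This works only because Python ships a codec that knows the Big5
mapping. A custom font encoding has no such standard mapping table.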

This won't help with converting custom Devanagari font encodings,
though. Perhaps the information in my previous post can help.

For some additional information about Unicode conversions, please
visit ICU:
http://oss.software.ibm.com/icu/charset/

Best regards,

James Kass.



This archive was generated by hypermail 2.1.2 : Sun Jul 14 2002 - 20:55:13 EDT