Re: UTF-8 format

From: Jungshik Shin
Date: Tue Aug 18 1998 - 13:34:29 EDT

On Mon, 17 Aug 1998, JATIN B KHANDELWAL wrote:

> Need information on UTF-8 format and conversion tables with respect to

  UTF-8 is specified in the Unicode 2.0 manual, which also includes a
simple C program illustrating how it works (the program is also
available from the Unicode ftp archive). With it and the JIS X 0201,
JIS X 0208, and JIS X 0212 vs. UCS-2 tables, it should be trivial to
write a converter. If you can't find the manual at a nearby library, you
can get the same (and more up-to-date) information by reading RFC 2279,
available from any repository of IETF RFCs.
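
  To give a rough idea of the bit layout that RFC 2279 describes, here
is a minimal Python sketch. It is not the C program from the Unicode
manual; the function name utf8_encode_ucs2 is made up for this example,
and it only covers UCS-2 values (up to U+FFFF), which need at most three
bytes. It does not treat surrogate values specially.

```python
def utf8_encode_ucs2(cp):
    """Encode one UCS-2 value (0..0xFFFF) as UTF-8 bytes (illustrative helper)."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("value is outside the UCS-2 range")
    if cp < 0x80:
        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    # U+65E5 is a kanji found in JIS X 0208.
    print(utf8_encode_ucs2(0x65E5).hex())
```

A real converter would first map each JIS code to its UCS-2 value using
the tables mentioned above and then pass that value through a routine
like this one.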

> Unicode 2.0, EUC, JIS and Shift-JIS.

  EUC stands for Extended Unix Code, and the name by itself doesn't say
whether it's for Japanese. If you meant EUC-JP, please say so. Note that
there are also EUC-KR (which is used on all three major platforms, Unix,
MS-DOS/MS-Windows, and MacOS, and is the *preferred MIME* name for that
encoding; MS-Windows 95/98 uses CP949, which is an extension of EUC-KR),
EUC-CN (PRC), and EUC-TW (ROC/Taiwan).

  There are several converters supporting conversion between a number of
pre-Unicode encodings and UTF-8/UTF-7 and other Unicode/UCS-2/UCS-4
derived encodings. The newest version of JDK 1.2 offers the most
extensive coverage of encodings, so you may wish to get it and try
'native2ascii' and related tools/methods. Other converters include
'tcs' (written for Plan 9 at Bell Labs and back-ported to Unix and other
OSes), 'uniconv' (included in Gaspar Sinai's yudit, an excellent Unicode
editor for Unix/X11), and Ken Lunde's perl script, which he mentioned a
couple of days ago. Be aware that tcs is out-dated. In addition, some
recent versions of POSIX-compliant systems have iconv(), which supports
UTF-8 and the three Japanese encodings you are interested in, along with
many others.
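
  As a small illustration of what such a converter does, here is a
sketch using Python's built-in codecs. This is not iconv() or any of the
tools named above, just a stand-in that happens to ship the same kind of
tables. The helper name convert and the sample text are made up for this
example; the codec names euc_jp, shift_jis, and iso2022_jp are the ones
Python uses for EUC-JP, Shift-JIS, and JIS (ISO-2022-JP).

```python
def convert(data, src, dst="utf-8"):
    """Re-encode bytes from the src encoding into the dst encoding.

    Illustrative helper: decode to Unicode first, then encode again.
    """
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    text = "日本語"  # sample text, chosen for this example
    for name in ("euc_jp", "shift_jis", "iso2022_jp"):
        legacy = text.encode(name)          # bytes in the legacy encoding
        as_utf8 = convert(legacy, name)     # same text, now in UTF-8
        print(name, legacy.hex(), "->", as_utf8.hex())
```

Going through Unicode as the pivot is the same idea as pairing the JIS
vs. UCS-2 tables with a UTF-8 encoder.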

    Jungshik Shin
