RE: the HOW-TO of converting Chinese to Unicode HTML

From: Carl W. Brown (
Date: Thu May 24 2001 - 11:11:19 EDT


If you want to look at the source code of a converter, you can always look at the ICU source. There you will find converters for Big5/GB2312, EUC and ISO-2022-CN that convert to and from Unicode. Because it is open source, you can study both the tables and the code. It is not easy to follow, since it uses a generalized converter framework with special routines for specialized types of conversion, but the converter code is fairly well broken out, so you won't have to go through the rest of ICU.
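
For illustration only, here is a minimal Python sketch of what the whole job boils down to. Python's built-in "big5" codec stands in for an ICU converter, and the function name to_unicode_html and the default encoding are my own assumptions, not anything from ICU. Step 1 maps the legacy Big5 bytes to Unicode code points through a table. Step 2 escapes HTML-special characters. Step 3 writes every non-ASCII code point as a decimal numeric character reference, so the mail body is pure ASCII.

```python
import html

def to_unicode_html(raw: bytes, encoding: str = "big5") -> str:
    """Hypothetical helper: legacy-encoded Chinese bytes -> ASCII-only HTML."""
    # Step 1: legacy bytes -> Unicode string. The codec does the table
    # lookup here (standing in for an ICU converter; an assumption).
    text = raw.decode(encoding)
    # Step 2: escape characters that are special in HTML (&, <, >).
    text = html.escape(text, quote=False)
    # Step 3: write each non-ASCII code point as a decimal numeric
    # character reference, e.g. U+4E2D -> "&#20013;".
    out = []
    for ch in text:
        cp = ord(ch)
        out.append(ch if cp < 0x80 else "&#%d;" % cp)
    return "".join(out)

print(to_unicode_html("中文".encode("big5")))  # &#20013;&#25991;
```

For GB2312 data you would pass a different codec name such as "gb2312"; the second and third steps stay the same.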

I still don't know why you need to understand how the converters work. If you want the code point relationships, look at the conversion tables. Use convrtrs.txt to find the converter name. For example, EUC-CN is ibm-1383. Open ibm-1383.ucm and you will find the code point mappings and the converter controls.
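
As a rough sketch of how such a table can be used, the Python below reads mapping lines of the form <U4E00> \xD2\xBB |0 from the CHARMAP section of a .ucm file. It keeps the |0 (round-trip) and |3 (reverse fallback) entries for the bytes-to-Unicode direction and decodes with a simple longest-match lookup. This is an assumption-laden simplification: the names parse_ucm, decode and SAMPLE_UCM are hypothetical, header lines such as the converter controls are ignored, and ICU itself uses generated state tables rather than this loop.

```python
import re

# One CHARMAP line: <Uhex> \xHH\xHH |precision
_LINE = re.compile(r"^<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|([0-4])")

def parse_ucm(text: str) -> dict:
    """Build a bytes -> code point table from the CHARMAP section."""
    table = {}
    in_map = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_map = True
            continue
        if line == "END CHARMAP":
            break
        m = _LINE.match(line) if in_map else None
        if m is None:
            continue  # header, comment, or unsupported entry
        cp = int(m.group(1), 16)
        raw = bytes(int(h, 16) for h in m.group(2).split("\\x")[1:])
        if m.group(3) in ("0", "3"):  # usable when converting to Unicode
            table.setdefault(raw, cp)
    return table

def decode(data: bytes, table: dict) -> str:
    """Greedy longest-match decode; unmapped bytes become U+FFFD."""
    max_len = max(len(k) for k in table)
    out, i = [], 0
    while i < len(data):
        for n in range(min(max_len, len(data) - i), 0, -1):
            cp = table.get(data[i:i + n])
            if cp is not None:
                out.append(chr(cp))
                i += n
                break
        else:
            out.append("\ufffd")
            i += 1
    return "".join(out)

# Tiny illustrative excerpt in .ucm syntax (not copied from ibm-1383.ucm).
SAMPLE_UCM = r"""
CHARMAP
<U0041> \x41 |0
<U4E00> \xD2\xBB |0
<U4E2D> \xD6\xD0 |0
END CHARMAP
"""

print(decode(b"A\xd2\xbb\xd6\xd0", parse_ucm(SAMPLE_UCM)))  # A一中
```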

They also have documentation on the GB18030 mapping data on the web site that you might find interesting.


-----Original Message-----
From: [] On
Behalf Of Magda Danish (Unicode)
Sent: Thursday, May 24, 2001 1:32 AM
Subject: FW: the HOW-TO of converting Chinese to Unicode HTML


        -----Original Message-----
        From: augustus
        Sent: Wed 5/23/2001 6:31 PM
        Subject: the HOW-TO of converting Chinese to Unicode HTML


        How are you doing? I have a question I would like to ask.

        I need to know the HOW-TO of converting Chinese characters
(Traditional Chinese) into Unicode HTML.

        I am writing a web page that will retrieve Chinese sentences
from a database and then e-mail them to my clients. Some of my
clients' mail software can't display Chinese unless I put the Chinese
in Unicode HTML format.

        My converter software can do this conversion, but since I cannot
see its program source code, I don't know HOW it does it. I want to
know the logic / algorithm / method of this conversion. Do you know
where I can learn that?

        my email is

        Thanks a lot!!!


This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT