Re: UTF-EBCDIC to UTF-8

From: Doug Ewell (dewell@compuserve.com)
Date: Fri Jul 28 2000 - 09:42:30 EDT


Jeu George <jeu@india.dharma.com> wrote:

> Is there any conversion routine that transforms UTF-EBCDIC
> characters to UTF-8 characters?

UTF-8 is defined in Chapter 3, page 47, definition D36 of The Unicode
Standard, version 3.0. A table is given showing the conversion process.
If you don't have the book (I'm guessing you don't :-), then check out
the FAQ on the Unicode Web site at
<http://www.unicode.org/unicode/faq/encoding_allocation.html>
and look for the question "What is the definition of UTF-8?"

(What a relief it is finally to be able to point people to the Unicode
Web site for the definition of UTF-8!)

UTF-EBCDIC is defined in Unicode Technical Report #16, available at
<http://www.unicode.org/unicode/reports/tr16/>.

Both of these are well-defined, straightforward specs, and if you are
a programmer (especially in a language like C that allows easy bit
manipulation) you should not have any trouble writing the conversion
routines. Normally you would decode UTF-EBCDIC to Unicode scalar
values and then encode those in UTF-8, but I suppose it would also be
possible to go directly from UTF-EBCDIC to UTF-8. I can provide
programming hints if you like.
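
As a starting point, here is a minimal C sketch of the second half of that
two-step approach: encoding one Unicode scalar value as UTF-8. The function
name utf8_encode is only an illustrative choice, not something taken from
either spec. The first half (decoding UTF-EBCDIC to scalar values) depends on
the byte-mapping tables in TR #16 and is not attempted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 (hypothetical helper).
 * Writes 1 to 4 bytes to out, which must have room for 4, and returns
 * the number of bytes written. Returns 0 if cp is not a scalar value,
 * i.e. it is a surrogate code point or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* surrogates are not scalar values */
        return 0;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode code space */
}
```

A converter would call this once per decoded scalar value and append the
returned number of bytes to its output, treating a return of 0 as an error
in the input.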

If you aren't a programmer and need to convert some existing data (where
did you find UTF-EBCDIC data, anyway?), I have written a pair of DOS
conversion utilities, "cp2uni" and "uni2cp" (and a wrapper, "cp2cp")
that will perform these conversions and many others. If you think you
will need these, please contact me privately (off the list).

-Doug Ewell
 Fullerton, California


