Re: Converting EBCDIC to Unicode

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Feb 12 2003 - 02:47:37 EST

Next message: Andy White: "RE: Indic Vowel/Consonant combinations"

Previous message: Doug Ewell: "Re: bidi in unipad"
In reply to: Markus Scherer: "Re: Converting EBCDIC to Unicode"
Next in thread: Carl W. Brown: "RE: Converting EBCDIC to Unicode"

Markus Scherer <markus dot scherer at jtcsv dot com> wrote:

>> They are all the same in the A-Z, a-z, and 0-9
>> ranges, but beyond that they can differ substantially.
>
> There are some more characters that have the same codes in most EBCDIC
> codepages, but there are also some where the Latin letters are not all
> present. (I think some old Japanese EBCDIC codepages replace small
> Latin letters with Katakana ones.)

Indeed, I was oversimplifying things a bit. There are other invariant,
or almost-invariant, EBCDIC characters. For example, SPACE had better
be invariant or there will be serious problems!

In 1997 I did a quick study of all the EBCDIC code pages on the DKUUG
FTP site, which I think was about 25 code pages, and made a list of the
characters that were the same in every single page:

0x40 SPACE
0x4B .
0x4D (
0x4E +
0x5C *
0x5D )
0x5E ;
0x60 -
0x61 /
0x6B ,
0x6D _
0x6E >
0x6F ?
0x7A :
0x7D '
0x7E =

as well as the letters and numbers already mentioned:

0x81-0x89 a-i
0x91-0x99 j-r
0xA2-0xA9 s-z
0xC1-0xC9 A-I
0xD1-0xD9 J-R
0xE2-0xE9 S-Z
0xF0-0xF9 0-9
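
For anyone who wants to play with this list, here is a minimal Python sketch that turns it into a lookup table. The names (PUNCTUATION, RANGES, INVARIANT, decode_invariant) are made up for illustration, and mapping every other byte to U+FFFD is just one possible choice, not a recommendation for real conversion work.

```python
# Invariant EBCDIC code points, copied from the list above.
PUNCTUATION = {
    0x40: " ", 0x4B: ".", 0x4D: "(", 0x4E: "+", 0x5C: "*", 0x5D: ")",
    0x5E: ";", 0x60: "-", 0x61: "/", 0x6B: ",", 0x6D: "_", 0x6E: ">",
    0x6F: "?", 0x7A: ":", 0x7D: "'", 0x7E: "=",
}

# (first code point, characters assigned to consecutive code points)
RANGES = [
    (0x81, "abcdefghi"), (0x91, "jklmnopqr"), (0xA2, "stuvwxyz"),
    (0xC1, "ABCDEFGHI"), (0xD1, "JKLMNOPQR"), (0xE2, "STUVWXYZ"),
    (0xF0, "0123456789"),
]


def build_invariant_table():
    """Return a dict mapping each invariant EBCDIC byte to its character."""
    table = dict(PUNCTUATION)
    for start, chars in RANGES:
        for offset, ch in enumerate(chars):
            table[start + offset] = ch
    return table


INVARIANT = build_invariant_table()


def decode_invariant(data, replacement="\ufffd"):
    """Decode bytes using only the invariant positions.

    Any byte outside the invariant list becomes `replacement`
    (an assumption of this sketch, since its meaning depends on the code page).
    """
    return "".join(INVARIANT.get(byte, replacement) for byte in data)


if __name__ == "__main__":
    print(decode_invariant(bytes([0xC8, 0x85, 0x93, 0x93, 0x96])))
```

The table can be cross-checked against the EBCDIC codecs that ship with Python, such as cp037 and cp500, by decoding each byte with the codec and comparing.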

There were some other characters that were the same in ALMOST all code
pages, such as the ampersand at 0x50. I think it was some kind of Greek
EBCDIC page that put a different character at 0x50. Amusingly, the
greater-than sign is constant at 0x6E, but the less-than sign (though
always present) is not on the list because it floats among different
character positions.
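
A rough way to repeat this kind of check today is sketched below in Python. It uses a handful of EBCDIC codecs bundled with Python rather than the DKUUG tables, so the codec list is an assumption and the function names are invented for illustration. With so few code pages, a character can look invariant here even though it floats in a larger collection.

```python
# Assumption: a small sample of EBCDIC codecs bundled with Python,
# not the roughly 25 DKUUG code pages from the original study.
CODECS = ["cp037", "cp500", "cp1026", "cp1140"]


def find_positions(char, codecs=CODECS):
    """Map each codec name to the byte value of `char`, or None if absent."""
    found = {}
    for name in codecs:
        try:
            found[name] = char.encode(name)[0]
        except UnicodeEncodeError:
            found[name] = None  # character is not in this code page
    return found


def is_invariant(char, codecs=CODECS):
    """True if `char` is present at the same position in every codec."""
    values = set(find_positions(char, codecs).values())
    return len(values) == 1 and None not in values


if __name__ == "__main__":
    for ch in "><&!":
        row = {
            name: (None if pos is None else hex(pos))
            for name, pos in find_positions(ch).items()
        }
        print(repr(ch), row, "invariant" if is_invariant(ch) else "varies")
```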

The DKUUG site may not have included the Katakana code page that Markus
mentioned, although such a thing is described extensively in Chapter 18
of Mackenzie. Doubtless there are other versions of EBCDIC that assign
different characters to even these "invariant" code positions. Putting
an end to this kind of thing is one of the reasons we love Unicode.

-Doug Ewell
Fullerton, California

