Re: EBCDIC

From: Tony Harminc (tzha0@juts.ccc.amdahl.com)
Date: Tue Jul 08 1997 - 19:31:20 EDT


On 8 Jul 97 at 12:17, Ken Whistler wrote:

> But the "horse's mouth", so to speak is:
>
> Character Data Representation Architecture Reference and Registry
> (CDRA), Document SC09-2190-00, Second Edition, December 1995. IBM.

This is an excellent reference, but it contains almost no history or
context. As well as the references others have given, I'd suggest
the SHARE publication "ASCII and EBCDIC Character Set and Code Issues
in Systems Application Architecture" (1989). This is probably still
available on paper from SHARE Inc. in Chicago as SSD #366. This book
is largely the work of Ed Hart, and perhaps he can tell us if it's
available on line somewhere. Certainly a number of people on this
list were active in this stuff way back in 1989...

To really understand EBCDIC and its context requires some
understanding of its punched-card background. It is also helpful to
know something about the IBM 2741 typewriter terminal (a non-EBCDIC
device, btw), and very important to know something of the IBM 3270
series display terminal architecture.

A few mainly historic EBCDIC tidbits:

It has always been an 8-bit code, even in its earliest days.

The displayable characters are in the range X'40'-X'FE', with X'40'
being the space. (Forgive my old-mainframer's notation...)

There are discontinuities in the alphabetic ranges, so constant
addition was never used to change case as in ASCII. However, ORing
with X'40' was the traditional way of uppercasing a character
algorithmically, which has predictably bad results when applied to
accented Latin characters.
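
As a rough illustration of the case-folding point, here is a small Python sketch. It assumes code page 037 (Python's built-in cp037 codec) as a representative EBCDIC; the names or_upper and code are made up for this example, and other code pages will misbehave on different characters.

```python
# Sketch only: cp037 is assumed here as a representative EBCDIC code page.
CODEC = "cp037"


def code(ch: str) -> int:
    """Return the EBCDIC byte value of a single character."""
    return ch.encode(CODEC)[0]


def or_upper(text: str) -> str:
    """Uppercase the traditional way: OR every EBCDIC byte with X'40'."""
    raw = text.encode(CODEC)
    return bytes(b | 0x40 for b in raw).decode(CODEC)


if __name__ == "__main__":
    # The alphabet is not contiguous: there are gaps after i and after r.
    for ch in "ijrs":
        print(f"{ch} = X'{code(ch):02X}'")

    # Unaccented letters, digits, space and punctuation survive the OR.
    print(or_upper("hello, world 123"))

    # Accented letters do not come out as their uppercase forms.
    for ch in "éðæ":
        print(f"{ch} -> {or_upper(ch)} (wanted {ch.upper()})")
```

The OR works for a-z only because every lowercase letter sits exactly X'40' below its uppercase partner; nothing guarantees that relationship for the accented letters.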

Contrary to common opinion, there is nothing in any mainframe
hardware or software that restricts the byte values in a string. The
reason for the commonly cited limit of 190 characters is that the
3270 terminal architecture uses codepoints below X'40' for control
codes, and these are not escaped or otherwise distinguished from
displayable characters in a data stream in most cases.
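
A minimal Python sketch of that constraint, assuming only the X'40'-X'FE' displayable range given above. The function names are hypothetical, and this is not a model of the real 3270 data stream, which has more structure than a simple range check.

```python
# Sketch only: treats X'40' (space) through X'FE' as displayable,
# per the range quoted in the text. Names are illustrative.
FIRST_DISPLAYABLE = 0x40  # X'40', the EBCDIC space
LAST_DISPLAYABLE = 0xFE   # X'FE'

# Codepoints above the space; this appears to be where the
# commonly cited figure of 190 comes from.
GRAPHIC_COUNT = LAST_DISPLAYABLE - FIRST_DISPLAYABLE


def is_displayable(data: bytes) -> bool:
    """True if every byte falls in the displayable range."""
    return all(FIRST_DISPLAYABLE <= b <= LAST_DISPLAYABLE for b in data)


def suspect_positions(data: bytes) -> list:
    """Offsets of bytes a 3270-style device could take for control codes."""
    return [
        i for i, b in enumerate(data)
        if not FIRST_DISPLAYABLE <= b <= LAST_DISPLAYABLE
    ]


if __name__ == "__main__":
    record = b"\xC1\x05\xC2"  # 'A', a byte below X'40', 'B'
    print(is_displayable(record), suspect_positions(record))
```

The string itself is perfectly legal in storage; the check only matters at the point where the bytes are sent to a terminal unescaped.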

EBCDIC is a term about as loose as ASCII; there are EBCDIC code pages
that cover the same character repertoires as all the ISO 8859-n sets,
but there are also around 10 different EBCDIC mappings of 8859-1.
The historic reason for this is that early 3270-series terminals
supported an even smaller code space than the current 190 (96
characters, as I remember), so it was not possible to provide one
piece of hardware to support all Latin-1 countries. Each country
therefore got its own version of EBCDIC. Once again, the 3270 is
mainly to blame.
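
To see how two EBCDIC mappings of the same Latin-1 repertoire can disagree, here is a hedged Python sketch comparing two code pages that ship with Python, cp037 and cp500. These two are picked only because the codecs are readily available; the helper name differing_bytes is invented for the example.

```python
# Sketch only: compares two of Python's built-in EBCDIC codecs.
def differing_bytes(codec_a: str, codec_b: str) -> dict:
    """Map byte value -> (char in codec_a, char in codec_b) where they differ.

    Only the displayable range X'40'-X'FE' is examined.
    """
    diffs = {}
    for value in range(0x40, 0xFF):
        raw = bytes([value])
        a = raw.decode(codec_a)
        b = raw.decode(codec_b)
        if a != b:
            diffs[value] = (a, b)
    return diffs


if __name__ == "__main__":
    for value, (a, b) in differing_bytes("cp037", "cp500").items():
        print(f"X'{value:02X}': cp037={a!r} cp500={b!r}")
```

The letters and digits agree between the two; the disagreements fall on punctuation such as the exclamation mark and square brackets, which is exactly the kind of character that tends to get mangled when data moves between systems.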

Tony Harminc


