Looking For Information

From: AUFDERHEIDE HARRY R. (app1hra) (app1hra@ups.com)
Date: Tue Jun 27 2000 - 11:56:16 EDT

I work for a large global firm in the transportation industry, and we are
taking a high-level look at our future business requirements for properly
handling all the characters of all the languages currently in use on planet
Earth, and at the I.S. effort involved.

I have some specific questions but am interested in hearing anything related
to the work effort required, issues, concerns, etc. First, some background.

Our operating environment includes many IBM mainframes (multiple locations),
AS/400s, UNIX platforms, various handheld data collection devices, and a
large number of Windows NT clients and servers. Our applications run the
gamut, including data collection, customer-focused Internet, marketing, sales,
financials, package tracking, billing... you name it, we probably have it
somewhere. Data for the most part is stored centrally on the IBM mainframes.
Our programming languages also run the gamut including COBOL, C, C++, HTML,

We truly have an international presence but currently receive data only in
English, French, Italian, German, and Spanish, plus at least some characters
from other single-byte languages. We are experiencing limited difficulties in
properly handling all the single-byte characters received. My belief is that
this is due to differences in programming-language character definitions,
code pages, and EBCDIC/ASCII encodings across the various platforms. We are
now "putting out fires" while looking for a better single-byte solution and
preparing for future double-byte requirements.
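
The kind of mismatch suspected above can be sketched in a few lines of Python.
This is only an illustration, assuming Python 3 and its built-in `latin-1` and
`cp037` codecs. cp037 is one common EBCDIC code page and is an assumption
here, since the code pages actually in use on each platform are not stated.

```python
# Illustration only: the same text has different byte values in an
# ASCII-based code page (Latin-1) and an EBCDIC code page (cp037, assumed).
text = "café"

latin1_bytes = text.encode("latin-1")  # as stored on an ASCII-based platform
ebcdic_bytes = text.encode("cp037")    # as stored on an EBCDIC platform

print(latin1_bytes.hex())
print(ebcdic_bytes.hex())

# Reading bytes with the wrong code page raises no error; it silently
# produces the wrong characters.
misread = latin1_bytes.decode("cp037")
print(repr(misread))

# A correct conversion decodes with the source code page and re-encodes
# with the target code page.
converted = ebcdic_bytes.decode("cp037").encode("latin-1")
print(converted == latin1_bytes)
```

Because a wrong decode does not fail, this class of problem tends to surface
only as corrupted accented characters downstream.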

Based on everything that I have read, the Unicode standard is the way to go;
hence my questions.

1. Is UTF-8's character set equal to that of the Latin-1 (ASCII) code page?
   If not, what are the differences?
   Under the assumption that it is substantially the same, I don't see it
   solving our problems, as we are currently processing more characters than
   this can support. It certainly doesn't appear to be a solution for
   handling Chinese, Japanese, etc. This leads me to the UTF-16 format with
   its double-byte capability.
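
A minimal Python sketch, assuming Python 3 and its built-in codecs, that may
help frame question 1. It compares how many bytes a few sample characters
take in Latin-1, UTF-8, and UTF-16. The sample characters and the
`encoded_length` helper are arbitrary choices made for illustration.

```python
# Sample characters (arbitrary): an ASCII letter, an accented Latin letter,
# a CJK ideograph, and a character outside the Basic Multilingual Plane.
samples = ["A", "é", "中", "\U00020000"]

def encoded_length(ch, codec):
    """Return the byte length of ch in codec, or None if not representable."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "latin-1:", encoded_length(ch, "latin-1"),
        "utf-8:", encoded_length(ch, "utf-8"),
        "utf-16:", encoded_length(ch, "utf-16-be"),
    )
```

In this sketch UTF-8 and UTF-16 both encode every sample, while Latin-1
cannot represent the last two. Only the ASCII letter has the same byte length
in Latin-1 and UTF-8, and the last sample takes four bytes in UTF-16, so
UTF-16 is not strictly a double-byte format.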

2. I have read a good deal of material on support for Unicode (UTF-x) on many
   platforms but have not found much about the mainframe (EBCDIC) environment
   other than DB2 support for Unicode. Assuming that we will need to process
   characters that require double-byte technology, and assuming that we have
   already done a good job of internationalizing our applications, I see the
   need, across all platforms, for:

        - redesigning many of our files
        - making program changes specific to these physical changes (file
          layouts, working storage, user interfaces)
        - modifying all logic operating on text (string) data
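
To make the file-layout and string-logic concerns in the list above concrete,
here is a rough Python sketch. The 10-byte field width, the sample name, and
the `fits_field` helper are hypothetical, invented for illustration; they do
not describe any real record layout.

```python
FIELD_BYTES = 10  # hypothetical fixed-width field size, in bytes

def fits_field(value, codec, width=FIELD_BYTES):
    """Return True if value, encoded with codec, fits in width bytes."""
    return len(value.encode(codec)) <= width

name = "Müller-Süß"  # 10 characters

print(len(name))                      # character count
print(fits_field(name, "latin-1"))    # one byte per character here
print(fits_field(name, "utf-8"))      # the accented letters take two bytes
print(fits_field(name, "utf-16-be"))  # two bytes per character here

# Logic that truncates by byte count can cut a multi-byte character in half.
truncated = name.encode("utf-8")[:FIELD_BYTES]
try:
    truncated.decode("utf-8")
    split_character = False
except UnicodeDecodeError:
    split_character = True
print(split_character)
```

A field sized for single-byte data may therefore need to be widened, and any
code that counts, pads, or truncates by bytes may need to work in characters
instead.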

        Does COBOL support UTF-16? How and where can I find information?
        What about the "C" languages?

        What else should we be aware of?

        Your thoughts would be greatly appreciated.
