I work for a large global firm in the transportation industry, and we are
taking a high-level look at our future business requirements and the
I.S. effort needed to properly handle the characters of all the languages
currently in use on planet Earth.
I have some specific questions but am interested in hearing anything related
to the work effort required, issues, concerns, etc. First, some background.
Our operating environment includes many IBM mainframes (multiple locations),
AS/400s, UNIX platforms, various handheld data collection devices, and a
large number of Windows NT clients and servers. Our applications run the
gamut, including data collection, customer-facing Internet, marketing, sales,
financials, package tracking, billing... you name it, we probably have it
somewhere. Data for the most part is stored centrally on the IBM mainframes.
Our programming languages also run the gamut, including COBOL, C, C++, HTML,
and others.
We truly have an international presence but currently only receive data in
English, French, Italian, German, and Spanish and, at least, some characters
in other single-byte languages. We are experiencing limited difficulties in
properly handling all the single-byte characters received. My belief is that
this is due to programming language character definitions, code pages, and
EBCDIC/ASCII differences on the various platforms. We are now "putting out
fires" while looking for a better single-byte solution and a future
double-byte solution.
Based on everything that I have read, the UNICODE standard is the way to go;
hence my questions.
1. Is UTF-8's character set equal to the Latin-1 (ASCII) code page's? If
not, what are the differences?
   Under the assumption that it is substantially the same, I don't see it
   solving our problems, as we are currently processing more characters
   than this can support. It certainly doesn't appear to be a solution for
   handling Chinese, Japanese, etc. This leads me to the UTF-16 format with
   its double-byte capability.
2. I have read a good deal of material on support of UNICODE (UTF-x) on
many platforms but have not found much about the mainframe (EBCDIC)
environment other than DB2 support for UNICODE.
   Assuming that we will have the need to process characters that require
   double-byte technology, and assuming that we have already done a good
   job of internationalizing, I see the need, across all platforms, for:
- redesigning many of our files
- making program changes specific to these physical changes (file layouts,
  working storage, etc.)
- modifying all logic operating on text (string) data
   Does COBOL support UTF-16? How and where can I find information?
   What about the "C" languages?
What else should we be aware of?
Your thoughts would be greatly appreciated.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT