Re: Looking For Information

From: Timothy Partridge (timpart@perdix.demon.co.uk)
Date: Wed Jun 28 2000 - 14:40:03 EDT


Harry Aufderheide recently said:

> I work for a large global firm in the transportation industry and we are
> taking a high-level look at our future business requirements for, and the
> I.S. effort needed for, properly handling all the characters of all the
> languages currently in use on planet Earth.
>
> I have some specific questions but am interested in hearing anything related
> to work effort required, issues, concerns, etc. First, some background.
>
> Our operating environment includes many IBM mainframes (multiple locations),
> AS/400s, UNIX platforms, various handheld data collection devices, and a
> large number of Windows NT clients and servers. Our applications run the
> gamut including data collection, customer-focused Internet, marketing, sales,
> financials, package tracking, billing... you name it, we probably have it
> somewhere. Data for the most part is stored centrally on the IBM mainframes.
> Our programming languages also run the gamut including COBOL, C, C++, HTML,
> etc.
>
> We truly have an international presence but currently only receive data in
> English, French, Italian, German, and Spanish and, at least, some characters
> in other single byte languages. We are experiencing limited difficulties in
> properly handling all the single byte characters received. My belief is that
> this is due to program language character definition, code page, and
> EBCDIC/ASCII differences on the various platforms. We are now "putting out
> fires" while looking for a better single byte solution and future double
> byte requirements.
>
>
> Based on everything that I have read the UNICODE standard is the way to go;
> hence my questions.
>
> 1. Is UTF-8's character set equal to the Latin-1 (ASCII) code page's? If
> not, what are the differences?
> Under the assumption that it is substantially the same, I don't see it
> solving our problems, as we are currently processing more characters than
> this can support. It certainly doesn't appear to be a solution for
> handling Chinese, Japanese, etc.
>
> This leads me to the UTF-16 format with its double byte capability.
>
> 2. I have read a good deal of material on support of UNICODE (UTF-x) on many
> platforms but have not found much about the mainframe (EBCDIC) environment
> other than DB2 support for UNICODE.
> Assuming that we will have the need to process characters that require
> double byte technology, and assuming that we have already done a good job
> of internationalizing our applications

I have an interest in this sort of information too.

The first question may be which versions of DB2 are in use.
I think DB2 OS/400 supports CCSID 13488 UCS-2 Level 1 (UCS-2 is UTF-16
restricted to plane zero. It might manage UTF-16 too without too much effort.)
I'm not sure whether DB2 on other platforms supports this CCSID.
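Since UCS-2 only covers plane zero, one quick check before loading data into a CCSID 13488 column might be whether every character lies between U+0000 and U+FFFF. The Python sketch below assumes that is the only restriction worth testing (it does not try to check any implementation-level rules the CCSID may carry), and the function names are hypothetical.

```python
def fits_ucs2(text: str) -> bool:
    """True if every character is in plane zero (U+0000..U+FFFF).

    Hypothetical pre-load check: UCS-2 cannot represent anything above
    U+FFFF, which in UTF-16 would need a surrogate pair.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


def first_non_bmp(text: str):
    """Return (index, code point) of the first character outside plane zero, or None."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return i, ord(ch)
    return None


if __name__ == "__main__":
    # The last sample is a CJK ideograph from plane 2.
    for sample in ["Z\u00fcrich", "\u6771\u4eac", "\U00020000"]:
        print(repr(sample), fits_ucs2(sample), first_non_bmp(sample))
```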

UTF-16 is an encoding that uses two-byte units, but I don't think that is
quite the same as an IBM double-byte character set (DBCS).

I know very little about IBM DBCS, but my impression is that there are
Shift Out and Shift In control characters that switch between single-byte
and double-byte modes.
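To illustrate that kind of shift-based scheme, here is a Python sketch that splits a mixed single/double-byte string into runs. It assumes Shift Out is X'0E' (enter double-byte mode), Shift In is X'0F' (back to single-byte mode), that a string starts in single-byte mode, and that every double-byte character is exactly two bytes. The function names are hypothetical, the example bytes are arbitrary, and no code page conversion is attempted.

```python
from typing import List, Tuple

SO = 0x0E  # Shift Out: enter double-byte mode (assumed EBCDIC value)
SI = 0x0F  # Shift In: return to single-byte mode (assumed EBCDIC value)


def _emit(segments, mode, chunk):
    """Append a non-empty run, checking that double-byte runs are whole pairs."""
    if not chunk:
        return
    if mode == "DBCS" and len(chunk) % 2:
        raise ValueError("odd number of bytes in a double-byte run")
    segments.append((mode, chunk))


def split_mixed(data: bytes) -> List[Tuple[str, bytes]]:
    """Split mixed SBCS/DBCS bytes into ("SBCS", ...) and ("DBCS", ...) runs."""
    segments: List[Tuple[str, bytes]] = []
    mode = "SBCS"  # assumption: a string starts in single-byte mode
    start = i = 0
    while i < len(data):
        b = data[i]
        if mode == "SBCS" and b == SO:
            _emit(segments, mode, data[start:i])
            mode, start = "DBCS", i + 1
            i += 1
        elif mode == "DBCS" and b == SI:
            _emit(segments, mode, data[start:i])
            mode, start = "SBCS", i + 1
            i += 1
        elif mode == "DBCS":
            i += 2  # the current mode decides how many bytes form a character
        else:
            i += 1
    _emit(segments, mode, data[start:])  # trailing run (possibly unterminated DBCS)
    return segments
```

The point of the sketch is that the meaning of a byte depends on state carried from earlier in the string.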

UTF-16 is modeless: it always works in two-byte units (characters outside
plane zero take a pair of them, four bytes in all).
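UTF-16 works in 16-bit code units. A plane-zero character is one unit; a character above U+FFFF is split into a high and a low surrogate. The Python sketch below does that arithmetic by hand (the function name is hypothetical) and cross-checks it against Python's own UTF-16 codec. No state carries from one character to the next, which is the sense in which the encoding is modeless.

```python
def utf16_units(cp: int) -> list:
    """Return the UTF-16 code unit(s) for one Unicode scalar value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0xFFFF:
        return [cp]                      # plane zero: a single 16-bit unit
    v = cp - 0x10000                     # 20 bits remain
    return [0xD800 | (v >> 10),          # high surrogate carries the top 10 bits
            0xDC00 | (v & 0x3FF)]        # low surrogate carries the bottom 10 bits


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u4e2d", "\U00020000"]:
        units = utf16_units(ord(ch))
        as_bytes = b"".join(u.to_bytes(2, "big") for u in units)
        assert as_bytes == ch.encode("utf-16-be")  # agrees with Python's codec
        print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in units], len(as_bytes), "bytes")
```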

Could an IBMer shed light on the following:
Do IBM DBCS strings assume starting in single byte mode?
And would the presence of certain bytes in UTF-16 trigger a switch from
double to single byte mode?
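One way to see why the second question matters: UTF-16 places no restriction on individual byte values, so ordinary characters can contain bytes equal to X'0E' and X'0F'. For example, U+010E (Ď) is 01 0E in big-endian UTF-16. The Python sketch below lists where such bytes occur; it assumes big-endian UTF-16, plane-zero text, and those two values for SO and SI, uses a hypothetical function name, and says nothing about how any particular IBM product actually treats such bytes.

```python
SHIFT_BYTES = {0x0E: "SO", 0x0F: "SI"}  # assumed EBCDIC Shift Out / Shift In values


def shift_like_bytes(text: str) -> list:
    """List (byte offset, SO/SI name, character) for each SO- or SI-valued byte
    in the UTF-16BE encoding of plane-zero text."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("sketch assumes plane-zero text (two bytes per character)")
    data = text.encode("utf-16-be")
    return [(offset, SHIFT_BYTES[b], text[offset // 2])
            for offset, b in enumerate(data) if b in SHIFT_BYTES]


if __name__ == "__main__":
    for sample in ["ABC", "\u010e", "\u0e01", "\u0f0e"]:
        print(repr(sample), shift_like_bytes(sample))
```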

IBM have defined UTF-EBCDIC (details are available as Unicode Technical
Report #16 on www.unicode.org). This converts Unicode characters into a
variable number of bytes, in a similar way to UTF-8. The basic letters A-Z
and digits 0-9 are mapped to their corresponding EBCDIC codes, which means
that when these particular characters are stored on an EBCDIC platform they
are readable in that format. Other characters are mapped to sequences of
non-control codes. These show up on a terminal as weird-looking sequences of
characters, but they won't send any stray control codes to the terminal.
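To give a feel for the "similar to UTF-8" part, here is a Python sketch of the first stage of UTF-EBCDIC, assuming the layout in the technical report: an intermediate form (called UTF-8-Mod or I8) in which U+0000..U+009F stay single bytes and everything else becomes a lead byte followed by trail bytes of the form 101xxxxx, each carrying five bits. The second stage, a fixed byte-for-byte permutation into EBCDIC positions, is deliberately left out because it needs the report's full 256-entry table. The function names are hypothetical.

```python
def _trail_bytes(cp: int, count: int) -> list:
    """The last `count` five-bit groups of cp, most significant first, as 101xxxxx bytes."""
    return [0xA0 | ((cp >> (5 * k)) & 0x1F) for k in range(count - 1, -1, -1)]


def utf8_mod_encode(text: str) -> bytes:
    """Encode text into UTF-8-Mod (I8), i.e. before the EBCDIC byte permutation."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0x9F:                   # C0 controls, ASCII, C1 controls: one byte
            out.append(cp)
        elif cp <= 0x3FF:                # 110yyyyy + 1 trail byte (10 bits)
            out.append(0xC0 | (cp >> 5))
            out += bytes(_trail_bytes(cp, 1))
        elif cp <= 0x3FFF:               # 1110zzzz + 2 trail bytes (14 bits)
            out.append(0xE0 | (cp >> 10))
            out += bytes(_trail_bytes(cp, 2))
        elif cp <= 0x3FFFF:              # 11110www + 3 trail bytes (18 bits)
            out.append(0xF0 | (cp >> 15))
            out += bytes(_trail_bytes(cp, 3))
        else:                            # 111110vv + 4 trail bytes (up to U+10FFFF)
            out.append(0xF8 | (cp >> 20))
            out += bytes(_trail_bytes(cp, 4))
    return bytes(out)
```

Because trail bytes always fall in A0..BF, none of them lands on a control-code value, which is what keeps multi-byte sequences printable after the permutation.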

Although UTF-EBCDIC exists, I have seen little sign of support for it.
For example, is it possible to print UTF-EBCDIC on a mainframe printer?
Can any terminals show it? (Or terminal emulators on PCs.)

At the moment UTF-EBCDIC seems to be of most use if you want to use the
mainframe as a database server and translate into UTF-16 or UTF-8 when
talking to the outside world. (A simple translation program would be
needed.)
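The translation program could look roughly like the Python sketch below: decode the intermediate UTF-8-Mod (I8) bytes back to characters, then re-encode as UTF-8 or UTF-16. It assumes the inverse of UTF-EBCDIC's byte permutation has already been applied (that table from the technical report is not reproduced here), it does not reject over-long sequences, and the function names are hypothetical.

```python
def utf8_mod_decode(data: bytes) -> str:
    """Decode UTF-8-Mod (I8) bytes; assumes the EBCDIC permutation was already undone."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x9F:                 # single-byte character
            chars.append(chr(lead))
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        elif 0xF8 <= lead <= 0xFB:
            length, cp = 5, lead & 0x03
        else:                            # A0..BF are trail bytes; FC..FF unused here
            raise ValueError(f"unexpected lead byte {lead:#04x} at offset {i}")
        trail = data[i + 1:i + length]
        if len(trail) != length - 1 or any(not 0xA0 <= b <= 0xBF for b in trail):
            raise ValueError(f"bad trail bytes at offset {i}")
        for b in trail:
            cp = (cp << 5) | (b & 0x1F)  # each trail byte contributes five bits
        chars.append(chr(cp))
        i += length
    return "".join(chars)


def to_outside_world(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical translation step: I8 bytes in, UTF-8 or UTF-16 bytes out."""
    return utf8_mod_decode(data).encode(encoding)
```

Usage: `to_outside_world(record, "utf-16-be")` would hand a client big-endian UTF-16 without a byte order mark.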

> I see the need, across all platforms, for:
>
> - redesigning many of our files
Extra length may be needed for some fields.
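How much extra length depends on the encoding chosen. A plane-zero character takes 1 to 3 bytes in UTF-8 and 2 in UTF-16, while characters beyond plane zero take 4 in both. A practical way to size fields may be to measure sample data in each candidate form, as in the Python sketch below; the function names and sample strings are hypothetical, and `cp500` (single-byte EBCDIC International) is included only for comparison with current data.

```python
def _byte_length(text: str, encoding: str):
    """Encoded length in bytes, or None if the encoding cannot represent the text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


def encoded_lengths(text: str) -> dict:
    """Bytes needed to store text in a few candidate encodings."""
    return {
        "characters": len(text),
        "utf-8": _byte_length(text, "utf-8"),
        "utf-16": _byte_length(text, "utf-16-be"),  # big-endian, no byte order mark
        "cp500": _byte_length(text, "cp500"),        # single-byte EBCDIC
    }


if __name__ == "__main__":
    for sample in ["Muller", "M\u00fcller", "\u6771\u4eac\u90fd"]:
        print(sample, encoded_lengths(sample))
```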

> - making program changes specific to these physical changes (file
> layouts, working storage, user interfaces)
> - modifying all logic operating on text (string) data

Sorting and string comparison can be complex (this is due to the complexities
of people's sorting needs, not anything inherent in Unicode).
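A small illustration: sorting by raw code point puts every uppercase letter before every lowercase one and sends accented letters to the end, which is rarely what users expect. The Python sketch below contrasts that with a crude key that strips accents and ignores case. The key is only a stand-in chosen for brevity, not the Unicode Collation Algorithm; real applications need locale-specific rules (Swedish sorts "ä" after "z", German sorts it with "a").

```python
import unicodedata


def crude_key(s: str) -> str:
    """Rough sort key: decompose, drop combining marks, ignore case. Not UCA."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


words = ["zebra", "Z\u00fcrich", "apple", "\u00c4pfel", "Apple"]
print(sorted(words))                  # raw code point order
print(sorted(words, key=crude_key))   # accent- and case-insensitive order
```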

Regards,

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer


