"tracey kelly" <firstname.lastname@example.org> wrote
> Any advice or feedback from anyone who has done anything similar
> would be appreciated. How would the unicode look stored in EBCDIC?
> for example, code point 006D for 'n' - stored as character '00D6'
> or hex x'006D'? What about the 'U' - or does one HAVE to use one of
> the UTFs?
I presume you receive the Chinese data from a number of sources such as PC
data and S390 native data. This will mean converting from codepage 936
(Simplified Chinese) or 950 (Traditional Chinese) on the PC to Unicode, and
also converting from codepage 935 (Simplified) or 937 (Traditional) EBCDIC
to Unicode. Once the data are encoded as Unicode, they're Unicode - the
fact that the mainframe is EBCDIC-based is irrelevant since the Unicode
data are now binary values. Using your example - 'n' will now be stored as
0x006E (6D is 'm'), just as the Unicode Standard defines.
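
A minimal sketch of that point in Python, under a few assumptions: the
helper name to_utf16be is made up for illustration, big-endian byte order is
assumed for the 16-bit form, and code page 037 (Latin EBCDIC) stands in for
the host side because this sketch does not assume a converter for 935/937 is
available in a stock Python install. On the mainframe you would use whatever
conversion service your platform provides.

```python
def to_utf16be(raw: bytes, source_codec: str) -> bytes:
    """Decode bytes from a legacy code page, then re-encode as 16-bit Unicode."""
    return raw.decode(source_codec).encode("utf-16-be")

# 'n' is byte 0x95 in EBCDIC code page 037 and byte 0x6E in PC code page 936.
ebcdic_n = to_utf16be(b"\x95", "cp037")
pc_n = to_utf16be(b"\x6e", "cp936")

# The Chinese character U+4E2D is the byte pair D6 D0 in code page 936.
pc_zhong = to_utf16be(b"\xd6\xd0", "cp936")

# Once converted, the source code page no longer matters.
print(ebcdic_n.hex(), pc_n.hex(), pc_zhong.hex())  # prints: 006e 006e 4e2d
```

Both sources end up as the same binary value 0x006E for 'n', which is what
gets stored, whatever the native character set of the machine holding it.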
By the way, there's a UTF format (UTF-EBCDIC, described in Unicode Technical
Report #16) that was designed to make this sort of thing easier, although
you may find that simple 16-bit Unicode is your best choice.
A word of warning - there will be a substantial number of new Chinese
characters in the next rev of Unicode, so you'll need to support