RE: test data for code page IBM 837 (simpl. chinese)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 17 2003 - 18:15:14 EDT


    IBM Code Page 837 is the DBCS portion of the Host Simplified
    Chinese CCSIDs. It defines all the wide characters.

    You don't actually *use* Code Page 837 by itself. It is used
    together with Code Page 836 to define the IBM Host merged
    code page, IBM Code Page 935. Code Page 836 is the SBCS
    portion for Simplified Chinese: basically (in EBCDIC), the
    ASCII repertoire plus the yen (yuan) sign, the pound (currency)
    sign, and the broken bar.

    So if you test against the ICU Code Page 935 mapping (or anybody
    else's implementation of Code Page 935), you will pick up
    *all* of the Chinese characters for the DBCS portion (Code Page 837).

    > Partially answering my own question,
    > ICU says (on this page:
    > http://www-124.ibm.com/icu/charset/roundtripIndex.html#ibm-837_X100-1995)
    > that IBM 837 is a bit more than 98% similar to IBM 935.
    >
    > The ICU .ucm file for 837 doesn't exist (as far as I can tell
    > from looking at the file names in the ICU 'mappings' directory).
    > So, would it be safe to conclude that the ICU file
    > ibm-935_P110-1999.ucm is 98% of what I need?
    > Again, I'm looking for a list of all characters in IBM 837.

    If you want that list explicitly, just grab all the double-byte
    characters out of the ICU mapping for IBM 935.
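
For what it's worth, here is a minimal Python sketch of that extraction. It
assumes the usual .ucm layout: mapping lines of the form
`<UXXXX> \xHH\xHH |n` between `CHARMAP` and `END CHARMAP`, with the
double-byte entries written as exactly two bytes. The function name
`double_byte_entries` and the sample lines are made up for illustration, and
the sample byte values are not taken from the real ibm-935 table. Check the
actual file before relying on any of this.

```python
import re

# One mapping line: <UXXXX> followed by one or more \xHH bytes and an
# optional |n precision flag (treated as 0 when absent).
_MAPPING = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|(\d))?"
)


def double_byte_entries(lines):
    """Yield (code_point, byte_sequence, flag) for two-byte mappings only."""
    in_charmap = False
    for raw in lines:
        line = raw.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            in_charmap = False
            continue
        if not in_charmap:
            continue  # header, state table, or comment
        match = _MAPPING.match(line)
        if match is None:
            continue  # comment or a line shape this sketch does not handle
        code_point = int(match.group(1), 16)
        byte_seq = bytes.fromhex(match.group(2).replace("\\x", ""))
        flag = int(match.group(3)) if match.group(3) else 0
        if len(byte_seq) == 2:  # keep the DBCS entries, drop the SBCS ones
            yield code_point, byte_seq, flag


# Illustrative input only; these byte values are placeholders.
sample = [
    '<code_set_name> "ibm-935_P110-1999"',
    "CHARMAP",
    "<U0041> \\xC1 |0",
    "<U3000> \\x40\\x40 |0",
    "<U4E00> \\x59\\xBA |0",
    "END CHARMAP",
]

for code_point, byte_seq, flag in double_byte_entries(sample):
    print(f"U+{code_point:04X} {byte_seq.hex().upper()} |{flag}")
```

To run it over the real table, pass an open file instead of the sample, e.g.
`double_byte_entries(open("ibm-935_P110-1999.ucm", encoding="ascii"))`.
Filtering on flag 0 afterwards would keep only the roundtrip mappings.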

    --Ken


