RE: Unicode conformant character encodings and us-ascii

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu May 15 2003 - 10:19:19 EDT


    Yael Aharon wrote:
    > I see now why you thought the question was odd. I actually
    > meant to ask about the various iso (e.g. 8859 variants) and
    > windows character encodings.

    OK, but those encodings do not "conform to Unicode specs": they are simply
    different encodings, which can be *converted* to Unicode because Unicode
    contains all the characters that they contain.

    However, the answer to your question is "yes" for all ISO 8859 and Windows
    encodings: their 0x00 to 0x7F range is identical to US-ASCII. It is "no",
    though, for most DOS encodings (which are still sometimes used in Windows)
    and for some Japanese encodings (also used on Windows, e.g. for Web pages
    or e-mail).
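    A rough way to see the same thing without the mapping files is to decode
    each byte 0x00 to 0x7F with Python's built-in codecs and compare it with
    the code point of the same value. This is only a sketch: Python's codec
    tables are not the unicode.org files and may disagree with them for some
    encodings, and the function name differs_from_ascii and the list of codecs
    tried are hypothetical, chosen for illustration.

```python
def differs_from_ascii(encoding):
    """Return (byte, decoded text) for each byte 0x00-0x7F that does not
    decode to the code point with the same value."""
    differing = []
    for b in range(0x80):
        # Decode one byte at a time so a multibyte codec cannot combine it
        # with neighbouring bytes; undecodable bytes become U+FFFD.
        text = bytes([b]).decode(encoding, errors="replace")
        if text != chr(b):
            differing.append((b, text))
    return differing


# Hypothetical selection of codecs: ISO 8859, Windows, DOS and Japanese.
for name in ("iso8859_1", "iso8859_6", "cp1252", "cp437", "cp864", "shift_jis"):
    diffs = differs_from_ascii(name)
    if diffs:
        print(name, [f"0x{b:02X} -> {t!r}" for b, t in diffs])
    else:
        print(name, "matches US-ASCII in 0x00-0x7F")
```

    An empty result only means Python's table agrees with US-ASCII in that
    range; the mapping files below remain the reference to check against.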

    You can check this from the mapping files found here:

            http://www.unicode.org/Public/MAPPINGS

    Each data line in those files maps a character code in the third-party
    encoding (1st column) to a Unicode code point (2nd column), followed by
    the character name as a comment:

            ...
            0x41 0x0041 # LATIN CAPITAL LETTER A
            ...
            0xC7 0x0627 # ARABIC LETTER ALEF
            ...

    You could write a quick script to check whether any third-party character
    in the range 0x00 to 0x7F maps to a Unicode value other than its own code
    (i.e., differs from US-ASCII).
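    A minimal sketch of such a script in Python, assuming the two-column
    layout shown above (hex code, hex Unicode value, then a "#" comment).
    Some files in that directory use more columns (e.g. some East Asian
    tables) or other notations, so check the header comments of each file
    first; the helper name non_ascii_mappings is hypothetical.

```python
import sys


def non_ascii_mappings(lines):
    """Return (code, unicode) pairs in 0x00-0x7F whose Unicode value differs.

    Assumes lines of the form '0xNN<tab>0xNNNN<tab># NAME'. Comment lines,
    unmapped entries such as '0x81<tab>#UNDEFINED', and fields that are not
    plain hex numbers are skipped.
    """
    differing = []
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue  # blank, comment-only, or unmapped entry
        try:
            code = int(fields[0], 16)  # int() accepts the '0x' prefix in base 16
            uni = int(fields[1], 16)
        except ValueError:
            continue  # some other notation; not handled by this sketch
        if code <= 0x7F and uni != code:
            differing.append((code, uni))
    return differing


if __name__ == "__main__" and len(sys.argv) > 1:
    # latin-1 decodes any byte, so odd characters in comments cannot fail.
    with open(sys.argv[1], encoding="latin-1") as f:
        for code, uni in non_ascii_mappings(f):
            print(f"0x{code:02X} -> U+{uni:04X}")
```

    Run it on one downloaded file at a time, e.g. 8859-6.TXT from the ISO8859
    subdirectory; no output means the lower half of that table is plain
    US-ASCII.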

    _ Marco


