Re: ISO 10646 & GB18030 repertoire [was: Re: ISO 10646 compliance and EU law]

From: Christopher Fynn (
Date: Fri Jan 07 2005 - 08:53:54 CST


    Andrew C. West wrote:

    > On Thu, 06 Jan 2005 19:16:38 +0000, Christopher Fynn wrote:
    >>There seems to be one defect - the charts I've seen seem to contain a
    >>pre-composed character equivalent to the combination U+0F68 U+0F7C
    >>U+0F7E - It appears they've assumed that U+0F00 can be used as the
    >>equivalent to that string. However in Unicode U+0F00 is *not* equivalent
    >>to U+0F68 U+0F7C U+0F7E (U+0F00 has no de-composition). I think this
    >>means that there would be no round-trip compatibility for this combination.
    > I think that Chris meant to write "the charts I've seen *do not* seem to contain
    > a pre-composed character equivalent to the combination U+0F68 U+0F7C U+0F7E".
    > Andrew


    Yes, thanks, that's what I intended. They *do not* have a pre-composed
    character equivalent to the combination U+0F68 U+0F7C U+0F7E.

    If you have a Unicode text which you are converting to GB18030 you more
    or less have to convert "U+0F68 U+0F7C U+0F7E" to the GB18030 code which
    maps to U+0F00 as "U+0F68 U+0F7C U+0F7E" is not going to display
    properly on a system requiring pre-composed characters. U+0F00 in the
    original Unicode text would map to the *same* GB18030 code.

    If you get a BrdaRten/GB18030 encoded text to convert to Unicode,
    then when you convert occurrences of the character which maps to U+0F00,
    do you change it to U+0F68 U+0F7C U+0F7E or leave it as U+0F00?

    Either way it seems to me there is a problem since the distinction
    between U+0F68 U+0F7C U+0F7E & U+0F00 which is there in Unicode (since
    U+0F00 has no decomposition) is lost in the process.
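
    To make the round-trip problem concrete, here is a minimal Python
    sketch. The placeholder HYPOTHETICAL_CODE and the helpers
    to_precomposed() and to_unicode() are my own illustrations, not the
    actual GB18030 code value or any real converter; the only real API
    used is Python's standard unicodedata module.

```python
import unicodedata

OM = "\u0F00"               # TIBETAN SYLLABLE OM
SEQ = "\u0F68\u0F7C\u0F7E"  # the three-character spelling discussed above

# U+0F00 has no decomposition, so no normalization form unifies the two.
assert unicodedata.decomposition(OM) == ""
assert unicodedata.normalize("NFD", OM) != unicodedata.normalize("NFD", SEQ)
assert unicodedata.normalize("NFC", SEQ) != OM

# Hypothetical stand-in for the single pre-composed code (an assumption,
# not the real code value).
HYPOTHETICAL_CODE = "<PRECOMPOSED-OM>"

def to_precomposed(text: str) -> str:
    """Map both Unicode spellings to the same pre-composed code."""
    return text.replace(SEQ, HYPOTHETICAL_CODE).replace(OM, HYPOTHETICAL_CODE)

def to_unicode(encoded: str, expand: bool) -> str:
    """Map the code back; the converter has to pick one spelling."""
    return encoded.replace(HYPOTHETICAL_CODE, SEQ if expand else OM)

source = OM + SEQ                  # a text that uses both spellings
encoded = to_precomposed(source)   # both collapse to the same code

# Whichever policy is chosen, the original text is not recovered.
assert to_unicode(encoded, expand=False) != source
assert to_unicode(encoded, expand=True) != source
```

    Either choice for the expand flag reproduces only one of the two
    spellings, which is the loss of distinction described above.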


    - Chris

    This archive was generated by hypermail 2.1.5 : Fri Jan 07 2005 - 09:01:27 CST