Re: ISO 10646 & GB18030 repertoire

From: Christopher Fynn (cfynn@gmx.net)
Date: Fri Jan 07 2005 - 10:15:09 CST


    Andrew C. West wrote:

    > On Thu, 06 Jan 2005 19:16:38 +0000, Christopher Fynn wrote:
    >
    >> There seems to be one defect - the charts I've seen seem to contain
    >> a pre-composed character equivalent to the combination U+0F68 U+0F7C
    >> U+0F7E - It appears they've assumed that U+0F00 can be used as the
    >> equivalent to that string. However in Unicode U+0F00 is *not* equivalent
    >> to U+0F68 U+0F7C U+0F7E (U+0F00 has no de-composition). I think this
    >> means that there would be no round-trip compatibility for this combination.
    >
    >
    >
    > I think that Chris meant to write "the charts I've seen *do not* seem
    > to contain a pre-composed character equivalent to the combination
    > U+0F68 U+0F7C U+0F7E".
    >
    > Andrew

    Andrew

    Yes, thanks, that's what I intended. They *do not* have a pre-composed
    character equivalent to *U+0F68 U+0F7C U+0F7E*.

    If you have a Unicode text which you are converting to GB18030 you more
    or less have to convert "U+0F68 U+0F7C U+0F7E" to the GB18030 code which
    maps to U+0F00 since "U+0F68 U+0F7C U+0F7E" is not going to display
    properly on a system requiring pre-composed characters. U+0F00 in the
    original Unicode text would map to the *same* GB18030 code.

    If you get a BrdaRten/GB18030 encoded text to convert to Unicode, then
    when you convert occurrences of the character which maps to U+0F00,
    do you change it to U+0F68 U+0F7C U+0F7E or leave it as U+0F00?

    Either way it seems to me there is a problem since the distinction
    between U+0F68 U+0F7C U+0F7E & U+0F00 which is there in Unicode (since
    U+0F00 has no decomposition) is lost in the process.
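
    A rough sketch of the problem in Python, for illustration only. The
    placeholder PRECOMPOSED_OM and the helper functions are hypothetical;
    they stand in for whatever single code the precomposed set assigns and
    are not real GB18030 byte values or a real converter. Only the
    unicodedata calls reflect actual Unicode data.

```python
import unicodedata

OM_SIGN = "\u0F00"                  # TIBETAN SYLLABLE OM
OM_SEQUENCE = "\u0F68\u0F7C\u0F7E"  # letter A + vowel sign O + rjes su nga ro

# Hypothetical stand-in for the single precomposed code (assumption,
# not an actual GB18030 value).
PRECOMPOSED_OM = "<OM>"


def has_decomposition(ch: str) -> bool:
    """True if the Unicode Character Database lists a decomposition for ch."""
    return unicodedata.decomposition(ch) != ""


def normalization_unifies(a: str, b: str) -> bool:
    """True if any of the four normalization forms makes a and b identical."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(f, a) == unicodedata.normalize(f, b) for f in forms
    )


def to_precomposed(text: str) -> str:
    """Toy encoder: both spellings collapse onto the same precomposed code."""
    return text.replace(OM_SEQUENCE, PRECOMPOSED_OM).replace(OM_SIGN, PRECOMPOSED_OM)


def from_precomposed(encoded: str, om_as: str) -> str:
    """Toy decoder: the caller must pick one Unicode spelling for the code."""
    return encoded.replace(PRECOMPOSED_OM, om_as)


def round_trips(text: str, om_as: str) -> bool:
    """True if text survives encode + decode under the chosen decoder policy."""
    return from_precomposed(to_precomposed(text), om_as) == text


if __name__ == "__main__":
    print("U+0F00 has a decomposition:", has_decomposition(OM_SIGN))
    print("normalization unifies them:", normalization_unifies(OM_SIGN, OM_SEQUENCE))
    for label, policy in (("U+0F00", OM_SIGN), ("sequence", OM_SEQUENCE)):
        print(f"decode as {label}:",
              "U+0F00 round-trips =", round_trips(OM_SIGN, policy),
              "| sequence round-trips =", round_trips(OM_SEQUENCE, policy))
```

    Whichever policy the decoder picks, one of the two original spellings
    fails to round-trip, and normalization cannot repair it afterwards
    because Unicode does not treat the two as equivalent.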

    regards

    - Chris



    This archive was generated by hypermail 2.1.5 : Fri Jan 07 2005 - 10:20:16 CST