From: Christopher Fynn (firstname.lastname@example.org)
Date: Fri Jan 07 2005 - 10:15:09 CST
Andrew C. West wrote:
> On Thu, 06 Jan 2005 19:16:38 +0000, Christopher Fynn wrote:
>> There seems to be one defect - the charts I've seen seem to contain
>> a pre-composed character equivalent to the combination U+0F68 U+0F7C
>> U+0F7E - It appears they've assumed that U+0F00 can be used as the
>> equivalent to that string. However in Unicode U+0F00 is *not* equivalent
>> to U+0F68 U+0F7C U+0F7E (U+0F00 has no de-composition). I think this
>> means that there would be no round-trip compatibility for this combination.
> I think that Chris meant to write "the charts I've seen *do not* seem
> to contain a pre-composed character equivalent to the combination
> U+0F68 U+0F7C U+0F7E".
Yes, thanks, that's what I intended. They *do not* have a pre-composed
character equivalent to *U+0F68 U+0F7C U+0F7E*.
If you have a Unicode text which you are converting to GB18030 you more
or less have to convert "U+0F68 U+0F7C U+0F7E" to the GB18030 code which
maps to U+0F00 since "U+0F68 U+0F7C U+0F7E" is not going to display
properly on a system requiring pre-composed characters. U+0F00 in the
original Unicode text would map to the *same* GB18030 code.
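The forward direction can be sketched roughly as follows. This is only an illustration: the actual GB18030 code assigned in the charts is not given in this thread, so LEGACY_OM is a made-up symbolic placeholder, and to_legacy is a hypothetical helper rather than any real converter's API.

```python
# Hypothetical placeholder for the legacy code that the charts map to U+0F00.
# The real code value is not stated in this thread, so a symbolic token is used.
LEGACY_OM = "<legacy-om>"

OM_SIGN = "\u0F00"                  # TIBETAN SYLLABLE OM
OM_SEQUENCE = "\u0F68\u0F7C\u0F7E"  # LETTER A + VOWEL SIGN O + SIGN RJES SU NGA RO


def to_legacy(text):
    """Convert Unicode text to a list of legacy tokens (sketch only)."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(OM_SEQUENCE, i):
            # The three-character spelling has no pre-composed code of its
            # own, so it is forced onto the code that maps to U+0F00.
            out.append(LEGACY_OM)
            i += len(OM_SEQUENCE)
        elif text[i] == OM_SIGN:
            # U+0F00 itself maps to the same code.
            out.append(LEGACY_OM)
            i += 1
        else:
            # Everything else is passed through untouched in this sketch.
            out.append(text[i])
            i += 1
    return out
```

Both spellings come out as the same single token, which is where the information is lost.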
If you get a BrdaRten/GB18030 encoded text to convert to Unicode, how
do you convert occurrences of the character which maps to U+0F00:
do you change it to U+0F68 U+0F7C U+0F7E or leave it as U+0F00?
Either way it seems to me there is a problem since the distinction
between U+0F68 U+0F7C U+0F7E & U+0F00 which is there in Unicode (since
U+0F00 has no decomposition) is lost in the process.
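The point about the missing decomposition can be checked with Python's standard unicodedata module. The sketch below is illustrative only: round_trip is a hypothetical stand-in that models the legacy encoding as "both spellings collapse to one code", not a real GB18030 codec.

```python
import unicodedata

OM_SIGN = "\u0F00"                  # TIBETAN SYLLABLE OM
OM_SEQUENCE = "\u0F68\u0F7C\u0F7E"  # LETTER A + VOWEL SIGN O + SIGN RJES SU NGA RO

# An empty string here means the character has no decomposition mapping.
has_decomposition = unicodedata.decomposition(OM_SIGN) != ""


def distinct_under_normalization(a, b):
    """True if no standard normalization form makes the two strings equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(
        unicodedata.normalize(f, a) != unicodedata.normalize(f, b)
        for f in forms
    )


def round_trip(text, decode_as):
    """Model Unicode -> legacy -> Unicode (hypothetical, not a real codec).

    Encoding collapses both spellings onto one code; decoding must then
    pick a single Unicode spelling (decode_as) for that code.
    """
    collapsed = text.replace(OM_SEQUENCE, OM_SIGN)  # stands in for encoding
    return collapsed.replace(OM_SIGN, decode_as)    # stands in for decoding
```

Whichever spelling the decoder picks, one of the two original Unicode texts does not survive the round trip, and normalization cannot repair it since the two spellings are not canonically or compatibly equivalent.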
This archive was generated by hypermail 2.1.5 : Fri Jan 07 2005 - 10:20:16 CST