From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Sat Jan 08 2005 - 08:22:35 CST
On Thu, 6 Jan 2005 17:36:01 (CST), Mark Davis wrote:
>
> I agree with Ken's statement, but would qualify one bit.
>
>> to about March 31, 2005 will contain the mappings:
>>
>> FE90 <--> U+E854
>> 82359133 <--> U+9FBA
>>
>> After that time, they will contain the mappings:
>>
>> ???? <--> U+E854
>> FE90 <--> U+9FBA
>> 82359133 <--> ???? (probably U+FFFD)
>
>
>The http://www.unicode.org/reports/tr22/ recommends mapping tables of the
>following form to handle that situation, by changing the old cases into
>one-way mappings. This provides a more graceful transition.
>
>
> FE90 <-- U+E854
> FE90 <--> U+9FBA
> 82359133 --> U+9FBA
>
I'm sorry, but I just can't agree with this analysis.
At present, a GB18030-Unicode mapping table includes the entries:
GB FE90 <--> U+E854
GB 82359133 <--> U+9FBA
GB 8338E335 <--> U+F300
A pan-GB18030 font will map:
FE90/U+E854 to a CJK ideograph glyph
82359133/U+9FBA to the notdef glyph
8338E335/U+F300 to the notdef glyph
At some time in the future, the CJK ideograph represented at FE90/U+E854 may be
encoded at 82359133/U+9FBA, and 8338E335/U+F300 may be defined as the
precomposed Tibetan syllable I. If this happens, the GB18030-Unicode mapping
table will still be:
GB FE90 <--> U+E854
GB 82359133 <--> U+9FBA
GB 8338E335 <--> U+F300
However, a pan-GB18030 font should now map:
FE90/U+E854 to the notdef glyph
82359133/U+9FBA to a CJK ideograph glyph
8338E335/U+F300 to a glyph corresponding to <U+0F68 U+0F72>
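
To make the distinction concrete, here is a minimal sketch of the two font
states described above. The glyph names and the FONT_NOW/FONT_LATER tables are
purely illustrative assumptions, not taken from any real font or from the
GB18030 standard; only the three code point pairs come from the tables above.

```python
# Illustrative sketch: the code point mapping is fixed, the glyph tables vary.
# Glyph names and the two font tables are hypothetical.

GB_TO_UNICODE = {
    "FE90": 0xE854,
    "82359133": 0x9FBA,
    "8338E335": 0xF300,
}
# The table is round-trip, so the reverse direction is just the inverse.
UNICODE_TO_GB = {u: gb for gb, u in GB_TO_UNICODE.items()}

NOTDEF = ".notdef"

# Font built against the present repertoire.
FONT_NOW = {0xE854: "cjk_ideograph", 0x9FBA: NOTDEF, 0xF300: NOTDEF}

# Font built after the hypothetical repertoire change.
FONT_LATER = {
    0xE854: NOTDEF,
    0x9FBA: "cjk_ideograph",
    0xF300: "tibetan_syllable_i",
}


def glyph_for(gb_code, font):
    """Look up the glyph for a GB18030 code via the unchanged mapping table."""
    return font.get(GB_TO_UNICODE[gb_code], NOTDEF)
```

In both cases glyph_for goes through the same GB_TO_UNICODE table; only the
font argument differs.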
As far as I understand things, the mappings between GB18030 and Unicode won't
change; what may change is what any particular GB18030 code point represents.
There will, however, be a mapping between different implicit versions of GB18030
when such changes in the GB18030 repertoire take place, so that, for example,
GB18030 version A FE90 = GB18030 version B 82359133. The mapping "FE90 <-->
U+9FBA" given by Ken and Mark is making an implicit conversion from GB18030
version A to GB18030 version B (i.e. FE90 --> 82359133 --> U+9FBA), which I do
not believe is appropriate in most circumstances.
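
A rough sketch of what I mean by the implicit conversion, assuming a
hypothetical VERSION_A_TO_B migration table (no such table is published; it
just stands in for whatever correspondence would exist between the two
repertoires):

```python
# Illustrative sketch: "FE90 <--> U+9FBA" is a composition of two steps.
# VERSION_A_TO_B and the function names are hypothetical.

GB_TO_UNICODE = {
    "FE90": 0xE854,
    "82359133": 0x9FBA,
    "8338E335": 0xF300,
}

# Hypothetical correspondence between GB18030 "version A" and "version B".
VERSION_A_TO_B = {"FE90": "82359133"}


def decode(gb_code):
    """Plain GB18030 -> Unicode mapping, the same for every version."""
    return GB_TO_UNICODE[gb_code]


def migrate_a_to_b(gb_code):
    """Re-express a version A code in version B; unlisted codes are unchanged."""
    return VERSION_A_TO_B.get(gb_code, gb_code)


def decode_with_implicit_migration(gb_code):
    """What the proposed table does: migrate first, then map to Unicode."""
    return decode(migrate_a_to_b(gb_code))
```

Here decode("FE90") yields U+E854, whereas decode_with_implicit_migration("FE90")
yields U+9FBA. Keeping the two steps separate lets the caller decide whether
the version conversion is wanted at all.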
Also, I think it would not be correct to state that 8338E335 should map to
<U+0F68 U+0F72> just because 8338E335 represents a precomposed Tibetan character
equivalent to <U+0F68 U+0F72>. I would say that the relationship between
8338E335 and <U+0F68 U+0F72> is more like a normalization mapping; that is to
say, 8338E335 maps to U+F300 for all versions of GB18030, but for "version B" of
GB18030 U+F300 may optionally be "normalized" to <U+0F68 U+0F72>.
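
As a sketch of that separation, assuming a hypothetical "version B"
normalization table (the names below are mine, and the Tibetan assignment is
only the speculative example used above):

```python
# Illustrative sketch: mapping and normalization as two distinct steps.
# VERSION_B_NORMALIZATION and the function names are hypothetical.

GB_TO_UNICODE = {
    "FE90": 0xE854,
    "82359133": 0x9FBA,
    "8338E335": 0xF300,
}

# Optional, version-specific step: PUA code point -> equivalent sequence.
VERSION_B_NORMALIZATION = {0xF300: "\u0F68\u0F72"}


def decode(gb_code):
    """GB18030 -> Unicode; identical for all versions of GB18030."""
    return chr(GB_TO_UNICODE[gb_code])


def normalize_version_b(text):
    """Optionally replace PUA characters with their version B equivalents."""
    return "".join(VERSION_B_NORMALIZATION.get(ord(ch), ch) for ch in text)
```

With this split, decode("8338E335") always gives U+F300, and only a caller that
explicitly asks for normalize_version_b gets <U+0F68 U+0F72>.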
Andrew