Re: GBK, HZ and EUC-TW - Unicode round-tripping policy

From: Tom Emerson (
Date: Thu Jan 11 2001 - 09:43:28 EST

Michael (michka) Kaplan writes:
> As for (example) the case where there are two Euros that are the same, it is
> simple to simply choose one of them and always map it.

But then you lose round-trip behavior, which is necessary in some
applications. In cases like this I (and others, e.g., Microsoft) map
one of the ambiguous code points to the Private Use Area (PUA), which
allows you to round-trip internally.
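A minimal sketch of the PUA approach, assuming a toy legacy table in which two legacy codes both denote the Euro sign. The byte values (0x80 and 0xA2E3), the choice of U+E000 as the free PUA slot, and the function names are hypothetical and are not taken from any real GBK, HZ or EUC-TW mapping. Input is given as a list of already-segmented legacy code units, so multibyte parsing is deliberately left out.

```python
# Hypothetical legacy table: two different legacy codes both mean the Euro sign.
# Byte values are illustrative only, not from a real GBK/HZ/EUC-TW table.
EURO = "\u20ac"
PUA_EURO_ALT = "\ue000"  # assumed free Private Use Area code point

# The "primary" Euro decodes to U+20AC; the duplicate is parked in the PUA.
PUA_DECODE = {
    b"\x80": EURO,
    b"\xa2\xe3": PUA_EURO_ALT,
    b"A": "A",
}
# The encode table is the exact inverse, so every legacy code has its own
# Unicode code point and nothing is lost on the way back.
PUA_ENCODE = {u: b for b, u in PUA_DECODE.items()}


def pua_decode(codes):
    """Convert a list of legacy code units (bytes objects) to a str."""
    return "".join(PUA_DECODE[c] for c in codes)


def pua_encode(text):
    """Convert a str back to a list of legacy code units."""
    return [PUA_ENCODE[ch] for ch in text]


original = [b"A", b"\x80", b"\xa2\xe3"]
assert pua_encode(pua_decode(original)) == original  # exact round trip
```

The trade-off is the one the post describes: the round trip only works internally. Other software has no way of knowing that U+E000 is "the other Euro", so text decoded this way should not be interchanged as-is.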

Of course, if you are unconcerned with maintaining round-trip behavior
(e.g., you just want to convert the text to Unicode so you can
display/edit it), then you can map both legacy code points to the same
Unicode code point and be done with it.
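For contrast, a sketch of the many-to-one mapping, using the same hypothetical legacy codes (0x80 and 0xA2E3 both meaning Euro; the values and function names are illustrative assumptions). Both decode to U+20AC, and the encoder has to pick one of them, so the other legacy code cannot be recovered.

```python
# Hypothetical table: both legacy Euro codes decode to the same U+20AC.
EURO = "\u20ac"
LOSSY_DECODE = {b"\x80": EURO, b"\xa2\xe3": EURO, b"A": "A"}
# On the way back only one legacy code can be chosen for U+20AC.
LOSSY_ENCODE = {EURO: b"\x80", "A": b"A"}


def lossy_decode(codes):
    """Convert a list of legacy code units (bytes objects) to a str."""
    return "".join(LOSSY_DECODE[c] for c in codes)


def lossy_encode(text):
    """Convert a str back to legacy code units, always using 0x80 for Euro."""
    return [LOSSY_ENCODE[ch] for ch in text]


original = [b"A", b"\xa2\xe3"]
print(lossy_decode(original))                # fine for display: "A€"
print(lossy_encode(lossy_decode(original)))  # 0xA2E3 came back as 0x80
```

For display or editing this is all you need; the loss only matters if the exact legacy bytes must be reproduced.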


Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                  
  "Beware the lollipop of mediocrity: lick it once and you suck forever"

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT