Re: TC/SC mapping

From: Werner LEMBERG (wl@gnu.org)
Date: Thu Jan 24 2002 - 03:08:22 EST


> > This is the kind of mess that has discouraged anybody from doing a
> > systematic survey of simplifications for the Unihan database.
>
> Part of this is because there is the orthogonal complexity of
> variant TC forms. Before converting TC to SC, one should resolve
> all TC variants to the most "common" or "standard" TC form (good
> luck deciding what that means). e.g., in the above case, resolve to
> U+9EBD.

I think that any mapping will fail. As with so many things involving
CJK characters, the usage depends on constraints beyond a character
encoding: time, location, purpose, etc. This is the very reason why
CCCII hasn't succeeded. As a consequence, the available fields are
not enough to really represent the interdependencies correctly.

Either increase the number of available keywords (e.g. kZVariant1,
kZVariant2) to be able to fine-tune the dependencies (something like
`character a in the meaning of b is a variant of character c'), or add
a remark to the description of keywords that the fields can't be
exhaustive due to such and such reasons.
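
To make the first option a bit more concrete, here is a rough sketch of
what a sense-qualified variant relation could look like. This is not
how the Unihan database stores anything; the class name, the field
names, and the placeholder characters "a", "b", "c", "d" are all made
up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VariantRelation:
    """Hypothetical record: `char` is a variant of `variant_of`,
    but only when used in the meaning `sense`."""
    char: str                    # character a
    variant_of: str              # character c
    sense: Optional[str] = None  # meaning b; None = holds in every meaning


def variants_for(relations: List[VariantRelation],
                 char: str,
                 sense: Optional[str] = None) -> List[str]:
    """Return every character that `char` is a variant of, given `sense`.

    Unqualified relations (sense is None) always apply; qualified ones
    apply only when the requested sense matches.
    """
    return [
        r.variant_of
        for r in relations
        if r.char == char and (r.sense is None or r.sense == sense)
    ]


# Placeholder data only, not real Unihan entries.
relations = [
    VariantRelation("a", "c", sense="b"),  # a, in the meaning of b, is a variant of c
    VariantRelation("a", "d"),             # a is a variant of d regardless of meaning
]
```

A flat field like kZVariant can only express the second kind of entry;
the first kind needs the extra qualifier, which is exactly where a
fixed set of keywords runs out.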

    Werner


