Re: Regulating PUA.

From: John H. Jenkins
Date: Tue Jan 23 2007 - 15:07:22 CST

    On Jan 22, 2007, at 11:16 PM, wrote:

    > Unicode has consistently rejected using this approach of putting two
    > Chinese characters together to make a new one, and insists each new
    > CJKV character must be encoded, even though this would cut down the
    > number of code points required dramatically. Most Chinese characters
    > are in fact made in this way (over 80% if one allows
    > combinations of combinations).

    Well, yes and no. Unicode's preference is that a modern ad hoc or nonce
    character (such as my notorious frog-at-the-bottom-of-a-well
    character, or the nonce form found in Orson Scott Card's _Xenocide_)
    be represented with Ideographic Description Sequences.

    There is also a fair amount of consensus in the UTC that new
    simplified forms generated from encoded traditional forms should be
    represented using Variation Sequences and not explicit encoding. (We
    haven't entirely convinced the IRG of this point.)

    Unicode has rejected encoding of East Asian ideographs using a
    composition method for a number of reasons, some historical and some
    technical. Among the historical objections is the fact that none of
    the standards Unicode derived its core set of ideographs from used
    composition. Among the technical objections is the difficulty of
    defining equivalence for two composing character forms. (This is
    covered in TUS 5.0 in the section on IDSs.)
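
To make the equivalence problem concrete, here is a minimal sketch in Python. It assumes only the twelve Ideographic Description Characters U+2FF0..U+2FFB, and the helper names (`parse_ids`, `is_well_formed`) are hypothetical, not part of any standard library. The two sequences are illustrative descriptions of the same character, U+6E58: both are well formed, yet they differ as strings, so a plain code point comparison cannot tell that they describe one ideograph.

```python
# Ideographic Description Characters assumed here: U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(seq, pos=0):
    """Parse one description starting at pos; return (tree, next_pos)."""
    if pos >= len(seq):
        raise ValueError("truncated sequence")
    ch = seq[pos]
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos + 1  # a plain component, not an operator
    pos += 1
    children = []
    for _ in range(arity):
        child, pos = parse_ids(seq, pos)
        children.append(child)
    return (ch, *children), pos


def is_well_formed(seq):
    """True if seq is exactly one complete description with nothing left over."""
    try:
        _, end = parse_ids(seq)
    except ValueError:
        return False
    return end == len(seq)


# Two illustrative descriptions of U+6E58, at different depths of decomposition.
coarse = "\u2FF0\u6C35\u76F8"              # left-right: U+6C35, U+76F8
fine = "\u2FF0\u6C35\u2FF0\u6728\u76EE"    # U+76F8 itself split into U+6728, U+76EE

print(is_well_formed(coarse), is_well_formed(fine))
print(coarse == fine)
```

Any composition-based encoding would need a rule for deciding that `coarse` and `fine` are the same character, and this sketch deliberately does not attempt one.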

    The main objection is getting it to work in practice as part of text
    interchange and display. A simple technique like IDSs is good for
    interchange but rotten for display. A high-level technique like CDL
    is wonderful for display but clumsy for text interchange.

    In any event, although the productive nature of the script makes it
    entirely possible in theory to come up with an indefinitely large
    number of distinct sinograms, in practice the number in actual use is
    decidedly finite and well within the space limits of Unicode. If, at
    some point, it proves necessary to have more room than the standard
    currently allows, I have confidence that our great-grandchildren will
    be able to solve it.
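
For a sense of the space limits referred to above, the arithmetic can be sketched in Python. The figures follow from the standard's 17 planes of 65,536 code points each; the variable names are only illustrative.

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

# Total code points in the code space U+0000..U+10FFFF.
total_code_points = PLANES * CODE_POINTS_PER_PLANE

# Surrogate code points U+D800..U+DFFF cannot be assigned to characters.
surrogates = 0xDFFF - 0xD800 + 1

# Code points that can, in principle, carry characters.
scalar_values = total_code_points - surrogates

print(total_code_points)               # 1114112
print(scalar_values)                   # 1112064
print(sys.maxunicode == 0x10FFFF)      # True
```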

    John H. Jenkins
