Re: Regulating PUA.

Date: Tue Jan 23 2007 - 18:56:43 CST

    I agree: at the speed the IRG works, Unicode has enough space for our
    lifetime, though our grandchildren may think some of the decisions we
    made were very silly.

    Quoting "John H. Jenkins" <>:

    > On Jan 22, 2007, at 11:16 PM, wrote:
    >> Unicode has consistently rejected using this approach of putting
    >> two Chinese characters together to make a new one, and insists each
    >> new CJKV character must be encoded, even though this would cut
    >> down the number of code points required dramatically. Most Chinese
    >> characters are in fact made in this way (over 80% if one allows
    >> combinations of combinations).
    > Well, yes and no. Unicode's preference is that a modern ad hoc or nonce
    > character (such as my notorious frog-at-the-bottom-of-a-well character,
    > or the nonce form found in Orson Scott Card's _Xenocide_) be
    > represented with Ideographic Description Sequences.

    It might be better to use the general rule for characters: encode
    those which are widespread and use the PUA for those which are not.

    > There is also a fair amount of consensus in the UTC that new simplified
    > forms generated from encoded traditional forms should be represented
    > using Variation Sequences and not explicit encoding. (We haven't
    > entirely convinced the IRG of this point.)

    And with good reason: this would only be valid if the Variation
    Sequences for CJKV characters were ones that had to be represented,
    not ignorable. The rule of widespread usage would again be good here.
    If a form is used widely it should be separately encoded -- there are
    many times more non-widely used simplified forms and variants that
    almost everyone agrees, and most importantly the IRG agrees, do not at
    present need their own code points. (Please see the attached PDF
    showing an "obvious" example, with the character for horse.)

    Creating a PUA for Variation Sequences would probably help encourage
    people to use variation sequences for CJKV.
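
    To make the "ignorable" worry concrete, here is a small Python
    sketch. The selector ranges are the standard ones (U+FE00..U+FE0F
    and U+E0100..U+E01EF), but the particular sequence below is only a
    hypothetical example and is not claimed to be a registered one.

```python
def is_variation_selector(ch):
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text):
    """Model a process that treats variation selectors as ignorable."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


HORSE = "\u99AC"  # the character for horse
# Hypothetical variation sequence: the base character followed by VS17.
hypothetical_variant = HORSE + "\U000E0100"
```

    Any process that behaves like strip_variation_selectors() turns the
    variant back into the plain base character, so the distinction only
    survives where the selector has to be represented.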

    > Unicode has rejected encoding of East Asian ideographs using a
    > composition method for a number of reasons, some historical and some
    > technical. Among the historical objections is the fact that none of
    > the standards Unicode derived its core set of ideographs from used
    > composition. Among the technical objections is the difficulty in
    > defining equivalence for two composing character forms. (This is
    > covered in TUS 5.0 in the section on IDSs.)
    > The main objection is getting it to work in practice as part of text
    > interchange and display. A simple technique like IDSs is good for
    > interchange but rotten for display. A high-level technique like CDL is
    > wonderful for display but clumsy for text interchange.
    > In any event, although the productive nature of the script makes it
    > entirely possible to come up with an indefinitely large number of
    > distinct sinograms in theory, in practice the number in actual use is
    > decidedly finite and well within the space limits of Unicode. If, at
    > some point, it proves necessary to have more room than the standard
    > currently allows, I have confidence that our great-grandchildren will
    > be able to solve it.
    > ========
    > John H. Jenkins

    This archive was generated by hypermail 2.1.5 : Tue Jan 23 2007 - 18:58:10 CST