Re: Regulating PUA.

From: John H. Jenkins (jenkins@apple.com)
Date: Tue Jan 23 2007 - 15:07:22 CST

Next message: John H. Jenkins: "Re: Regulating PUA."

Previous message: Frank Ellermann: "Re: Proposing UTF-21/24"
In reply to: vunzndi@vfemail.net: "Re: Regulating PUA."
Next in thread: vunzndi@vfemail.net: "Re: Regulating PUA."
Reply: vunzndi@vfemail.net: "Re: Regulating PUA."

On Jan 22, 2007, at 11:16 PM, vunzndi@vfemail.net wrote:

>
> Unicode has consistently rejected the approach of putting two
> Chinese characters together to make a new one, and insists each new
> CJKV character must be encoded, even though composition would cut
> down the number of code points required dramatically. Most Chinese
> characters are in fact made in this way (over 80% if one allows
> combinations of combinations).
>

Well, yes and no. Unicode's preference is for a modern ad hoc or nonce
character (such as my notorious frog-at-the-bottom-of-a-well
character, or the nonce form found in Orson Scott Card's _Xenocide_)
to be represented with an Ideographic Description Sequence.

There is also a fair amount of consensus in the UTC that new
simplified forms generated from encoded traditional forms should be
represented using Variation Sequences and not explicit encoding. (We
haven't entirely convinced the IRG of this point.)

Unicode has rejected encoding of East Asian ideographs using a
composition method for a number of reasons, some historical and some
technical. Among the historical objections is the fact that none of
the standards Unicode derived its core set of ideographs from used
composition. Among the technical objections is the difficulty of
defining equivalence for two composing character forms. (This is
covered in TUS 5.0 in the section on IDSs.)
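
To illustrate the equivalence problem, here is a minimal Python sketch.
It assumes only the twelve Ideographic Description Characters
U+2FF0..U+2FFB, and the DECOMPOSE table and the helper names are
hypothetical, written by hand for this example. Two IDSs describing the
same ideograph at different depths compare unequal as strings, and
match only after both are expanded against some agreed decomposition
table:

```python
# Ideographic Description Characters (IDCs) U+2FF0..U+2FFB and their arity.
# U+2FF2 and U+2FF3 take three operands; the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(text):
    """Parse an IDS into a nested tuple (operator, operand, ...) or a bare component."""
    def parse(pos):
        if pos >= len(text):
            raise ValueError("truncated IDS")
        ch = text[pos]
        pos += 1
        if ch not in IDC_ARITY:
            return ch, pos          # a plain component
        operands = []
        for _ in range(IDC_ARITY[ch]):
            node, pos = parse(pos)
            operands.append(node)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(text):
        raise ValueError("trailing characters after IDS")
    return tree


# Hypothetical decomposition table, for illustration only.
DECOMPOSE = {"相": "⿰木目"}


def expand(tree):
    """Recursively replace components that have a known decomposition."""
    if isinstance(tree, tuple):
        return (tree[0], *(expand(t) for t in tree[1:]))
    if tree in DECOMPOSE:
        return expand(parse_ids(DECOMPOSE[tree]))
    return tree


a = "⿰氵相"      # 湘 described as 氵 + 相
b = "⿰氵⿰木目"  # 湘 described as 氵 + (木 + 目)

print(a == b)                                        # False
print(expand(parse_ids(a)) == expand(parse_ids(b)))  # True
```

The comparison succeeds here only because both sides share one
decomposition table; without such an agreement, nothing in the
sequences themselves says that they describe the same character.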

The main objection is getting it to work in practice as part of text
interchange and display. A simple technique like IDSs is good for
interchange but rotten for display. A high-level technique like CDL
is wonderful for display but clumsy for text interchange.
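
As a rough illustration of the interchange half of that trade-off, the
Python sketch below (the example sequence is my own choice, not taken
from any standard) shows that an IDS is nothing more than a short run
of ordinary code points, so it survives a UTF-8 round trip untouched. A
renderer with no special support simply shows those code points one
after another, which is the display problem.

```python
import unicodedata

ids = "⿰氵相"  # an IDS describing 湘 as 氵 beside 相

# Interchange: the sequence is plain text and round-trips through UTF-8.
encoded = ids.encode("utf-8")
round_tripped = encoded.decode("utf-8")
print(round_tripped == ids)

# Display: without IDS-aware rendering, each code point stands alone.
for ch in ids:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```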

In any event, although the productive nature of the script makes it
entirely possible in theory to come up with an indefinitely large
number of distinct sinograms, in practice the number in actual use is
decidedly finite and well within the space limits of Unicode. If, at
some point, it proves necessary to have more room than the standard
currently allows, I have confidence that our great-grandchildren will
be able to solve it.

========
John H. Jenkins
jenkins@apple.com
jhjenkins@mac.com
http://homepage.mac.com/jhjenkins/


This archive was generated by hypermail 2.1.5 : Tue Jan 23 2007 - 15:08:47 CST