Re: An attempt to focus the PUA discussion [long]

From: Peter Kirk (
Date: Sat May 01 2004 - 08:57:30 CST

On 29/04/2004 16:56, Kenneth Whistler wrote:

>Peter Kirk wrote, in response to Ernest Cline:
>>>... It simply is impossible
>>>to simulate non-zero canonical combining class characters in Unicode
>>>with anything other than a character with the appropriate canonical
>>>combining class. ...
>>True. But fortunately Unicode doesn't really need to worry about
>>normalisation of PUA data, as this is surely out of its scope.
>Not quite. PUA code points are subject to the Unicode normalization
>algorithm, as well as any other. Their behavior in NFC or NFD,
>for example, is rigidly defined, if trivial: a PUA code point
>normalizes to itself.

Indeed. Perhaps I should have said that Unicode need not worry about
transforming PUA data under normalisation. Unicode rightly leaves it
unchanged.
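
As a quick check of the point that a PUA code point normalises to itself,
here is a minimal sketch using Python's standard unicodedata module. The
choice of U+E000 as the sample and the helper name is_normalization_stable
are my own assumptions for illustration; they are not from the standard or
from this thread.

```python
import unicodedata

# Hypothetical sample: U+E000 is the first code point of the BMP Private Use Area.
PUA_SAMPLE = "\ue000"

def is_normalization_stable(text):
    """Return True if text is unchanged by all four Unicode normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)

# General category and canonical combining class as recorded in the
# Unicode Character Database that ships with Python.
print(unicodedata.category(PUA_SAMPLE))
print(unicodedata.combining(PUA_SAMPLE))
print(is_normalization_stable(PUA_SAMPLE))
```

The same check can be run over any private-use string; it only reports what
the normalisation forms do, not what the private characters mean.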

I was actually thinking more of logical normalisation, i.e. that it is
not up to Unicode to decide whether <ELMTREE SYMBOL, COMBINING CHIPMUNK,
COMBINING SQUIRREL> is semantically equivalent to <ELMTREE SYMBOL,
COMBINING SQUIRREL, COMBINING CHIPMUNK> or, if they are, to provide a
mechanism whereby one of these is normalised to the other. If in fact
they are equivalent (e.g. the squirrel is on the ground, but the
chipmunk is in the tree), then it is up to the PUA user to ensure that
the data is ordered consistently or to provide private non-standard
ordering mechanisms. Do you agree? If this is true, then there is no
point in allocating the combining PUA characters to any class other than

Peter Kirk

This archive was generated by hypermail 2.1.5 : Fri May 07 2004 - 18:45:25 CDT