From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Aug 05 2004 - 13:11:55 CDT
In TUS 4.0 Section 5.3, p.111, the following is stated of default 
ignorable code points:
> These characters are also ignored except with respect to specific, 
> defined processes; for example, ZERO WIDTH NON-JOINER is ignored in 
> collation. ... For more information, see Section 5.20, Default 
> Ignorable Code Points.
But in Section 5.20, although there is a lot about rendering default 
ignorable code points, there is no further information about any other 
processing of them. The implication of that section seems to be that 
these characters are intended to be ignored in rendering but not in 
other processes such as collation. Which is in fact to be taken as the 
intention of the standard: this, or the summary in Section 5.3? Has the summary 
simply not been updated for consistency with the fuller details? Or has 
the fuller description been unintentionally restricted to rendering?
Is it in fact the intention that all default ignorable characters must 
always be ignored in collation? Or is it possible to tailor collation 
not to ignore them? The collation algorithm seems to suggest the latter, 
in that there seems to be no mention of these characters being 
obligatorily ignored - although I presume they have zero weight by 
default (in DUCET).
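
To make the distinction concrete, here is a minimal Python sketch of the two behaviours. It is not the Unicode Collation Algorithm: plain code-point order stands in for real collation weights, the set of ignorables is a small hand-picked sample rather than the full Default_Ignorable_Code_Point property, and the names IGNORABLE_SAMPLE, key_ignoring and key_not_ignoring are invented for illustration.

```python
# Hypothetical, hand-picked sample of default ignorable code points:
# ZWNJ, ZWJ and the variation selectors U+FE00..U+FE0F. The real set is
# defined by the Default_Ignorable_Code_Point property, not by this list.
IGNORABLE_SAMPLE = {0x200C, 0x200D} | set(range(0xFE00, 0xFE10))


def key_ignoring(text):
    """Toy sort key: drop the sample ignorables, then order by code point."""
    return [ord(ch) for ch in text if ord(ch) not in IGNORABLE_SAMPLE]


def key_not_ignoring(text):
    """Toy 'tailored' sort key: every code point counts, ignorables included."""
    return [ord(ch) for ch in text]


words = ["a\u200cb", "ac"]  # the first string contains ZERO WIDTH NON-JOINER

# With ZWNJ ignored, "a<ZWNJ>b" compares like "ab" and sorts before "ac".
sorted_ignoring = sorted(words, key=key_ignoring)

# With ZWNJ kept, U+200C sorts after "c", so "ac" comes first.
sorted_not_ignoring = sorted(words, key=key_not_ignoring)

# Equality differs too: "ab" and "a<ZWNJ>b" match only when ZWNJ is ignored.
same_when_ignored = key_ignoring("ab") == key_ignoring("a\u200cb")
same_when_kept = key_not_ignoring("ab") == key_not_ignoring("a\u200cb")
```

Under these assumptions the two keys disagree both on whether "ab" and "a<ZWNJ>b" are equal and on how the pair above is ordered, which is why it matters whether ignoring is obligatory or merely the default.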
This has some quite serious implications for processing of texts 
including ZW(N)J, variation selectors etc.
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/