From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 11 2003 - 15:13:59 EDT
Peter Kirk asked:
> Thanks for the clarification. I probably misunderstood Jon's intention.
> But is there a problem if, for example, an application sees the string
> <space, space, combining mark> and regularises it (wrongly!) to <space,
> combining mark>?
Then you have a problem, of course.
What the Unicode Standard says about application of nonspacing
combining marks to SPACE seems clear to me.
What other standards say about space folding is clear in their
own contexts.
Anyone implementing both such standards together has to be
careful about how the requirements fit together.
In Unicode terms, a space folding is an example of a "knowing
modification" of the content of the text. It is perfectly o.k.
to modify Unicode text, of course, *as long as you know what
you are doing* -- i.e., you aren't converting valid text to
bit hash because you aren't conforming to the meaning of
the characters or to their encoding forms.
Now if a process is doing a space folding, but is applying
it to Unicode text as a "semi-ignorant modification", i.e.,
without being aware of the fact that nonspacing combining
marks can apply to SPACE characters (and that such sequences
are valid combining character sequences and should be treated
analogously with other grapheme clusters; see UAX #29), then
it is modifying the text away from its intended content without
*knowing* what it is actually doing. Such mistakes are
programming errors in application of the relevant standards.
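
To make the distinction concrete, here is a minimal sketch in
Python of the two kinds of folding. The function names
naive_fold and fold_spaces are hypothetical, and the test for
"nonspacing combining mark" is an assumption: it uses general
categories Mn and Me as an approximation, not the full
grapheme cluster rules of UAX #29.

```python
import re
import unicodedata


def is_nonspacing_mark(ch):
    # Assumption: approximate "nonspacing combining mark" by general
    # category Mn (nonspacing) or Me (enclosing). UAX #29 is broader.
    return unicodedata.category(ch) in ("Mn", "Me")


def naive_fold(text):
    # "Semi-ignorant" folding: collapses every run of U+0020 to one
    # space without looking at what follows the run.
    return re.sub(" +", " ", text)


def fold_spaces(text):
    # Folding that treats <SPACE, mark> as a unit: the last space of a
    # run is kept as a base character if a combining mark follows it.
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != " ":
            out.append(text[i])
            i += 1
            continue
        j = i
        while j < n and text[j] == " ":
            j += 1  # j ends just past the run of spaces
        if j < n and is_nonspacing_mark(text[j]):
            if j - i > 1:
                out.append(" ")  # the folded ordinary spaces
            out.append(" ")      # the base of the combining sequence
        else:
            out.append(" ")
        i = j
    return "".join(out)
```

Given <a, space, space, U+0301, b>, naive_fold returns
<a, space, U+0301, b>, which moves the accent onto what used
to be the separator, while fold_spaces leaves the string
unchanged because the second space is the base of the mark.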
Of course a standard which mandates space folding is also
within its rights to mandate, for example, the non-use of
nonspacing marks applied to SPACE characters. It can simply
rule out such sequences as valid for its context, in which
case the problem goes away.
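
A standard taking that route could be enforced with a check
along the following lines. This is only a sketch: the name
has_mark_on_space is hypothetical, and it again assumes that
general categories Mn and Me are an adequate stand-in for
"nonspacing mark".

```python
import unicodedata


def has_mark_on_space(text):
    # True if any U+0020 is immediately followed by a character of
    # general category Mn or Me (assumed definition of the marks
    # that the hypothetical standard rules out).
    for prev, ch in zip(text, text[1:]):
        if prev == " " and unicodedata.category(ch) in ("Mn", "Me"):
            return True
    return False
```

A process would reject or repair input for which this returns
True before applying any space folding, so the folding step
never sees a space that carries a mark.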
The important thing here is to know what you are doing when
you modify text, and, as far as possible, to accomplish
such modifications in ways that are the same as other
processes which also know what they are doing. That is the
basis for interoperability of textual data.
--Ken