Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk
Date: Wed Oct 29 2003 - 16:48:20 CST

On 29/10/2003 14:14, John Cowan wrote:

>Peter Kirk scripsit:
>>Is this actually a conformance requirement? I thought I understood the
>>following: A rendering engine which fails to render canonical
>>equivalents identically, or fails to render certain orders sensibly, is
>>not doing what the Unicode standard tells it that it must do. But it is
>>not technically non-conformant because the statement that it must render
>>canonical equivalents identically is not in a conformance clause. This
>>implies that software producers who produce rendering engines which are
>>deficient in this way can still claim conformance to Unicode. This is an
>>ambiguity which, in my opinion, should be resolved in a future edition
>>of the standard.
>
>C9 says:
>A process shall not assume that the interpretations of two canonical-equivalent
>character sequences are distinct.

Yes, but this doesn't quite say that it must treat them as identical, as
is clear from the following explanatory notes:

> Ideally, an implementation would always interpret two
> canonical-equivalent character sequences identically. There are
> practical circumstances under which implementations may reasonably
> distinguish them.

> Even processes that normally do not distinguish between
> canonical-equivalent character sequences can have reasonable exception
> behavior. Some examples of this behavior include graceful fallback
> processing by processes unable to support correct positioning of
> nonspacing marks;...

So a process "unable to support correct positioning of nonspacing marks"
is not obliged to give the same incorrect positioning to a set of marks
regardless of order. But I am not sure that this get-out clause should
be applicable to a process which claims as its very essence "to support
correct positioning of nonspacing marks" but actually supports only a
particular arbitrary (not even canonical) order.

I would like to see this clause tightened up to say that a process which
claims to interpret properly a particular sequence of marks must
interpret all canonically equivalent variants of that sequence
identically, with the exception of special modes to show the underlying
character sequence.
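
As a rough illustration of the behaviour asked for here, the sketch below
uses Python's standard unicodedata module to test whether two orderings of
the same marks are canonically equivalent. The helper name
canonically_equivalent and the sample Hebrew strings are my own assumptions
for illustration, not anything taken from the text of the standard.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Hebrew bet carrying a dagesh and a patah, entered in two different orders.
BET, PATAH, DAGESH = "\u05D1", "\u05B7", "\u05BC"
dagesh_first = BET + DAGESH + PATAH
patah_first = BET + PATAH + DAGESH

# The two marks have different non-zero combining classes, so canonical
# reordering brings both strings to the same sequence.
classes = [unicodedata.combining(c) for c in (PATAH, DAGESH)]
same = canonically_equivalent(dagesh_first, patah_first)
```

Under the tightened clause, a renderer that claims to handle one of these
strings would have to display the other identically whenever such a check
returns True.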

Arguably conformance clause C7 in fact states this, on the basis that
canonical equivalence is a part of character semantics:

> C7 A process shall interpret a coded character representation according
> to the character semantics established by this standard, if that process
> does interpret that coded character representation.

This clause, and clause C9 with its corollary "no process can assume
that another process will make a distinction between two different, but
canonical-equivalent character sequences", also preclude any process
from assuming that data presented to it is already normalised. It must
interpret a non-normalised variant in the same way as the normalised
form; and it cannot assume that the process presenting the data makes a
distinction between the normalised and non-normalised form and does not
reorder the data into an arbitrary canonically equivalent form. This
renders superfluous any guarantees of the stability of normalisation,
for processes which require normalised data must perform their own
normalisation each time they read data.
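
A minimal sketch of that consequence, in Python and with hypothetical names
(read_normalized, index, lookup): a process that needs normalised data
normalises at its own input boundary instead of trusting whatever handed it
the text.

```python
import unicodedata

def read_normalized(text: str, form: str = "NFC") -> str:
    """Normalise incoming text; never assume the sender already did so."""
    return unicodedata.normalize(form, text)

# Hypothetical index keyed by normalised strings.
index = {read_normalized("caf\u00E9"): "entry 1"}

def lookup(key: str):
    """Look up a key after normalising it, so equivalent spellings match."""
    return index.get(read_normalized(key))

precomposed = lookup("caf\u00E9")   # e with acute as a single code point
decomposed = lookup("cafe\u0301")   # e followed by a combining acute
```

Both lookups find the same entry, because the key is normalised on every
read regardless of the form in which it arrived.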

Peter Kirk
