Re: Canonical equivalence in rendering: mandatory or recommended?

From: Peter Kirk
Date: Wed Oct 15 2003 - 14:44:20 CST

On 15/10/2003 10:48, Asmus Freytag wrote:

> I'm going to answer some of Peter's points, leaving aside the
> interesting digressions into Java subclassing etc. that have developed
> later in the discussion.

Thank you, Asmus. If people want to discuss normalisation and string
handling in Java, they are welcome to do so, but they should use a
different subject heading and not my (copyrighted :-) ) text.

> At 04:19 AM 10/15/03 -0700, Peter Kirk wrote:
>> I note the following text from section 5.13, p.127, of the Unicode
>> standard v.4:
>>> Canonical equivalence must be taken into account in rendering
>>> multiple accents, so that any two canonically equivalent sequences
>>> display as the same.
> This statement goes to the core of Unicode. If it is followed, it
> guarantees that normalizing a string does not change its appearance
> (and therefore it remains the 'same' string as far as the user is
> concerned.)
> ...
> The guidelines are concerned with the average case: displaying the
> characters as *text*.
> [The use of the word 'must' in a guideline is always awkward, since
> that word has such a strong meaning in the normative part of the
> standard.]

So, are you saying that for normal display of characters as text these
guidelines must be followed?

>>> Rendering systems should handle any of the canonically equivalent
>>> orders of combining marks. This is not a performance issue: The
>>> amount of time necessary to reorder combining marks is insignificant
>>> compared to the time necessary to carry out other work required for
>>> rendering.
> The interesting digressions on string libraries aside, the statement
> made here is in the context of the tasks needed for rendering. If you
> take a rendering library and add a normalization pass on the front of
> it, you'll be hard-pressed to notice a difference in performance,
> especially for any complex scripts.
> So we conclude: "rendering any string as if it was normalized" is
> *not* a performance issue.

Thank you. This is the clarification I was looking for, and confirms my
own suspicions. But are there any other views on this? I have heard
such views from implementers of rendering systems, but I have wondered
whether this is because of their reluctance to do the extra work
required to conform to this requirement.
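
To make the "normalization pass on the front" idea concrete, here is a
minimal sketch in Python. It assumes a renderer whose shaping stage
consumes a plain sequence of code points. The function name
prepare_for_rendering is hypothetical and not part of any real rendering
library; only unicodedata.normalize is a real standard-library call. The
example uses a Latin base letter with two combining marks of different
combining classes, so the marks may legitimately arrive in either order.

```python
import unicodedata


def prepare_for_rendering(text: str) -> str:
    """Hypothetical front-end pass for a renderer.

    Normalizes to NFD so that canonically equivalent inputs reach the
    shaping stage as identical code point sequences.
    """
    return unicodedata.normalize("NFD", text)


# Three canonically equivalent spellings of the same text:
dot_then_acute = "a\u0323\u0301"  # a + COMBINING DOT BELOW + COMBINING ACUTE ACCENT
acute_then_dot = "a\u0301\u0323"  # same marks in the opposite order
precomposed = "\u1ea1\u0301"      # LATIN SMALL LETTER A WITH DOT BELOW + acute

# The raw strings differ code point by code point...
assert dot_then_acute != acute_then_dot
assert dot_then_acute != precomposed

# ...but after the normalization pass the shaping stage sees one sequence,
# so whatever it draws for one spelling it draws for all three.
prepared = {prepare_for_rendering(s)
            for s in (dot_then_acute, acute_then_dot, precomposed)}
assert len(prepared) == 1
```

A renderer need not literally normalize its input to meet the guideline.
This sketch only shows that doing so is one straightforward way to
guarantee that equivalent sequences display as the same.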

Peter Kirk (personal) (work)
