RE: Canonical equivalence in rendering: mandatory or recommended?

From: Peter Constable
Date: Thu Oct 16 2003 - 13:38:58 CST

> -----Original Message-----
> On Behalf Of Asmus Freytag

> >>Canonical equivalence must be taken into account in rendering
> >>accents, so that any two canonically equivalent sequences display as
> >>the same.
> This statement goes to the core of Unicode. If it is followed, it
> guarantees that normalizing a string does not change its appearance;
> therefore it remains the 'same' string as far as the user is
> concerned.
I agree in principle. There are two ways in which the philosophy behind
this breaks down in real life, though:

1. There are cases of combining marks given a class of 0, meaning that
combinations of marks in different positions relative to the base will
be visually indistinguishable, but the encoded representations are not
the same, and not canonically equivalent. E.g. (taken from someone else
on the Indic list) Devanagari ka + i + u vs. ka + u + i.

2. Relying on normalization, and specifically canonical ordering, to
happen in a rendering engine IS liable to be a noticeable performance
issue. I suggest that whoever wrote

> Rendering systems should handle any of the canonically equivalent
> orders of combining marks. This is not a performance issue: The amount
> of time necessary to reorder combining marks is insignificant compared
> to the time necessary to carry out other work required for rendering.

was not speaking from experience.
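
Point 1 above can be checked against the Unicode Character Database. The
following is a minimal sketch using Python's standard unicodedata module;
the variable names are illustrative only, and the Latin sequence is an
added contrast case that does not come from the original discussion.

```python
import unicodedata

# Devanagari ka, vowel sign i, vowel sign u (the example from point 1).
KA, SIGN_I, SIGN_U = "\u0915", "\u093F", "\u0941"
seq_iu = KA + SIGN_I + SIGN_U
seq_ui = KA + SIGN_U + SIGN_I

# Both vowel signs have canonical combining class 0, so canonical
# ordering never swaps them and the two sequences stay distinct.
ccc_i = unicodedata.combining(SIGN_I)
ccc_u = unicodedata.combining(SIGN_U)
same_indic = (unicodedata.normalize("NFD", seq_iu)
              == unicodedata.normalize("NFD", seq_ui))

# Contrast case (an assumption added for illustration): two marks with
# different non-zero classes, dot below (220) and acute (230).
DOT_BELOW, ACUTE = "\u0323", "\u0301"
same_latin = (unicodedata.normalize("NFD", "a" + DOT_BELOW + ACUTE)
              == unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW))

print(ccc_i, ccc_u)   # 0 0
print(same_indic)     # False: not canonically equivalent
print(same_latin)     # True: canonically equivalent
```

So a renderer that honours canonical equivalence perfectly still receives
two distinct encodings for the Devanagari case.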

> The interesting digressions on string libraries aside, the statement
> here is in the context of the tasks needed for rendering. If you take
> a rendering library and add a normalization pass on the front of it,
> you'll be hard-pressed to notice a difference in performance,
> especially for complex scripts.

That holds if what is normalized is the backing store. If what is
normalized is a string at an intermediate stage in the rendering
process, then it does not hold. The reason is the number of times
text-rendering APIs get called. As you mention,

> However, from the other messages on this thread we conclude:
> normalizing *every* string, *every time* it gets touched, *is* a
> performance issue.
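
The difference between the two cases can be sketched by counting
normalization passes. This is a rough illustration only, not a benchmark:
fake_render_call is a hypothetical stand-in for any text-rendering API
(measure, draw, hit-test), and the figure of 500 calls is an arbitrary
assumption.

```python
import unicodedata

class CountingNormalizer:
    """Wraps unicodedata.normalize and counts how often it runs."""
    def __init__(self):
        self.calls = 0

    def nfc(self, text):
        self.calls += 1
        return unicodedata.normalize("NFC", text)

def fake_render_call(text, normalizer=None):
    # Hypothetical rendering entry point; a real one would shape and
    # draw glyphs. If given a normalizer, it normalizes on every call.
    if normalizer is not None:
        text = normalizer.nfc(text)
    return len(text)

backing_store = "a\u0301\u0323 " * 1000
api_calls = 500  # assumed number of times rendering APIs touch the text

# Case A: normalize the backing store once, then render many times.
once = CountingNormalizer()
normalized_store = once.nfc(backing_store)
for _ in range(api_calls):
    fake_render_call(normalized_store)

# Case B: normalize at an intermediate stage, inside every call.
per_call = CountingNormalizer()
for _ in range(api_calls):
    fake_render_call(backing_store, per_call)

print(once.calls, per_call.calls)  # 1 500
```

The cost per pass may well be small; what changes between the two cases
is how many passes are made.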

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
