Re: Canonical equivalence in rendering: mandatory or recommended?

From: Asmus Freytag (
Date: Thu Oct 16 2003 - 11:56:32 CST

At 02:26 AM 10/16/03 -0700, Peter Kirk wrote:
>>You can never tell whether something is going to be a "performance
>>issue" -- not just "measurably slower," but actually affecting
>>usability -- until you do some profiling. Guessing does no good.
>Well, did the people who wrote this in the standard do some profiling, or
>did they just guess? There should be no place in a standard for statements
>which are just guesses.

Oh don't we just love making categorical statements today.

Scripts where the issue is expected to actually matter include Arabic and
Hebrew. Both those scripts require the Bidi algorithm to be run (in
addition to all the other rendering related tasks). There are two phases to
that algorithm: level assignment and reversal. Assigning levels is a linear
process, but reversal depends on both the input size and the number of
levels. So, it's essentially O(N x m), where m is a not quite constant, but
small, number.
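
To make the O(N x m) shape of the reversal phase concrete, here is a minimal Python sketch of the reordering step (rule L2 in UAX #9). It assumes the embedding levels have already been resolved by the level assignment phase; the function name `reorder_by_levels` and the flat per-character level list are illustrative assumptions, not taken from any particular implementation.

```python
def reorder_by_levels(text, levels):
    """Reorder characters for display, given one resolved level per character.

    From the highest level down to the lowest odd level, reverse every
    contiguous run of characters whose level is at or above that level.
    Each pass is linear in the input, and there is one pass per level,
    hence roughly O(N x m) for m distinct passes.
    """
    chars = list(text)
    lv = list(levels)  # work on a copy so the caller's list is untouched
    odd = [x for x in lv if x % 2 == 1]
    if not odd:
        return text  # purely left-to-right: nothing to reverse
    for level in range(max(lv), min(odd) - 1, -1):
        i, n = 0, len(chars)
        while i < n:
            if lv[i] >= level:
                j = i
                while j < n and lv[j] >= level:
                    j += 1
                # Reverse the run [i, j) of characters and their levels.
                chars[i:j] = chars[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)
```

With upper case standing in for right-to-left letters, `reorder_by_levels("car is THE CAR in arabic", [0]*7 + [1]*7 + [0]*10)` gives `"car is RAC EHT in arabic"`: a single pass, because only one odd level is present.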

Arabic would need positional shaping in addition to the Bidi algorithm.

Normalization has mapping and reordering phases. The reordering is O(n
log(n)) where n is the length of a combining sequence. Realistically that's
a small number. The rest of the algorithm is O(N) with N the length of the
input. For NFC there's a decomposition and a composition phase, so the
number of steps per character is not as trivial as a strcpy, but then
again, neither is bidi.
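
As a rough illustration of why the reordering cost depends on n rather than N, the following Python sketch performs only the canonical reordering phase. It assumes the input has already been through the decomposition mapping, and it uses the standard library's `unicodedata.combining()` to look up combining classes; the function name `canonical_reorder` is made up for this example.

```python
import unicodedata


def canonical_reorder(text):
    """Stable-sort each run of combining marks by canonical combining class.

    Characters with combining class 0 (starters) delimit the runs, so every
    sort only touches the n marks of one combining sequence, while the
    surrounding scan is a single linear pass over the N input characters.
    """
    out = []
    run = []  # pending marks with a non-zero combining class
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter closes the current run; sorted() is stable, so marks
            # with equal combining class keep their original relative order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)
```

For example, in `"a\u0301\u0323"` the acute accent (class 230) precedes the dot below (class 220), so the sketch swaps the two marks, which matches what `unicodedata.normalize("NFD", ...)` returns for that string.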

The rest of rendering has to map characters to glyphs, add glyph extents,
calculate line breaks, determine glyph positions, and finally rasterize
outlines and copy bits. (When rendering to PDF, the last two steps would be
slightly different). That's a whole lot of passes over the data as well,
many of them with a non-trivial number of steps per input character.

Given this context, it's more than an educated guess that normalization at
rendering time will not dominate the performance, particularly not when

Even for pure ASCII data (which never need normalization), the rendering &
display tasks will take more steps per character than a normalization quick
check (especially one optimized for ASCII input ;-).
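
A quick check with an ASCII fast path might look like the Python sketch below. It relies on `str.isascii()` (Python 3.7+) and `unicodedata.is_normalized()` (Python 3.8+); the wrapper name `ensure_nfc` is hypothetical, and a real renderer would more likely do this inside its text engine than in a helper like this one.

```python
import unicodedata


def ensure_nfc(text):
    """Return text in NFC, doing as little work as possible."""
    # Fast path: ASCII-only text is already in every normalization form.
    if text.isascii():
        return text
    # Quick check: already-normalized text is returned without rebuilding it.
    if unicodedata.is_normalized("NFC", text):
        return text
    # Slow path: only text that actually needs it is normalized.
    return unicodedata.normalize("NFC", text)
```

Under these assumptions `ensure_nfc("hello")` never reaches the normalizer, while `ensure_nfc("e\u0301")` takes the slow path and returns the precomposed `"\u00e9"`.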

Therefore, I regard the statement in the text of the standard as quite
defensible (if understood in context) and to be better supported than a
mere 'guess'. It's a well-educated guess, probably even a Ph.D. guess.

However, if someone has measurements from a well-tuned system, it would be
nice to know some realistic values for the relative cost of normalization
and display.

