Re: Canonical equivalence in rendering: mandatory or recommended?

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Oct 16 2003 - 16:29:44 CST


On 16/10/2003 12:38, Peter Constable wrote:

>>-----Original Message-----
>>From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
>>On Behalf Of Asmus Freytag
>>
>>>>Canonical equivalence must be taken into account in rendering
>>>>multiple accents, so that any two canonically equivalent sequences
>>>>display as the same.
>>
>>This statement goes to the core of Unicode. If it is followed, it
>>guarantees that normalizing a string does not change its appearance
>>(and therefore it remains the 'same' string as far as the user is
>>concerned.)
>I agree in principle. There are two ways in which the philosophy behind
>this breaks down in real life, though:
>
>1. There are cases of combining marks given a class of 0, meaning that
>combinations of marks in different positions relative to the base will
>be visually indistinguishable, but the encoded representations are not
>the same, and not canonically equivalent. E.g. (taken from someone else
>on the Indic list) Devanagari ka + i + u vs. ka + u + i.
>
>
As we are talking about rendering rather than operations on the backing
store, this is actually irrelevant. If two sequences are visually
indistinguishable (with the particular font in use), a rendering engine
can safely map them together even if they are not canonically
equivalent, as long as the backing store is unchanged.
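Peter's class-0 example can be checked directly with Python's unicodedata module. This is a minimal sketch; U+093F and U+0941 are the Devanagari vowel signs I and U from the example above, and both carry canonical combining class 0, so normalization never reorders them:

```python
import unicodedata

# Devanagari KA + vowel sign I + vowel sign U, in the two orders
# from the example. Both vowel signs have combining class 0.
ka_i_u = "\u0915\u093F\u0941"
ka_u_i = "\u0915\u0941\u093F"

for mark in ("\u093F", "\u0941"):
    print(f"U+{ord(mark):04X} ccc = {unicodedata.combining(mark)}")  # both 0

# The two sequences survive normalization unchanged and unequal:
# they are NOT canonically equivalent, even if a particular font
# happens to render them identically.
print(unicodedata.normalize("NFC", ka_i_u)
      == unicodedata.normalize("NFC", ka_u_i))  # False
```

Since the backing store keeps the two distinct sequences, any mapping-together happens only in the rendering path, exactly as described above.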

>2. Relying on normalization, and specifically canonical ordering, to
>happen in a rendering engine IS liable to be a noticeable performance
>issue. I suggest that whoever wrote
>
>
>
>>Rendering systems should handle any of the canonically equivalent
>>orders of combining marks. This is not a performance issue: The amount
>>
>>
>>of time necessary to reorder combining marks is insignificant compared
>>
>>
>>to the time necessary to carry out other work required for rendering.
>>
>>
>
>was not speaking from experience.
>
>
>
I wonder if anyone involved in this is speaking from real experience.
Peter, I don't think your old company actually tried to implement such
reordering; Sharon tells me that the idea was suggested, but rejected
for reasons unrelated to performance. I have heard that your new company
has tried it and has claimed that for Hebrew the performance hit is
unacceptable. I am still sceptical of this claim. Presumably this was
done by adding a reordering step to an existing rendering engine. But
was this reordering properly optimised in native code, or was it just
bolted onto an unsuitable architecture using a high-level language
designed for the different purpose of glyph-level reordering?
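For context, the reordering step whose cost is being debated is Unicode's Canonical Ordering Algorithm, which is just a stable exchange sort over non-zero combining classes; runs of combining marks are short in practice. A minimal sketch (not any engine's actual code):

```python
import unicodedata

def canonical_reorder(s: str) -> str:
    """Canonical Ordering Algorithm: stable exchange sort that swaps
    adjacent characters when the first has a higher non-zero
    combining class than the second."""
    chars = list(s)
    changed = True
    while changed:
        changed = False
        for i in range(len(chars) - 1):
            a = unicodedata.combining(chars[i])
            b = unicodedata.combining(chars[i + 1])
            if b != 0 and a > b:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                changed = True
    return "".join(chars)

# 'q' + COMBINING DOT ABOVE (ccc 230) + COMBINING DOT BELOW (ccc 220)
# reorders to the canonical below-then-above order.
print(canonical_reorder("q\u0307\u0323") == "q\u0323\u0307")  # True
```

Whether such a pass is cheap or expensive in a shipping engine depends, as argued above, on where and how it is implemented, not on the algorithm itself.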

Also, as I just pointed out in a separate posting, there should be no
performance hit for unpointed modern Hebrew as there are no combining
marks to be reordered. The relatively few users of pointed Hebrew would
prefer to see their text rendered correctly if a little slowly rather
than quickly but incorrectly.
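The unpointed-Hebrew point can be made concrete: a renderer can skip the reordering pass entirely whenever a run contains no character of non-zero combining class. A minimal sketch, assuming a simple per-run scan rather than any particular engine's design:

```python
import unicodedata

def needs_reordering(run: str) -> bool:
    """Fast path: text with no combining marks (e.g. unpointed
    Hebrew) has no character with combining class > 0, so the
    canonical reordering step can be skipped."""
    return any(unicodedata.combining(ch) > 0 for ch in run)

print(needs_reordering("\u05E9\u05DC\u05D5\u05DD"))        # unpointed: False
print(needs_reordering("\u05E9\u05B8\u05DC\u05D5\u05DD"))  # with qamats: True
```

So the cost falls only on pointed text, whose relatively few users, as noted above, would rather see it rendered correctly.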

If, as you agree in principle, this is an issue which goes to the core
of Unicode, should you not be prepared to take some small performance
hit in order to conform properly to the architecture?

> ...
>
>If what is normalized is the backing store. If what is normalized is a
>string at an intermediate stage in the rendering process, then this is
>not the case. The reason is the number of times text-rendering APIs get
>called. ...
>
If it is unavoidable to call the same routine (for sorting or any other
purpose) multiple times with the same data, the results can be cached so
that they do not have to be recalculated each time.
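The caching idea can be sketched with a memoized wrapper. The names here are hypothetical, and the normalization call merely stands in for whatever expensive routine the rendering API invokes repeatedly; a real engine's cache key would also include font and layout parameters:

```python
import unicodedata
from functools import lru_cache

@lru_cache(maxsize=4096)
def reorder_cached(run: str) -> str:
    # Stand-in for the expensive per-run work (reordering, shaping).
    return unicodedata.normalize("NFD", run)

reorder_cached("\u05E9\u05B8")               # computed once
reorder_cached("\u05E9\u05B8")               # served from the cache
print(reorder_cached.cache_info().hits)      # 1
```

With such a cache, repeated API calls on the same data pay the reordering cost only on the first call.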

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST