Re: Canonical equivalence in rendering: mandatory or recommended?

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Oct 15 2003 - 14:44:20 CST


On 15/10/2003 10:48, Asmus Freytag wrote:

> I'm going to answer some of Peter's points, leaving aside the
> interesting digressions into Java subclassing etc. that have developed
> later in the discussion.

Thank you, Asmus. If people want to discuss normalisation and string
handling in Java, they are welcome to do so, but they should use a
different subject heading and not my (copyrighted :-) ) text.

>
> At 04:19 AM 10/15/03 -0700, Peter Kirk wrote:
>
>> I note the following text from section 5.13, p.127, of the Unicode
>> standard v.4:
>>
>>> Canonical equivalence must be taken into account in rendering
>>> multiple accents, so that any two canonically equivalent sequences
>>> display as the same.
>>
>
> This statement goes to the core of Unicode. If it is followed, it
> guarantees that normalizing a string does not change its appearance
> (and therefore it remains the 'same' string as far as the user is
> concerned.)
>
> ...
>
> The guidelines are concerned with the average case: displaying the
> characters as *text*.
>
> [The use of the word 'must' in a guideline is always awkward, since
> that word has such a strong meaning in the normative part of the
> standard.]

So, are you saying that for normal display of characters as text these
guidelines must be followed?
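
To make the quoted guideline concrete, here is a minimal sketch in Python of what "canonically equivalent sequences" means for multiple accents. It assumes the standard library's unicodedata module; the helper name canonically_equivalent and the sample strings are made up for illustration and do not come from the standard.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Three spellings of "a with circumflex and dot below".
precomposed = "\u1ead"                # single precomposed character
dot_then_circumflex = "a\u0323\u0302"  # dot below (class 220), then circumflex (class 230)
circumflex_then_dot = "a\u0302\u0323"  # same marks in the other order

print(canonically_equivalent(precomposed, dot_then_circumflex))          # True
print(canonically_equivalent(dot_then_circumflex, circumflex_then_dot))  # True

# Marks of the same combining class do not commute: acute and circumflex
# both sit above the base, so their order is significant.
print(canonically_equivalent("a\u0301\u0302", "a\u0302\u0301"))          # False
```

Under the guideline, a renderer would be expected to draw the first three spellings identically, while the last pair may legitimately look different.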

>
>>> Rendering systems should handle any of the canonically equivalent
>>> orders of combining marks. This is not a performance issue: The
>>> amount of time necessary to reorder combining marks is insignificant
>>> compared to the time necessary to carry out other work required for
>>> rendering.
>>
>
> The interesting digressions on string libraries aside, the statement
> made here is in the context of the tasks needed for rendering. If you
> take a rendering library and add a normalization pass on the front of
> it, you'll be hard-pressed to notice a difference in performance,
> especially for any complex scripts.
>
> So we conclude: "rendering any string as if it was normalized" is
> *not* a performance issue.

Thank you. This is the clarification I was looking for, and it confirms
my own suspicions. But are there any other views on this? I have heard
such views from implementers of rendering systems, but I have wondered
whether that is because of their reluctance to do the extra work
required to conform to this requirement.
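
The "pass on the front" of a renderer could be sketched roughly as follows, again in Python. This is only an illustration under stated assumptions: reorder_marks and render_with_front_pass are hypothetical names, the render argument stands in for whatever a real rendering library exposes, and the pass only reorders combining marks (it does not decompose precomposed characters, which full normalization would also do).

```python
import unicodedata


def reorder_marks(text: str) -> str:
    """Put each run of combining marks into canonical order.

    A run is a maximal sequence of characters whose canonical combining
    class is non-zero. Python's sorted() is stable, so marks with the same
    class keep their relative order, as canonical ordering requires.
    """
    out = []
    run = []
    for ch in text:
        if unicodedata.combining(ch) != 0:
            run.append(ch)  # still inside a run of combining marks
        else:
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)  # a starter ends the run
    out.extend(sorted(run, key=unicodedata.combining))  # flush a trailing run
    return "".join(out)


def render_with_front_pass(text: str, render):
    """Hand the reordered text to a (hypothetical) renderer callable."""
    return render(reorder_marks(text))


# Stand-in "renderer" that just records what it was asked to draw.
seen = []
render_with_front_pass("a\u0302\u0323", seen.append)
render_with_front_pass("a\u0323\u0302", seen.append)
print(seen[0] == seen[1])  # True
```

The pass is a single linear scan with a small sort per run of marks, which is the kind of work the quoted text compares against the rest of rendering; no timings are claimed here.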

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

