Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Tue, 6 Aug 2013 19:27:56 +0200

There are two types of conforming processes:

- Those that produce a rendering only have to give a result that respects
the character identities, even if it looks slightly different. Rendering
quality is not a conformance issue as long as we still read the result as
an epsilon with tonos, even if the exact placement of the tonos is shifted
a bit (or even if the tonos partly collides visually with the epsilon, when
it should not and does not when rendering the canonically equivalent
precomposed character).

- Those that produce textual or numeric data from a source text should
return the same result for canonically equivalent inputs (or a canonically
equivalent result, if the result is textual). If the process states that
its result is normalized (NFC, NFD, NFKC or NFKD), then the textual result
should be binary identical (same number of code points, same sequence of
code point values) and conform to the standard normalization form. If the
process just uses a "fast" algorithm, the results may differ in binary
terms but should still be canonically equivalent.
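
To make the second case concrete, here is a minimal sketch using Python's
standard unicodedata module. The function names normalizing_process and
fast_process are hypothetical, chosen only for this illustration. The
decomposed spelling uses U+0301 COMBINING ACUTE ACCENT, which is what the
canonical decomposition of U+03AD contains.

```python
import unicodedata

# Two canonically equivalent spellings of Greek epsilon with tonos.
precomposed = "\u03AD"        # GREEK SMALL LETTER EPSILON WITH TONOS
decomposed = "\u03B5\u0301"   # EPSILON + COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Strings are canonically equivalent when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def normalizing_process(text: str) -> str:
    """Hypothetical process that promises NFC output."""
    return unicodedata.normalize("NFC", text)


def fast_process(text: str) -> str:
    """Hypothetical 'fast' process: passes the text through unnormalized."""
    return text


# Promised NFC: the outputs must be binary identical.
nfc_a = normalizing_process(precomposed)
nfc_b = normalizing_process(decomposed)

# "Fast" path: the outputs may differ in binary terms,
# but they must remain canonically equivalent.
fast_a = fast_process(precomposed)
fast_b = fast_process(decomposed)
```

Here nfc_a == nfc_b holds code point for code point, while fast_a and
fast_b differ as code point sequences and only agree after normalization.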

But there's an admitted exception: sorting with UCA may change the
relative order of the source strings, simply because sort stability is
not always wanted (it has a cost), binary sorting of the results using the
code point values as an additional collation level is not always wanted
either, and normalization remains optional in UCA. The sorted list as a
whole is then not strictly canonically equivalent, because items may come
out in a different order, but the comparable items should still be
canonically equivalent.
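
A rough illustration of the ordering point, in Python. This is not UCA:
the function toy_key below is a hypothetical stand-in that just uses the
NFD code points as the sort key, which is enough to make two canonically
equivalent spellings tie. Real collation keys would come from a UCA
implementation.

```python
import unicodedata


def toy_key(s: str) -> str:
    # Assumption: NFD code points stand in for a real UCA sort key.
    return unicodedata.normalize("NFD", s)


precomposed = "\u03AD\u03BD\u03B1"       # "ένα" spelled with U+03AD
decomposed = "\u03B5\u0301\u03BD\u03B1"  # "ένα" spelled with U+03B5 U+0301
words = [decomposed, "\u03B1", precomposed]

# Key only: the two spellings tie, so their relative order depends on the
# sort algorithm. Python's sort is stable and keeps the input order; an
# unstable sort would be free to swap them.
by_key = sorted(words, key=toy_key)

# Key plus code point values as a final tie-breaking level: the order of
# the tied items is now fixed, whatever the input order was.
by_key_then_binary = sorted(words, key=lambda s: (toy_key(s), s))
```

The two result lists differ in the order of the tied items, yet they agree
position by position once each item is normalized.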

More importantly, the main difference will occur when you use regular
expressions to match partial clusters (there are known difficulties, for
example if you search for a combining accent: which part of the source
content to return in the match, and whether a letter precomposed with that
accent should match or not).
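
The difficulty can be seen with Python's re module, which matches code
points and knows nothing about canonical equivalence. The helper
find_clusters_with is a hypothetical name, and it shows only one possible
policy (normalize to NFD, then return the whole base-plus-marks cluster).
Its cluster splitting is a simplification based on combining classes, not
full grapheme cluster segmentation.

```python
import re
import unicodedata

ACUTE = "\u0301"                      # COMBINING ACUTE ACCENT
precomposed = "\u03AD\u03BD\u03B1"    # "ένα" spelled with U+03AD
decomposed = unicodedata.normalize("NFD", precomposed)

# A raw code point search treats the two equivalent spellings differently:
# no match in the precomposed text, and in the decomposed text a match
# that covers only the accent, not the letter that carries it.
raw_hit_precomposed = re.search(ACUTE, precomposed)
raw_hit_decomposed = re.search(ACUTE, decomposed)


def find_clusters_with(mark: str, text: str) -> list[str]:
    """Return each base+marks cluster of the NFD text that contains mark."""
    clusters: list[str] = []
    for ch in unicodedata.normalize("NFD", text):
        # A nonzero combining class means the character attaches to the
        # preceding base character.
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return [c for c in clusters if mark in c]
```

Under this policy both spellings give the same answer, at the price of
returning text in NFD rather than a span of the original source.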

2013/8/6 Jukka K. Korpela <jkorpela_at_cs.tut.fi>

> 2013-08-05 23:46, Richard Wordingham wrote:
>
> The requirement is that conformant processes not think they are doing
>> the right thing by treating canonically equivalent strings
>> differently. If there is latitude in a process, e.g. rendering, I
>> can't find a requirement to treat canonically equivalent strings
>> identically. Can you?
>>
>
> The first sentence is somewhat difficult to understand. I suppose the key
> is the word "the" vs. "a" in "the right thing".
>
> As far as I can see, the standard allows canonically equivalent strings to
> be handled differently, but it says that software should not expect other
> software to do so.
>
> In particular, in rendering, a program might display U+03B5 GREEK SMALL
> LETTER EPSILON U+0384 GREEK TONOS by drawing ε and placing ΄ over it, but
> U+03AD GREEK SMALL LETTER EPSILON WITH TONOS by simply using a glyph for it
> in the font being used. This might be regarded as being of inferior
> quality, but hardly as non-conforming.
>
> Yucca
>
Received on Tue Aug 06 2013 - 12:32:43 CDT
