Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ from Philippe Verdy on 2013-08-06 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 7 Aug 2013 04:14:19 +0200

2013/8/7 Richard Wordingham <richard.wordingham_at_ntlworld.com>

> On Wed, 7 Aug 2013 01:42:06 +0200
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
> > 2013/8/6 Richard Wordingham <richard.wordingham_at_ntlworld.com>
> >
> > > For example, I think the proper
> > > upper-casing of <U+1FB3 GREEK SMALL LETTER ALPHA WITH
> > > YPOGEGRAMMENI, U+0359 COMBINING ASTERISK BELOW> is <U+0391 GREEK
> > > CAPITAL LETTER ALPHA, U+0359, U+0196 LATIN CAPITAL LETTER IOTA,
> > > U+0359>.
> > >
> >
> > Why do you use U+0196 LATIN CAPITAL LETTER IOTA instead of U+0399 GREEK
> > CAPITAL LETTER IOTA ???
>
> That's a mistake. Sorry.
>
> > I'm also not convinced that duplicating the combining asterisk below
> > is correct here. My opinion is that it should be:
> > <U+0391 GREEK CAPITAL LETTER ALPHA, DOUBLE COMBINING ASTERISK BELOW,
> > U+0399 GREEK CAPITAL LETTER IOTA>
> > with a new "double" diacritic encoded between both letters (it will be
> > shown as a single asterisk, centered below the gap between the two
> > capital letters).
>
> The asterisk below indicates that someone once read the letter above,
> but it can no longer be verified, e.g. because of further deterioration
> of the manuscript. If one converts the text to capitals, the asterisk
> below would indicate that the letters cannot be vouched for by the
> publisher of the new text, and it makes sense for each unverified
> letter to have its own asterisk.
>

Or to place the asterisk in the middle of the gap between the two letters
(this preserves the fact that the uncapitalized letter was read as a single
grapheme). Anyway, capitalization transforms the original text, so you
continue to lose semantic information. Why would you still want two
asterisks, and not three, to mark the suppression of letter case?
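
For reference, the default (untailored) result can be checked with a short
Python sketch. It assumes that Python's str.upper() applies the Unicode full
case mappings, so U+1FB3 becomes ALPHA followed by IOTA and the single
asterisk ends up after the IOTA. The name codepoints is only a helper for
this illustration, and proposed is the sequence suggested in the quoted
message with the Latin iota corrected to U+0399.

```python
def codepoints(s):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

# <GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI, COMBINING ASTERISK BELOW>
lower = "\u1FB3\u0359"

# Default uppercasing: no asterisk is duplicated or moved under ALPHA.
default_upper = lower.upper()

# Sequence proposed in the quoted message: one asterisk per capital letter.
proposed = "\u0391\u0359\u0399\u0359"

print(codepoints(lower))
print(codepoints(default_upper))
print(codepoints(proposed))
```

Neither the duplicated asterisk nor a single centered one comes out of the
default mapping; either would need a tailored uppercasing step.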

If the combining small iota subscript were capitalized as a combining
small-capital iota, there would be no ambiguity: both would be interpreted
as YPOGEGRAMMENI. Using the standard CAPITAL IOTA just loses the
distinction; it is a convenience inherited from legacy character sets,
which were much more limited.

> > There's no such "double combining asterisk" character in the UCS. But
> > if you replace the asterisk by a macron (below or above) there exists
> > such a double diacritic. The problem is that collation with strength
> > ignoring case differences will not compare these strings as equal.
>
> > Or it could also be:
> > <U+0391, WJ, U+0359, U+0399>
> > using a zero-width word joiner to hold the simple combining asterisk
> > below (this will create three grapheme clusters, with the second one
> > kerned below the two surrounding letters).
>
> That's not what U+2060 WORD JOINER does. It tells the word breaking
> algorithm that is being applied (presumably to scripta continua here)
> that there is no word boundary between the two letters. I don't
> believe that there is a character that does what you want.
>

I used WORD JOINER because it has been suggested as a replacement for the
zero-width no-break space (U+FEFF ZWNBSP), which is now used almost only
as a leading BOM and is silently discarded from input streams when it is
not known whether it is at the beginning of the stream (e.g. when the
stream has no defined beginning but is a live stream, which could combine
multiple source streams).
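
The BOM problem can be illustrated with a small Python sketch. It relies
only on the standard "utf-8-sig" codec, which strips one leading BOM; the
two byte strings are made-up stand-ins for two source streams that each
carry their own BOM.

```python
import codecs

# Two hypothetical source streams, each starting with a UTF-8 BOM.
first = codecs.BOM_UTF8 + "\u0391".encode("utf-8")   # BOM + ALPHA
second = codecs.BOM_UTF8 + "\u0399".encode("utf-8")  # BOM + IOTA

# Decoded alone, the leading BOM is removed.
single = first.decode("utf-8-sig")

# After concatenation only the first BOM is removed; the second one
# survives in the middle of the text as U+FEFF.
joined = (first + second).decode("utf-8-sig")

# A consumer that cannot tell a stray BOM from a deliberate ZWNBSP
# often removes every U+FEFF, which would also delete an intended joiner.
cleaned = joined.replace("\uFEFF", "")

print([f"U+{ord(c):04X}" for c in joined])
print([f"U+{ord(c):04X}" for c in cleaned])
```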

And yes, what I want here is *also* (but not only) to avoid a word break
between alpha and iota. There was no break between alpha and its
subscripted iota, as they were in the SAME default grapheme cluster; this
is not the case if you use some non-breaking control between these two
vowels. (Greek allows a syllable break here in many words, so you would
need some dictionary lookup to determine that this was effectively an
unbreakable ypogegrammeni.)
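
The change in clustering can be seen with a rough Python sketch. The helper
rough_clusters is an assumption made for this illustration: it only attaches
combining marks (general category M*) to the preceding character, which is a
crude approximation of the default grapheme clusters of UAX #29 and says
nothing about format controls such as WJ.

```python
import unicodedata

def rough_clusters(s):
    """Group each base character with the combining marks that follow it.

    Crude approximation only: the real rules are in UAX #29.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # mark: extend the current cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters

lower = "\u1FB3\u0359"   # alpha with ypogegrammeni + asterisk below
upper = lower.upper()    # default full uppercase mapping

print(len(rough_clusters(lower)), len(rough_clusters(upper)))
```

Under this approximation the lowercase text is one cluster and the default
uppercase text is two, with the asterisk attached to the IOTA.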

The discussion about kerning is appropriate because the intended behavior
of the word joiner is to remain invisible (not increasing the interletter
spacing except where there is no other way to avoid collisions of glyphs).
Clearly, your asterisk below does not take any space below Greek capitals
in the usual (undecorated) font styles.

If decorated font styles are used, for example in a "lettrine" (dropped
initial capital letter) at the start of the first paragraph of a chapter
body, the pair of capital letters ALPHA+IOTA should still use a single
decorated glyph, and the asterisk should find its place below the glyph or
within some blank space left by decoration strokes of the lettrine. The
IOTA should not be left outside the lettrine, on the first normal line,
while ALPHA alone occupies the initial lettrine spanning two lines or more.

It is in such a situation that you'll see that the two letters are linked
and form an unbreakable grapheme cluster. But maybe you think that this
should use a control dedicated to indicating an implicit ligature (even if
it's not visible with Greek capitals in common font styles, as capitals
usually reproduce the engraved monumental style, with basic strokes and
minimal decorations).

What are the alternatives?
- CGJ after ALPHA? I'm not sure that there's no break after CGJ and before
another non-combining letter like IOTA.
- ZWNBSP? Not very safe with many algorithms that will drop them silently
because they can't know if a leading BOM was already removed in the stream
or if the stream comes from the concatenation of two streams.
- ZWJ? It is used to strongly suggest a ligature, but the ligature will
only be appropriate for lettrines, not in most paragraphs.
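
As a quick reference for these candidates, here is a Python sketch that
prints their basic properties from the standard unicodedata module. The
short labels are just the abbreviations used in this thread. Note that the
general category and combining class shown here do not by themselves decide
line, word or grapheme breaking; those behaviors come from UAX #14 and
UAX #29 properties that the standard library does not expose.

```python
import unicodedata

# Candidate invisible characters discussed above (labels are informal).
candidates = {
    "CGJ": "\u034F",
    "ZWNBSP": "\uFEFF",
    "ZWJ": "\u200D",
    "WJ": "\u2060",
}

# label -> (Unicode name, general category, canonical combining class)
props = {
    label: (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))
    for label, ch in candidates.items()
}

for label, (name, gc, ccc) in props.items():
    code = ord(candidates[label])
    print(f"{label:7} U+{code:04X} {name} gc={gc} ccc={ccc}")
```

CGJ is the only one of the four that is a combining mark (Mn); the other
three are format characters (Cf).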

> > I think this solution is preferable because collation with strength
> > ignoring case differences (and treating WJ as ignorable) will compare
> > the uppercased string as equal to the original lowercase string.
>
> Alternatively, give U+0359 COMBINING ASTERISK BELOW only tertiary
> weight. It doesn't seem right to give it priority over accent
> differences.
>
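
A rough way to experiment with such comparisons in Python is sketched below.
This is not the Unicode Collation Algorithm: rough_primary_key is a
hypothetical helper that applies full case folding, decomposes, and then
drops all combining marks and format characters, which only approximates a
comparison where case, diacritics and WJ are ignored.

```python
import unicodedata

def rough_primary_key(s):
    """Case-fold, decompose, then drop marks (M*) and format chars (Cf)."""
    folded = unicodedata.normalize("NFD", s.casefold())
    return "".join(
        ch for ch in folded
        if not unicodedata.category(ch).startswith("M")
        and unicodedata.category(ch) != "Cf"
    )

lower = "\u1FB3\u0359"                  # original lowercase text
two_marks = "\u0391\u0359\u0399\u0359"  # one asterisk per capital
with_wj = "\u0391\u2060\u0359\u0399"    # <ALPHA, WJ, asterisk, IOTA>

# Plain case folding alone does not make the variants compare equal.
print(lower.casefold() == two_marks.casefold())

# With marks and format characters ignored, all three variants match.
print(rough_primary_key(lower) == rough_primary_key(two_marks)
      == rough_primary_key(with_wj))
```

How the variants compare once the asterisk is given a secondary or tertiary
weight depends on the actual collation table, which this sketch does not
model.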
Received on Tue Aug 06 2013 - 21:22:32 CDT