Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

From: Philippe Verdy <>
Date: Mon, 5 Aug 2013 03:09:37 +0200

2013/8/4 Richard Wordingham <>

> On Sun, 4 Aug 2013 22:32:38 +0200
> Philippe Verdy <> wrote:
> > 2013/8/4 Richard Wordingham <>
> > > Also missing are precomposed forms for the likes of <OMICRON,
> > > COMBINING DOUBLE BREVE, UPSILON>, described as a final diphthong
> > > shortened before a following vowel.
> > They are not missing, they are encoded just the way you write it.
> They are missing from the set of precomposed characters. (It's been
> argued that, in an ideal world, *all* precomposed characters would be
> missing.)

Of course you could argue that, but then the number of characters to encode
would have been tremendous, and we would not have been able to benefit from
normalization stability. You could also argue that normalization stability
was not needed in this case, but then it would have been extremely
difficult to define conformant processes (i.e. to assert that applications
treat all canonical equivalents the same way, except when binary-sorting
subsets of canonical equivalents).
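The point about canonical equivalence can be checked directly with Python's
standard unicodedata module. A minimal sketch (the code points used are
standard Unicode assignments): a Greek vowel with a plain macron composes to
a precomposed character under NFC, while the double-breve sequence has no
precomposed form and is left alone; a conformant process compares strings
only after normalizing both sides.

```python
import unicodedata

# Alpha + COMBINING MACRON has a precomposed form (U+1FB1),
# so NFC composes the two-code-point sequence into one character.
alpha_macron = "\u03B1\u0304"          # <α, combining macron>
print(unicodedata.normalize("NFC", alpha_macron))   # 'ᾱ' (U+1FB1)

# <OMICRON, COMBINING DOUBLE BREVE, UPSILON> has no precomposed form,
# so it is unchanged under NFC: the decomposed sequence IS already
# the canonical representation.
ou = "\u03BF\u035D\u03C5"              # <ο, combining double breve, υ>
print(unicodedata.normalize("NFC", ou) == ou)       # True

# A conformant process treats canonical equivalents the same way by
# normalizing both sides to a common form before comparing.
def canonically_equal(a: str, b: str) -> bool:
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(canonically_equal("\u1FB1", alpha_macron))    # True
```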

> > They are not needed in fact, but they just should be documented
> > somewhere for implementers of renderers and fonts, to support these
> > types of clusters.
> Assuming that fonts containing COMBINING DOUBLE BREVE are not required
> or morally obliged to support it properly.

I did not say this was required. That's why I suggested not a normative
addition to TUS, but an evolving, informative technical report instead.

> > Maybe it will be enough to include them somewhere in CLDR data
> > (notably if they are still not listed explicitly in the Greek
> > collation table),
> The CLDR does not yet support Ancient Greek! It's by no means certain
> that COMBINING DOUBLE BREVE would make it to the list of auxiliary
> exemplar characters. Vowels with plain COMBINING BREVE and COMBINING
> MACRON don't make it to the list of auxiliary exemplar characters for
> Modern Greek.

I was not speaking about exemplar character subsets for any language, or
even their auxiliary subsets. Even though the latter are not standardized
and are still evolving, they are based on frequency of use and on some
agreement that these characters are desirable under common conventions, and
that their use will be understood with minor effort.

> > or in an informative technical report for the Greek
> > script, enumerating more completely the clusters that should be
> > supported and listing some known practices and recommended encodings
> > (possibly with exceptions for some usages discussed in the report).
> > I suggest an informative technical report instead of extending The
> > Unicode standard itself, only because it will not be normative, and
> > will be subject to updates. And the same could be developed for other
> > scripts as well (notably for Semitic and Indic scripts).
> On the contrary, a simple remark in TUS Section 7.9 (precise location is
> an exercise for editors who like to make it difficult to cite) that
> diacritics over two base characters are not limited to the Latin
> script should suffice. It's covered by 'pronunciation systems' in TUS
> 6.2 - they're not limited to the Latin script. I did notice some cases
> of ties apparently being used in annotated Greek to indicate that a
> sequence of consonants counted as a single character for metrical
> purposes.

Where did I write that it should be limited to Latin? I even spoke about
other scripts as well, and about the common and inherited characters that
should be treated in the technical report for each script. I did not
suggest creating a separate report for each language: there are too many of
them, and most of them have no defined stable orthography, so each author
will use his own conventions when transcribing these languages (but they
will still use the script according to common practices found in other
languages using the same script).
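That cross-script behaviour of common and inherited characters is easy to
verify: the tie mentioned above (COMBINING DOUBLE INVERTED BREVE, U+0361)
belongs to the Inherited script and spans Greek bases exactly as it spans
Latin ones. A minimal sketch with Python's standard unicodedata module (the
π–ρ pair is just an illustrative example of a metrical annotation):

```python
import unicodedata

tie = "\u0361"  # COMBINING DOUBLE INVERTED BREVE (the "tie")
print(unicodedata.name(tie))       # COMBINING DOUBLE INVERTED BREVE
print(unicodedata.combining(tie))  # 234: the double-diacritic-above class

# The tie joins the two surrounding bases regardless of their script;
# here, two Greek consonants marked as a single unit for metrical
# purposes. There is no precomposed form, so normalization leaves the
# sequence unchanged.
s = "\u03C0\u0361\u03C1"           # <π, tie, ρ>
print(unicodedata.normalize("NFC", s) == s)   # True
```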

In my opinion, these informative technical reports should be a repository
of documents, accessible from a hierarchical directory structure, possibly
with links, i.e. in an HTTP-like store, with one folder per base script. We
could navigate between these documents, and a possible solution would be to
use a categorized wiki, or a searchable blog with keywords and formatting
templates to organize the content (it could also use the Wikidata extension
of MediaWiki to help organize things and keep the accumulated data
parsable, so as to derive many subsets of related data about scripts,
languages, usages, occurrences, disagreements found between various
sources, and historic evolutions of best or most frequent practices).

You won't have enough flexibility with the existing CLDR exemplar subsets
per language (and CLDR does not focus on non-linguistic uses such as
technical and epigraphic notations, phonetic/phonologic notations, or
specific uses in multilingual texts, including texts that represent several
languages or optional readings simultaneously).

Such an indexed repository of information would then be useful to
implementers, who could also exchange the results of their experience
(including their past limitations, when palliative notations were used even
though they were ambiguous: we should be able to track the ambiguities in
the analysis of existing texts, also because orthographies do not evolve at
the same pace as languages, and writing systems are still evolving over
time).
Received on Sun Aug 04 2013 - 20:13:17 CDT
