Greek: Haralambous proposals

From: Nick Nicholas (
Date: Thu Jun 06 2002 - 11:59:46 EDT

Unicoders, some of you may already be familiar with Yannis Haralambous' paper, given at Dublin.
The issues of precomposition have been hashed over again and again,
so I won't add to them; but I'd like to register some concern (and
some agreement) with other bits of it. In the following, of course, I
speak for myself, dude with goatee in Australia --- not for any of
my employers past or present, or for the Consortium.

The first two 'rules' Haralambous gives are peculiar, because they are
already taken care of by normalisation. Furthermore, while users can
and should be steered away from certain diacritic combinations,
Unicode can't be the body saying what characters you can combine with
what --- since you might not want to write coherent Greek at all (a
point Rick has made to inquiries from the Greek Unicode list, and Ken
(I think) to me on a previous occasion.) So it needs to be made clear
that these are rules for users using the characters to produce cogent
Greek --- rules which Unicode, as I understand, doesn't want to
police at a design level.

To reiterate stuff that's been said before:

1.1. The acute and the tonos are the same thing; this is known. They
weren't the same thing necessarily from 1982-1986, whence the
confusion at ELOT. But normalisation takes care of this well. (The
snarking at 2.1.4 is unnecessary: people with lots of knowledge of
Greek were using a tonos distinct from the acute in 1982 ---
including the notorious Prof. Kriaras --- and the confusion is
likelier to be heritage from ELOT than Unicode's fault. Admittedly,
the vertical dash that used to feature on the charts is not familiar
to me as having been used in 1982...)
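
A minimal sketch of the normalisation point in 1.1, using Python's standard unicodedata module. The choice of alpha as the sample letter and the helper name same_after_nfc are illustrative assumptions, not anything from the paper.

```python
import unicodedata

# Sample letter chosen for illustration: alpha with tonos vs. alpha with oxia.
ALPHA_TONOS = "\u03AC"  # GREEK SMALL LETTER ALPHA WITH TONOS (monotonic block)
ALPHA_OXIA = "\u1F71"   # GREEK SMALL LETTER ALPHA WITH OXIA (Greek Extended block)


def same_after_nfc(a: str, b: str) -> bool:
    """Return True if two strings are canonically equivalent (compared in NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# NFC maps the oxia form onto the tonos form.
nfc_oxia = unicodedata.normalize("NFC", ALPHA_OXIA)

# NFD of the tonos form is plain alpha plus COMBINING ACUTE ACCENT (U+0301).
nfd_tonos = unicodedata.normalize("NFD", ALPHA_TONOS)

print(same_after_nfc(ALPHA_TONOS, ALPHA_OXIA))
print([f"U+{ord(c):04X}" for c in nfd_tonos])
```

So a process that normalises its input never sees the two as different characters, whatever the 1982-1986 history was.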

1.2. An uppercase letter can carry accents without breathings in
older typographical traditions of Greek; but of course, those are
instances in the middle of a word, because all-caps words used to be
accented --- and their accents were above the letter, not to the
left. The accents to the left occur only in the initials of title
case words, and of course always require a breathing in polytonic.
What on earth the function of U+1FBA is (A with initial grave) is a
mystery for the ages; I assume ELOT just got confused, giving
pseudo-polytonic equivalents of capitals plus tonos. But of course,
those characters will now never go away. Hopefully anyone designing a
font to deal with 17th century Greek will realise that those
characters shouldn't be used for all-caps accents --- and the current
glyphs are certainly going to discourage anyone from making that mistake.

1.6. The shame with 1.6 (avoid spacing diacritics to emulate capital
letters with left diacritics) is that this is how every 8-bit Greek
font on earth has done those capital diacritics, so people just
blindly convert them across to Unicode like that. Rule 6 should be
shouted from the rooftops; and anyone working with Unicode Greek
should realistically expect that they will get texts with this misuse
of spacing diacritics (which includes just about every polytonic
Greek text online --- TLG texts excepted :-) .)
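
Anyone expecting such texts could flag the misuse with something like the following. This is only a sketch: the list of spacing diacritics is taken from the Greek Extended block, the capital range is the basic Greek block, and the function name find_spacing_misuse is hypothetical. It deliberately does not normalise the input first, since normalisation rewrites some of these spacing characters.

```python
import re

# Spacing (non-combining) Greek diacritics from the Greek Extended block:
# koronis, psili, perispomeni, dialytika+perispomeni, the psili/dasia/dialytika
# combinations with varia/oxia/perispomeni, varia, oxia and dasia.
SPACING_DIACRITICS = (
    "\u1FBD\u1FBF-\u1FC1\u1FCD-\u1FCF\u1FDD-\u1FDF\u1FED-\u1FEF\u1FFD\u1FFE"
)

# Basic Greek capitals, Alpha through Omega.
CAPITAL_GREEK = "\u0391-\u03A9"

# A spacing diacritic immediately followed by a capital is the legacy pattern.
MISUSE = re.compile("[" + SPACING_DIACRITICS + "][" + CAPITAL_GREEK + "]")


def find_spacing_misuse(text: str) -> list:
    """Return (offset, matched pair) for each spacing diacritic + capital pair."""
    return [(m.start(), m.group()) for m in MISUSE.finditer(text)]


# Legacy-style "Athenai": spacing psili followed by a bare capital alpha.
legacy = "\u1FBF\u0391\u03B8\u1FC6\u03BD\u03B1\u03B9"
# The same word with the precomposed capital alpha with psili (U+1F08).
clean = "\u1F08\u03B8\u1FC6\u03BD\u03B1\u03B9"

print(find_spacing_misuse(legacy))
print(find_spacing_misuse(clean))
```

A real converter would also have to catch ASCII apostrophes and grave accents pressed into the same service, which this sketch leaves out.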

1.7. Using pre-combined characters rather than combining diacritics
is something I've been guilty of myself in designing a website using
Unicode Greek; but pre-combined chars are deprecated for good reason,
and this suggestion should be downgraded even further: it is
emphatically only an interim solution, until smart fonts (like Minion
Pro) are in widespread use, and should be avoided in any text to be
further processed electronically. Believe me, you don't want to write
a search engine to deal with pre-combined characters and still
allow diacritic-insensitive searches...
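
To make the search problem concrete, here is a minimal sketch of diacritic-insensitive matching via decomposition, using Python's standard unicodedata module. The function names fold and contains are assumptions for illustration; a real search engine would need more than this.

```python
import unicodedata


def fold(text: str) -> str:
    """Decompose, drop every combining mark, then case-fold.

    Marks are stripped before case-folding so that the combining
    ypogegrammeni (U+0345) is removed rather than folded to a full iota.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def contains(haystack: str, needle: str) -> bool:
    """Diacritic- and case-insensitive substring test."""
    return fold(needle) in fold(haystack)


# "anthropos" three ways: precomposed polytonic, combining sequence, bare letters.
precomposed = "\u1F04\u03BD\u03B8\u03C1\u03C9\u03C0\u03BF\u03C2"
combining = "\u03B1\u0313\u0301\u03BD\u03B8\u03C1\u03C9\u03C0\u03BF\u03C2"
bare = "\u03B1\u03BD\u03B8\u03C1\u03C9\u03C0\u03BF\u03C2"

print(fold(precomposed) == fold(combining) == fold(bare))
print(contains(precomposed, bare))
```

With decomposed text the stripping step is a one-line filter; with pre-combined text and no normalisation library to hand, every accented letter has to be listed by hand.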

1.8. Unfortunately yes, some people do confuse psili and apostrophe;
I've had to deal with this in legacy text myself.

1.9. Guillemets are standard typographical practice for quotations in
Greece --- but not at all for Ancient Greek, the quotations for which
tend to follow that of the publishing country. Though there is a
special place in Hell for people using single quotes in Greek (as
they are readily confused with psili and daseia), mandating
guillemets for an audience including Western classicists is

Section 2 contains polemic against monotonic. Haralambous is entitled
to his opinion; and you're entitled to mine, which is (a) good
riddance (and the arguments made for the polytonic are anything but
compelling); and (b) there is no way conceivable that polytonic Greek
will make a comeback, when no one under 30 has learned it outside of
Ancient Greek classes (and I say this as a 30-year-old). Polytonic
may be holding on in *some* book publishing, but the majority of
computer users will neither use it nor want to use it. To contend
that monotonic is a local phenomenon, and that the needs of 10,000
classical scholars in the West outweigh those of 10,000,000 Greek and
Cypriot nationals, is Canute-like. Unicode should indeed recognise
continuing contemporary use of polytonic; but that monotonic is
official and prevalent, and the priority for any implementers, is
beyond dispute.

2.2. On the mute iota: a reminder that the subscript/mute iota is
indeed mute now, but was not mute in Classical Greek (it started
dropping out in the 3rd century BC.) And in the inscriptions of the
time, of course, it wasn't subscript at all: scribes only started
indicating its muteness by subscripting with the invention of lower
case, which all subsequent Greek typography has followed. Mute iota
is fine as a Modern Greek name (though so is ypogegrameni!) --- but
I'm not convinced classicists will like it. In any case, Unicode now
conflates the subscript and the adscript in normalisation, so here
too no dire results can come about. Furthermore, Haralambous is
making the age-old glyph/character confusion: Unicode has an
adscript glyph in its code chart for capital subscripts, but no one is
forcing Haralambous to use that glyph if his typographical tradition
wants capital subscripts instead. So to say that AiDHS (small cap
subscript) and A|dhs (subscript instead of adscript) are "not
conformant to Unicode v3.2" is misleading.
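
The conflation in normalisation can be checked directly. A small sketch with Python's standard unicodedata module follows; alpha is just the sample letter, and the variable names are illustrative.

```python
import unicodedata

SMALL = "\u1FB3"    # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
CAPITAL = "\u1FBC"  # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI

# Both decompose to base letter + the same combining mark, U+0345.
small_nfd = unicodedata.normalize("NFD", SMALL)
capital_nfd = unicodedata.normalize("NFD", CAPITAL)

for label, decomposed in (("small", small_nfd), ("capital", capital_nfd)):
    names = [unicodedata.name(c) for c in decomposed]
    print(label, names)

# The character names differ (ypogegrammeni vs. prosgegrammeni), but the
# encoded mark is identical; how it is drawn is left to the font.
shared_mark = small_nfd[1]
print(unicodedata.name(shared_mark))
```

Whether the font draws that mark under the capital or beside it is a glyph decision, which is the point being made above.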

2.3. It is true that the Greek circumflex looks like the combining
inverted breve rather than the combining (Roman) circumflex; but
surely the sensible solution, as already occurs with the treatment of
precombined characters, is to treat the tilde and the inverted macron
as glyph variants of the perispomeni, and to deprecate the characters
for tildes, inverted macrons, and Roman circumflexes as realisations
of the perispomeni. (I've yet to see a dialectologist employ a Roman
circumflex on a Greek vowel, but it's not beyond the bounds of
reason. Plenty of use of inverted macron-perispomenis on consonants
in dialectology, though, to indicate palatalisation.)
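
As a rough sketch of what treating the look-alikes as realisations of the perispomeni might mean in a cleanup pass: the code below rewrites a combining tilde, inverted breve or Roman circumflex to the combining perispomeni (U+0342) when it sits on a Greek base letter. This is an assumption-laden illustration of the suggestion, not anything Unicode normalisation does by itself; the function name and the set of look-alikes are hypothetical.

```python
import unicodedata

PERISPOMENI = "\u0342"  # COMBINING GREEK PERISPOMENI

# Assumed look-alikes: combining circumflex, tilde, inverted breve.
LOOKALIKES = {"\u0302", "\u0303", "\u0311"}


def fold_to_perispomeni(text: str) -> str:
    """Replace look-alike marks on Greek base letters with U+0342."""
    out = []
    base_is_greek = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            # A new base character: remember whether it is a Greek letter.
            base_is_greek = unicodedata.name(ch, "").startswith("GREEK")
            out.append(ch)
        elif base_is_greek and ch in LOOKALIKES:
            out.append(PERISPOMENI)
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


# Omega + combining tilde becomes precomposed omega with perispomeni (U+1FF6).
print(fold_to_perispomeni("\u03C9\u0303") == "\u1FF6")
# Latin n + combining tilde is left alone.
print(fold_to_perispomeni("n\u0303") == "\u00F1")
```

Note that this would also rewrite the dialectologists' marks on consonants, so a real tool would want to restrict the rule to vowels or make it opt-in.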

2.5. Haralambous is correct on the confusion: stigma is numeric,
digamma alphabetic. (The stigma is originally the uncial version of
the digamma --- but by the time people were using uncials, the
digamma was only used as a number. So the bifurcation between stigma
and digamma is exactly parallel to that between the Q-koppa and the
S-koppa, the latter form also being mediaeval.)

3.1. I would want evidence of use of the uppercase Kai symbol. I've
only ever seen it lowercase --- though I admit I've only ever seen it
in old-style shopfronts. But I have only ever seen it lowercase even
in all caps contexts. If the capital kai symbol represents
typographical practice of long ago, I'm inclined to think this is
better handled as a straight ligature for cased "Kai": old Greek
typography didn't exactly skimp on ligatures, and Unicode doesn't
need to know about them. Unless I see compelling evidence to the
contrary, I don't think a capital Kai warrants inclusion in Unicode
any more than the "esti" ligature. Symmetry with the lowercase kai
ligature is not enough of a rationale for its inclusion.

3.1. On the other hand, capital Lunate sigma is long overdue, and
I've never understood why it was omitted: the papyrologists that use
it use case as much as anyone.

3.2. The reversed iota and upsilon are also still in use in Greek
dialectology. They are of course the homebrew version of the Jod,
though maintaining the distinctions between their etymological
origins (upsilon, iota.)

It's an admirable quirk of Greek typography that people flipped iota
circumflex and upsilon circumflex to get these jods; but I don't
think Unicode should go down this path with discrete characters.
You'll notice Haralambous' capital versions don't have tildes
underneath, but breves. In fact, dialectologists in particular
indicate jods for any combinations of characters pronounced as /i/ in
the modern language; so you will see often enough eta with a
combining breve underneath, or epsilon and iota with a combining tie
(e.g. skol(ei)o = skoljo). Obviously eta breve should decompose the
same way as iota breve, or the text becomes intractable; so I believe
the correct solution for these is to encode these ersatz jods at the
character level as letter + combining breve underneath (or tie), with
the upside down tildes treated as glyph-variant ligatures (iota +
breve underneath rendered through ligature as upside down tilde).

3.3, 3.4. This proposal has been rejected too often for me to repeat
why. :-) What software and OS designers choose to implement or not as
available combinations need not be any concern of Unicode's. If you
need circumflexes on epsilons, or smooth breathings on cap upsilons,
talk to Adobe, not Unicode. The rest of the world does not want yet
more precomposed forms to normalise; and I'm surprised at
Haralambous' insistence on this old ground.

[][][][]                   [][][][][][][][][][]                [][][][]
Dr Nick Nicholas.
                   University of Melbourne:
     Chiastaxo dhe to giegnissa, i dhedhato potemu,
     ma ena chieri aftumeno ecratu, chisvissemu.    (I Thisia tu Avraam)
